Ebook, 476 pages, 2 hours

Mastering Python Data Analysis

By Magnus Vilhelm Persson and Luiz Felipe Martins

About this ebook

About This Book

Clean, format, and explore data using graphical and numerical summaries
Leverage the IPython environment to efficiently analyze data with Python
Packed with easy-to-follow examples to develop advanced computational skills for the analysis of complex data

Who This Book Is For

If you are a competent Python developer who wants to take your data analysis skills to the next level by solving complex problems, then this advanced guide is for you. Familiarity with the basics of applying Python libraries to data sets is assumed.

Language: English

Publisher: Packt Publishing

Release date: Jun 27, 2016

ISBN: 9781783553303

Author

Magnus Vilhelm Persson

Mastering Python Data Analysis - Magnus Vilhelm Persson

Mastering Python Data Analysis

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Tools of the Trade

Before you start

Using the notebook interface

Imports

An example using the Pandas library

Summary

2. Exploring Data

The General Social Survey

Obtaining the data

Reading the data

Univariate data

Histograms

Making things pretty

Characterization

Concept of statistical inference

Numeric summaries and boxplots

Relationships between variables – scatterplots

Summary

3. Learning About Models

Models and experiments

The cumulative distribution function

Working with distributions

The probability density function

Where do models come from?

Multivariate distributions

Summary

4. Regression

Introducing linear regression

Getting the dataset

Testing with linear regression

Multivariate regression

Adding economic indicators

Taking a step back

Logistic regression

Some notes

Summary

5. Clustering

Introduction to cluster finding

Starting out simple – John Snow on cholera

K-means clustering

Suicide rate versus GDP versus absolute latitude

Hierarchical clustering analysis

Reading in and reducing the data

Hierarchical cluster algorithm

Summary

6. Bayesian Methods

The Bayesian method

Credible versus confidence intervals

Bayes formula

Python packages

U.S. air travel safety record

Getting the NTSB database

Binning the data

Bayesian analysis of the data

Binning by month

Plotting coordinates

Cartopy

Mpl toolkits – basemap

Climate change - CO2 in the atmosphere

Getting the data

Creating and sampling the model

Summary

7. Supervised and Unsupervised Learning

Introduction to machine learning

Scikit-learn

Linear regression

Climate data

Checking with Bayesian analysis and OLS

Clustering

Seeds classification

Visualizing the data

Feature selection

Classifying the data

The SVC linear kernel

The SVC Radial Basis Function

The SVC polynomial

K-Nearest Neighbour

Random Forest

Choosing your classifier

Summary

8. Time Series Analysis

Introduction

Pandas and time series data

Indexing and slicing

Resampling, smoothing, and other estimates

Stationarity

Patterns and components

Decomposing components

Differencing

Time series models

Autoregressive – AR

Moving average – MA

Selecting p and q

Automatic function

The (Partial) AutoCorrelation Function

Autoregressive Integrated Moving Average – ARIMA

Summary

A. More on Jupyter Notebook and matplotlib Styles

Jupyter Notebook

Useful keyboard shortcuts

Command mode shortcuts

Edit mode shortcuts

Markdown cells

Notebook Python extensions

Installing the extensions

Codefolding

Collapsible headings

Help panel

Initialization cells

NbExtensions menu item

Ruler

Skip-traceback

Table of contents

Other Jupyter Notebook tips

External connections

Export

Additional file types

Matplotlib styles

Useful resources

General resources

Packages

Data repositories

Visualization of data

Summary

Mastering Python Data Analysis

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Month: June 2016

Production reference: 1230616

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78355-329-7

www.packtpub.com

Credits

About the Authors

Magnus Vilhelm Persson is a scientist with a passion for Python and open source software usage and development. He obtained his PhD in Physics/Astronomy from Copenhagen University’s Centre for Star and Planet Formation (StarPlan) in 2013. Since then, he has continued his research in Astronomy at various academic institutes across Europe. In his research, he uses various types of data and analysis to gain insights into how stars are formed. He has participated in radio shows about Astronomy and also organized workshops and intensive courses about the use of Python for data analysis.

You can check out his web page at http://vilhelm.nu.

This book would not have been possible without the great work that all the people at Packt are doing. I would like to highlight Arun, Bharat, Vinay, and Pranil's work. Thank you for your patience during the whole process. Furthermore, I would like to thank Packt for giving me the opportunity to develop and write this book; it was really fun and I learned a lot. There were times when the work was a little overwhelming, but at those times, my colleague and friend Alan Heays always had some supporting words to say. Finally, my wife, Mihaela, is the most supportive partner anyone could ever have. For all the late evenings and nights where you pushed me to continue working on this to finish it, thank you. You are the most loving wife and best friend anyone could ever ask for.

Luiz Felipe Martins holds a PhD in applied mathematics from Brown University and has worked as a researcher and educator for more than 20 years. His research is mainly in the field of applied probability. He has been involved in developing code for the open source homework system WeBWorK, for which he wrote a library for the visualization of systems of differential equations. He was supported by an NSF grant for this project. Currently, he is an associate professor in the department of mathematics at Cleveland State University, Cleveland, Ohio, where he has developed several courses in applied mathematics and scientific computing. His current duties include coordinating all first-year calculus sessions.

About the Reviewer

Hang (Harvey) Yu is a data scientist in Silicon Valley. He works on search engine development and model optimization. He has ample experience in big data and machine learning. He graduated from the University of Illinois at Urbana-Champaign with a background in data mining and statistics. Besides this book, he has also reviewed multiple other books and papers, including Mastering Python Data Visualization and R Data Analysis Cookbook, both by Packt Publishing. When Harvey is not coding, he is playing soccer, reading fiction books, or listening to classical music. You can get in touch with him at hangyu1@illinois.edu or on LinkedIn at http://www.linkedin.com/in/hangyu1.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

The use of Python for data analysis and visualization has only increased in popularity in the last few years. One reason for this is the availability and continued development of a number of excellent tools for conducting advanced data analysis and visualization. Another reason is the possibility of rapid and easy development, deployment, and sharing of code. For these reasons, Python has become one of the most widely used programming and scripting languages for data analysis in many industries.

The aim of this book is to develop the skills needed to approach almost any data analysis problem effectively and extract all of the available information. This is done by introducing a range of techniques and methods, such as uni- and multivariate linear regression, cluster finding, Bayesian analysis, machine learning, and time series analysis. Exploratory data analysis is key to getting a sense of what can be done and to maximizing the insights gained from the data. Additionally, emphasis is put on presentation-ready figures that are clear and easy to interpret.

Knowing how to explore data and present results and conclusions from data analysis in a meaningful way is an important skill. While the theory behind statistical analysis is important to know, the ability to quickly and accurately perform hands-on sorting, reduction, and analysis, and then present the insights gained, is make-or-break in today's quickly evolving business and academic sectors.

What this book covers

Chapter 1, Tools of the Trade, provides an overview of the tools available for data analysis in Python and details the packages and libraries that will be used in the book with some installation tips. A quick example highlights the common data structure used in the Pandas package.

Chapter 2, Exploring Data, introduces methods for initial exploration of data, including numeric summaries and distributions, and various ways of displaying data, such as histograms, Kernel Density Estimation (KDE) plots, and box plots.

Chapter 3, Learning About Models, covers the concept of models in data analysis and how using the cumulative distribution function and probability density function can help characterize a variable. Furthermore, it shows how to make point estimates and generate random numbers with a given distribution.

Chapter 4, Regression, introduces linear, multiple, and logistic regression with in-depth examples of using the SciPy and statsmodels packages to test various hypotheses of relationships between variables.

Chapter 5, Clustering, explains some of the theory behind cluster finding analysis and goes through some more complex examples using the K-means and hierarchical clustering algorithms available in SciPy.

Chapter 6, Bayesian Methods, explains how to construct and test a model using Bayesian analysis in Python with the PyMC package. It covers setting up stochastic and deterministic variables with prior information, constructing the model, running the Markov Chain Monte Carlo (MCMC) sampler, and interpreting the results. In addition, a short bonus section covers how to plot coordinates on maps using both the basemap and cartopy packages, which are important for presenting and analyzing data with geographical coordinate information.

Chapter 7, Supervised and Unsupervised Learning, looks at linear regression, clustering, and classification with two machine learning analysis techniques available in the Scikit-learn package.

Chapter 8, Time Series Analysis, examines various aspects of time series modeling using Pandas and statsmodels. Initially, the important concepts of smoothing, resampling, rolling estimates, and stationarity are covered. Later, autoregressive (AR), moving average (MA), and combined ARIMA models are explained and applied to one of the data sets, including making shorter forecasts using the constructed models.

Appendix, More on Jupyter Notebook and matplotlib Styles, shows some convenient extensions of Jupyter Notebook and some useful keyboard shortcuts to make the Jupyter workflow more efficient. It also explains the matplotlib style files and how to customize plots even further to make beautiful figures ready for inclusion in reports. Lastly, various useful online resources are listed and described.

What you need for this book

All you need to follow the examples in this book is a computer running any recent version of Python. While the examples use Python 3, they can easily be adapted to work with Python 2, with only minor changes. The packages used in the examples are NumPy, SciPy, matplotlib, Pandas, statsmodels, PyMC, and Scikit-learn. Optionally, the packages basemap and cartopy are used to plot coordinate points on maps. The easiest way to obtain and maintain a Python environment that meets all the requirements of this book is to download a prepackaged Python distribution. In this book, we have checked all the code against Continuum Analytics' Anaconda Python distribution and Ubuntu Xenial Xerus (16.04) running Python 3.
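As a quick way to see which of the packages listed above are present in a given environment, the following minimal sketch may help. It is not taken from the book: the helper name missing_packages is hypothetical, and the import names (for example sklearn for Scikit-learn, pymc for PyMC, and mpl_toolkits.basemap for basemap) are assumptions that can differ between package versions.

```python
from importlib.util import find_spec

# Assumed import names for the packages named in the text; these are the names
# the packages are commonly installed under and may differ between versions
# (PyMC in particular has used more than one import name).
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "statsmodels", "pymc", "sklearn"]
OPTIONAL = ["mpl_toolkits.basemap", "cartopy"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when the parent of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED) or "none")
    print("Missing optional packages:", missing_packages(OPTIONAL) or "none")
```

The check only locates each package without importing its top-level module, so it stays fast and does not fail on a partially broken installation; any name it reports can then be installed with the package manager of the chosen distribution.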

To download the example data and code, an Internet connection is needed.

Who this book is for

This book is intended for professionals with a beginner to intermediate level of Python programming knowledge who want to move in the direction of solving more sophisticated problems and gain deeper insights through advanced data analysis. Some experience with the math behind basic statistics is assumed, but quick introductions are given where required. If you want to learn the breadth of statistical analysis techniques in Python and get an overview of the methods and tools available, you will find this book helpful. Each chapter consists of a number of examples using mostly real-world data to highlight various aspects of the topic and teach how to conduct data analysis from start to finish.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning. Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "This code has the effect of selecting matplotlib stylesheet mystyle.mplstyle."

A block of code is set as follows:

import pandas as pd
# Load the Stata file, keeping raw numeric codes instead of category labels
gss_data = pd.read_stata('data/GSS2012merged_R5.dta',
                         convert_categoricals=False)
gss_data.head()

Any command-line input or output is written as follows:

python -c 'import numpy'

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Here, you can check the box for add a toolbar button to open the shortcuts dialog/panel."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register on our website using your e-mail address and password.

2. Hover the mouse pointer on the SUPPORT tab at the top.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box.

5. Select the book for which you're looking to download the code files.

6. Choose from the drop-down menu where you purchased this book.

7. Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Python-Data-Analysis. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/masteringpythondataanalysis_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have
