Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way

Ebook · 243 pages · 1 hour
About this ebook

Understand and implement big data analysis solutions in pandas with an emphasis on performance. This book strengthens your intuition for working with pandas, the Python data analysis library, by exploring its underlying implementation and data structures.

Thinking in Pandas introduces the topic of big data and demonstrates concepts by looking at exciting and impactful projects that pandas helped to solve. From there, you will learn to assess your own projects by size and type to see if pandas is the appropriate library for your needs. Author Hannah Stepanek explains how to load and normalize data in pandas efficiently, and reviews some of the most commonly used loaders and several of their most powerful options. You will then learn how to access and transform data efficiently, what methods to avoid, and when to employ more advanced performance techniques. You will also go over basic data access and munging in pandas and the intuitive dictionary syntax. Choosing the right DataFrame format, working with multi-level DataFrames, and how pandas might be improved upon in the future are also covered.

By the end of the book, you will have a solid understanding of how the pandas library works under the hood. Get ready to make confident decisions in your own projects by utilizing pandas—the right way.


What You Will Learn

  • Understand the underlying data structure of pandas and why it performs the way it does under certain circumstances
  • Discover how to use pandas to extract, transform, and load data correctly with an emphasis on performance
  • Choose the right DataFrame so that the data analysis is simple and efficient
  • Improve performance of pandas operations with other Python libraries


Who This Book Is For

Software engineers with basic programming skills in Python who are keen on using pandas for a big data analysis project, and Python software developers interested in big data.
Language: English
Publisher: Apress
Release date: Jun 5, 2020
ISBN: 9781484258392
    Book preview

    Thinking in Pandas - Hannah Stepanek

    © Hannah Stepanek 2020

    H. Stepanek, Thinking in Pandas, https://doi.org/10.1007/978-1-4842-5839-2_1

    1. Introduction

    Hannah Stepanek, Portland, OR, USA

    We live in a world full of data. In fact, there is so much data that it’s nearly impossible to comprehend it all. We rely more heavily than ever on computers to assist us in making sense of this massive amount of information. Whether it’s data discovery via search engines, presentation via graphical user interfaces, or aggregation via algorithms, we use software to process, extract, and present the data in ways that make sense to us. pandas has become an increasingly popular package for working with big data sets. Whether it’s analyzing large amounts of data, presenting it, or normalizing it and re-storing it, pandas has a wide range of features that support big data needs. While pandas is not the most performant option available, it’s written in Python, so it’s easy for beginners to learn, quick to write, and has a rich API.

    About pandas

    pandas is the go-to package for working with big data sets in Python. It’s made for working with data sets generally below or around 1 GB in size, but really this limit varies depending on the memory constraints of the device you run it on. A good rule of thumb is to have at least five to ten times as much memory on the device as the size of your data set. Once the data set starts to exceed the single-digit gigabyte range, it’s generally recommended to use a different library such as Vaex.
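    As a rough, hypothetical illustration of this rule of thumb, you can measure a DataFrame’s in-memory footprint with pandas’ memory_usage method and compare it against available RAM (the column names and data below are invented for the example):

```python
import numpy as np
import pandas as pd

# Build a hypothetical data set of one million sensor readings.
df = pd.DataFrame({
    "sensor_id": np.random.randint(0, 100, size=1_000_000),
    "reading": np.random.rand(1_000_000),
})

# deep=True measures the actual memory of object columns,
# not just the pointers to them.
bytes_used = df.memory_usage(deep=True).sum()
print(f"DataFrame footprint: {bytes_used / 1e6:.1f} MB")

# The five-to-ten-times rule of thumb from the text.
low, high = 5 * bytes_used / 1e9, 10 * bytes_used / 1e9
print(f"Suggested available memory: {low:.2f}-{high:.2f} GB")
```

    Note that the in-memory footprint is often larger than the file on disk, which is one reason the rule of thumb is deliberately generous.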

    The name pandas came from the term panel data referring to tabular data. The idea is that you can make panels out of a larger panel of the data, as shown in Figure 1-1.

    [Figure 1-1: Panel data]

    When pandas was first implemented, it was tightly coupled to NumPy, a popular Python package for scientific computing that provides an n-dimensional array object for efficient matrix math operations. In the modern implementation of pandas, you can still see evidence of this tight coupling in pandas’ exposure of NumPy’s Not a Number (NaN) type and in parts of its API, such as the dtype parameter.
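    Both artifacts are easy to see in a couple of lines. The following sketch (not from the book) shows NumPy’s NaN forcing an integer column to upcast to float, and the dtype parameter mapping directly onto the backing NumPy array:

```python
import numpy as np
import pandas as pd

# NaN is NumPy's floating-point missing-value marker, which pandas adopted.
s = pd.Series([1, 2, np.nan])

# NumPy integer arrays cannot hold NaN, so pandas upcasts to float64.
print(s.dtype)  # float64

# The dtype parameter determines the backing NumPy array's dtype.
ints = pd.Series([1, 2, 3], dtype="int32")
print(ints.values.dtype)  # int32
```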

    pandas was a truly open source project from the start. The original author, Wes McKinney, admitted on the Podcast.__init__ Python podcast that, in order to foster an open source community and encourage contributions, pandas was tied perhaps a little too closely to NumPy, but looking back, he wouldn’t have done it any differently. NumPy was and still is a very popular and powerful Python library for efficient mathematical arithmetic. At the time of pandas’ inception, NumPy was the main data computation package of the scientific community, and in order to implement pandas quickly and simply in a way that was familiar to its existing user and contributor base, the NumPy package became the underlying data structure of the pandas DataFrame. NumPy is built on C extensions, and while it supplies a Python API, the main computation happens almost entirely in C, which is why it is so efficient. C is much faster than Python because it is a low-level language and thus doesn’t incur the memory and CPU overhead that Python does in order to provide high-level niceties such as memory management. Even today, developers still rely heavily on NumPy and often perform exclusively NumPy-based operations in their pandas programs.
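    A small sketch (with made-up column names) shows this coupling in practice: each pandas column is backed by a NumPy ndarray, and whole-column arithmetic is dispatched to NumPy’s compiled C loops rather than executed element by element in Python:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# The column's data lives in a NumPy ndarray.
print(type(df["a"].values))  # <class 'numpy.ndarray'>

# Whole-column arithmetic runs in NumPy's compiled C code,
# operating on entire arrays at once.
df["c"] = df["a"] * df["b"] + np.sqrt(df["a"])
print(df["c"].tolist())
```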

    The difference in performance between Python and C is often not very significant to the average developer. Python is generally fast enough in most cases, and the niceties of Python’s high-level qualities (built-in memory management and pseudo-code-like syntax, to name a few) generally outweigh the headaches of having to manage memory yourself. However, when operating on huge data sets with thousands of rows, these subtle performance differences compound into a much more significant difference. For the average developer, this may seem absolutely outrageous, but it isn’t unusual for the scientific research community to spend days waiting for big data computations to run. Sometimes the computations really do take this long; however, other times the programs are simply written in an inefficient way. There are many different ways to do the same thing in pandas, which makes the library flexible and powerful, but it also means pandas can lead developers down less efficient implementation paths that result in very slow data processing.
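    To make this concrete, here is a self-contained comparison (not from the book) of two ways to compute the same derived column, one iterating in Python and one vectorized through NumPy; on large frames the vectorized path is typically orders of magnitude faster:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.random.rand(10_000) * 100})

# Inefficient path: iterrows builds a Python object per row,
# paying interpreter overhead on every iteration.
def taxed_slow(frame):
    out = []
    for _, row in frame.iterrows():
        out.append(row["price"] * 1.08)
    return pd.Series(out)

# Efficient path: a single vectorized multiply executed in C.
def taxed_fast(frame):
    return frame["price"] * 1.08

# Both produce the same values; only the execution model differs.
assert np.allclose(taxed_slow(df).values, taxed_fast(df).values)
```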

    As developers, we live in an age where compute resources are considered cheap. If a program is CPU heavy, it’s easier for us to simply upgrade our AWS instance to a larger machine and pay an extra couple of bucks than it is to invest our time in root-causing the program and addressing the overtaxing of the CPU. While it is wonderful to have such readily available compute resources, it also makes us lazy developers. We often forget that 50 years ago computers took up whole rooms and took several seconds just to add two numbers together. A lot of programs are simply fast enough and still meet performance requirements even though they are not written in the most optimal way. Compute resources for big data processing take up a significant amount of energy compared to a simple web service; they require large amounts of memory and CPU, often requiring large machines to run at their resource limits over multiple hours. These programs are taxing on the hardware, potentially resulting in faster aging, and require a large amount of energy both to keep the machines cool and to keep the computation running. As developers, we have a responsibility to write efficient programs, not just because they are faster and cost less but also because they reduce compute resource usage, which means less electricity, less hardware, and in general more sustainability.

    It is the goal of this book in the coming chapters to assist developers in implementing performant pandas programs and to help them develop an intuition for choosing efficient data processing techniques. Before we deep dive into the underlying data structures that pandas is built on, let’s take a look at how some existing impactful projects utilize pandas.

    How pandas helped build an image of a black hole

    pandas was used to normalize all the data collected from several large telescopes to construct the first image of a black hole. Since the black hole was so far away, it would have required a telescope as big as the Earth to capture an image of the black hole directly, so, instead, scientists came up with a way to piece one together using the largest telescopes we have today. In this international collaboration, the largest telescopes on Earth were used as a representative single mirror of a larger theoretical telescope that would be needed to capture the image of a black hole. Since the Earth turns, each telescope could act as more than one mirror, filling in a significant portion of the theoretical larger telescope image. Figure 1-2 demonstrates this technique. These pieces of the larger theoretical image were then passed through several different image prediction algorithms trained to recognize different types of images. The idea was if each of these different image reproduction techniques outputs the same image, then they could be confident that the image of the black hole was the real image (or reasonably close).

    [Figure 1-2: Using the telescopes on Earth to represent pieces of a larger theoretical telescope]

    The imaging library used for this project is open source and posted on GitHub.¹ The images from radio telescopes were captured on hard disks and flown across the world to a lab at the Massachusetts Institute of Technology, where they were loaded into pandas. The data was then normalized, synchronizing the captures from the telescopes in time, removing things like interference from the Earth’s atmosphere, and calculating things like the absolute phase of a single telescope over time. The data was then sent into the different image prediction algorithms, and finally the first image of a black hole was born.²
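    The real pipeline lives in the eht-imaging repository; as a heavily simplified, hypothetical sketch of one normalization step, pandas’ merge_asof can align readings from two instruments that sampled at slightly different times (the timestamps and amplitudes below are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical captures from two telescopes with offset timestamps.
t1 = pd.DataFrame({
    "time": pd.to_datetime(["2017-04-05 00:00:00", "2017-04-05 00:00:10"]),
    "amplitude": [1.2, 1.3],
})
t2 = pd.DataFrame({
    "time": pd.to_datetime(["2017-04-05 00:00:02", "2017-04-05 00:00:11"]),
    "amplitude": [0.9, 1.1],
})

# Pair each t1 reading with the most recent earlier t2 reading,
# a simple form of synchronizing captures in time.
aligned = pd.merge_asof(t1, t2, on="time", suffixes=("_t1", "_t2"))
print(aligned)
```

    The first t1 reading has no earlier t2 counterpart, so its aligned value is NaN, which is exactly the kind of gap a real normalization pass would have to handle.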

    How pandas helps financial institutions make more informed predictions about the future market

    Financial advisors are always looking for an edge up on the competition. Many financial institutions use pandas along with machine learning libraries to determine whether new data points may be relevant in helping financial advisors make better investment decisions. New data sets are often loaded into pandas, normalized, and then evaluated against historical market data to see if the data correlates to trends in the market. If it does, the data is then passed along to the advisors to be used in making financial investment decisions. It may also be passed along to their customers so they can make more informed decisions as well.
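    A minimal, hypothetical version of that correlation check might look like the following (the series here are randomly generated stand-ins, not real market data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for a year of daily market returns.
market = pd.Series(rng.normal(size=250))

# Stand-in for a new data set that partially tracks the market.
signal = 0.6 * market + 0.4 * pd.Series(rng.normal(size=250))

df = pd.DataFrame({"market_return": market, "new_signal": signal})

# If the correlation is strong, the data set may be worth
# passing along to advisors.
corr = df["market_return"].corr(df["new_signal"])
print(f"correlation: {corr:.2f}")
```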

    Financial institutions also use pandas to monitor their systems. They look for outages or slowness in servers that might impact their trade performance.
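    A hedged sketch of that kind of monitoring: flag response times that sit far above a rolling baseline (the latency numbers are synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic server response times in milliseconds, with one slow spike.
latency = pd.Series(rng.normal(100, 5, size=60))
latency.iloc[45] = 400.0

# Flag samples more than three standard deviations above a rolling mean.
mean = latency.rolling(window=20, min_periods=5).mean()
std = latency.rolling(window=20, min_periods=5).std()
outliers = latency > mean + 3 * std
print(latency[outliers])
```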

    How pandas helps improve discoverability of content

    Companies collect tons of data on users every day. For broadcast companies, viewership data is particularly relevant, both for showing relevant advertisements and for putting the right content in front of interested users. Typically, the data collected about users is loaded into pandas and analyzed for viewership patterns in the content they watch. Analysts may look for patterns such as when users watch certain content, what content they watch, and when they finish watching certain content and start looking for something new. Then, new content or relevant product advertisements are recommended based on those patterns. There has also been a lot of work recently to improve these models so that users don’t get put into a bubble (i.e., so recommended content isn’t just the same type of content they’ve been watching before or presenting the same opinions), often by avoiding content silos on the business side.
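    As a hypothetical sketch of that kind of analysis, a viewing log can be grouped per user to surface when and what each user watches (all names, genres, and timestamps below are invented):

```python
import pandas as pd

# Invented viewing log.
views = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "watched_at": pd.to_datetime([
        "2020-06-01 20:15", "2020-06-02 21:00",
        "2020-06-01 08:30", "2020-06-02 08:45", "2020-06-03 09:00",
    ]),
    "genre": ["drama", "drama", "news", "news", "sports"],
})

# Most common viewing hour per user: "a" watches evenings, "b" mornings.
peak_hour = views.groupby("user")["watched_at"].apply(
    lambda s: s.dt.hour.mode().iloc[0]
)
print(peak_hour)

# Most-watched genre per user, a simple input for recommendations.
top_genre = views.groupby("user")["genre"].agg(lambda g: g.mode().iloc[0])
print(top_genre)
```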

    Now that we’ve looked at some interesting use cases for pandas, in Chapter 2 we’ll take a look at how to use pandas to access and merge data.

    Footnotes

    1. https://github.com/achael/eht-imaging

    2. https://solarsystem.nasa.gov/resources/2319/first-image-of-a-black-hole/

    H. Stepanek, Thinking in Pandas, https://doi.org/10.1007/978-1-4842-5839-2_2

    2. Basic Data Access and Merging


    There are many ways of accessing and merging DataFrames with pandas. This chapter will go over the basic methods for getting data out of a DataFrame, creating a sub-DataFrame, and merging DataFrames together.
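    As a brief preview of those three operations, here is a minimal sketch (the DataFrames are invented for illustration):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["ann", "bo", "cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [88, 92, 75]})

# Getting data out: dictionary-like column access.
names = left["name"]
print(names.tolist())  # ['ann', 'bo', 'cy']

# Creating a sub-DataFrame: boolean-mask selection.
sub = left[left["id"] > 1]
print(sub)

# Merging: an inner join on the shared "id" column keeps ids 2 and 3.
merged = left.merge(right, on="id", how="inner")
print(merged)
```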

    DataFrame creation and access

    pandas has a dictionary-like syntax that is very intuitive for those familiar with
