
Practical Data Science with Python 3: Synthesizing Actionable Insights from Data
Ebook · 565 pages · 4 hours


About this ebook

Gain insight into essential data science skills in a holistic manner using data engineering and associated scalable computational methods. This book covers the most popular Python 3 frameworks for both local and distributed (on-premise and cloud-based) processing. Along the way, you will be introduced to many popular open-source frameworks, such as SciPy, scikit-learn, Numba, and Apache Spark. The book is structured around examples, so you will grasp core concepts via case studies and Python 3 code.
As data science projects get continuously larger and more complex, software engineering knowledge and experience are crucial to produce evolvable solutions. You'll see how to create maintainable software for data science and how to document data engineering practices.
This book is a good starting point for people who want to gain practical skills to perform data science. All the code will be available in the form of IPython notebooks and Python 3 programs, which allow you to reproduce all analyses from the book and customize them for your own purposes. You'll also benefit from advanced topics like Machine Learning, Recommender Systems, and Security in Data Science.
Practical Data Science with Python 3 will empower you to analyze data, formulate proper questions, and produce actionable insights, three core stages in most data science endeavors.
What You'll Learn
  • Play the role of a data scientist when completing increasingly challenging exercises using Python 3
  • Work with proven data science techniques/technologies
  • Review scalable software engineering practices to ramp up data analysis abilities in the realm of Big Data
  • Apply the theory of probability, statistical inference, and algebra to understand data science practices
Who This Book Is For
Anyone who would like to enter the realm of data science using Python 3.
Language: English
Publisher: Apress
Release date: Sep 7, 2019
ISBN: 9781484248591


    Book preview

    Practical Data Science with Python 3 - Ervin Varga

    © Ervin Varga 2019

    E. Varga, Practical Data Science with Python 3, https://doi.org/10.1007/978-1-4842-4859-1_1

    1. Introduction to Data Science

    Ervin Varga¹ 

    (1) Kikinda, Serbia

    Let me start by making an analogy between software engineering and data science. Software engineering may be summarized as the application of engineering principles and methods to the development of software. The aim is to produce a dependable software product. In a similar vein, data science may be described as the application of scientific principles and methods to data collection, analysis, and reporting. The goal is to synthesize reliable and actionable insights from data (sometimes referred to as a data product). To continue with our analogy, the systems/software development life cycle (SDLC) prescribes the major phases of a software development process: project initiation, requirements engineering, design, construction, testing, deployment, and maintenance. The data science process also encompasses multiple phases: project initiation, data acquisition, data preparation, data analysis, reporting, and execution of actions (another phase is data exploration, which is more of an all-embracing activity than a stand-alone phase). As in software development, these phases are quite interwoven, and the process is inherently iterative and incremental. An overarching activity that is indispensable in both software engineering and data science (and any other iterative and incremental endeavor) is retrospection, which involves reviewing a project or process to determine what was successful and what could be improved. Another similarity to software engineering is that data science also relies on a multidimensional team or team of teams. A typical project requires domain experts, software engineers specializing in various technologies, and mathematicians (a single person may take different roles at various times). Yet another common denominator with software engineering is a penchant for automation (via programmability of most activities) to increase productivity, reproducibility, and quality. The aim of this chapter is to explain the key concepts regarding data science and put them into proper context.

    Main Phases of a Data Science Project

    Figure 1-1 illuminates the major phases of a data science process. These phases shouldn’t be treated in a waterfall fashion (meaning one phase must be completed before the next begins). Rather, a typical data science project involves many iterations, similarly to an Agile software development project. The concept of phases is intended only to remind us that focus shifts over time from one phase to another. It isn’t evident from looking at Figure 1-1, but the most critical stage is project initiation. To quote John Dewey (The Theory of Inquiry, 1938):

    It is a familiar and significant saying that a problem well put is half-solved.


    Figure 1-1

    The main phases of a data science project

    The Brown Cow model from the Volere requirements engineering method is a helpful resource to explain the project initiation phase (see reference [1] in the References section at the end of the chapter). Figure 1-2 represents an adaptation of this model that depicts what happens when a data science project is initiated. The mere fact that we have some data to play with doesn’t automatically warrant a full-blown project. It is important to properly define the problem, assess the situation (risks, assets, resources, contingencies), and describe the goals, including evaluation criteria. We must end up with well-formulated research questions and have a good grasp of implementation details. The whole endeavor must be fathomed in a holistic fashion. The beginning must be connected to the end, and the end must instigate further inquiries.


    Figure 1-2

    The Brown Cow model adapted for a data science project

    The Brown Cow model contains four segments: the How tackles the solution space, the What touches upon the problem space, the Now designates the current situation, and the Future describes the desired state. Each segment represents one specific viewpoint, which helps to avoid confusion among stakeholders. The Brown Cow model incorporates a systematic procedure for transitioning from the current state toward the future while satisfying the interests of a project's key stakeholders. Without their global consensus, it is very hard to judge the success of the final data product. The next section illustrates this model in a case study.

    Brown Cow Model Case Study

    You may be wondering whether data science is just a new fad on the horizon. If you examine this question in the light of the previous section's first paragraph, then you might conclude that it isn't new at all. A superb application of data science was carried out by Dr. John Snow, the father of modern epidemiology. His study of interest was a cholera outbreak in London in 1854 (see [2] in the References section for more details). He followed the data science process entirely, as described here (the numbers 1 and 2 denote increments that resulted in major milestones):

    Project initiation: Based on the symptoms of the disease (vomiting and diarrhea), Dr. Snow properly surmised that it must be caused by an infection carried by something people ate or drank. The prevalent public opinion at that time was that cholera was transmitted by bad smells (a.k.a. the miasma theory). According to the Brown Cow model, the Now-How is the existing registry of deaths as well as total confusion regarding the root cause of cholera. The Now-What is the prevalent wrong theory of how the disease spreads (with a futile rule to cover your nose and mouth). The Future-What is the new theory of what may cause cholera. The Future-How is the incremental procedure to establish causation between contaminated water and cholera, as well as clear advice about the immediate next steps. It is also important to note that this revelation opened up the possibility to further investigate cholera, which eventually led to the discovery of the responsible bacterium.

    Data engineering 1 (acquisition, preparation, and exploration): Snow recorded the location of each death around the Soho district of London (where the epidemic had originated) and put everything on a map (as shown in Figure 1-3). He also marked the water pumps on the map to be able to discern patterns. This is a fine example of using visualization in parallel with raw tabular data; exploration most frequently entails some sort of graphics (image, plot, graph, etc.). Exploration as an activity permeates all phases of the data science process, though.


    Figure 1-3

    Snow’s original map. Black bars denote deaths. Black discs are pumps. The clustering of death cases around the Broad Street pump is apparent.

    Data analysis 1: After Snow carefully examined every case (including important outliers), he established a strong correlation between deaths and the Broad Street pump. He was cautious and resisted establishing causation at this stage. This is a fine example of applying a scientific skillset in working with data (the Web is full of embarrassing stories of correlation being misapplied as causation).

    Retrospective: The first milestone was an enabler for the second part.¹ Snow revisited the original plan and prepared the next stage to answer his initial research question. He needed to carry out a randomized controlled experiment (the best way to assuredly reach causality) and had to use the method of comparison. In a randomized experiment, participants are segregated into two groups: treatment (e.g., those who drank contaminated water) and control (e.g., those who were not exposed to the infection). The comparison method seeks to find an association between the applied treatment and the observed outcome. It is very important that the groups systematically differ only in the single characteristic that is the criterion for separation. In Snow's case, this was the water supply received by people in these categories. Luckily, there were two water suppliers whose customers weren't systematically different in any aspect other than their water supply. One supplier drew its clean water from upriver of the sewage discharge point, and the other drew its contaminated water from below it. The groups were thus naturally formed and randomized.

    Data engineering 2: Snow collected data on all cases of cholera for a broader area of London that were covered by these water suppliers.

    Data analysis 2: After a careful analysis, it was evident that people got sick by drinking contaminated water. This was the moment when Snow could safely claim a causal relationship between an infection through contaminated water and cholera.

    Reporting: Snow prepared a detailed table showing death rates of people belonging to the two groups. The death rate in the treatment group was ten times higher than that of the control group, so he was confident in his statement.

    Action: The authorities removed the handle from the Broad Street pump to prevent further infections, and it proved effective. Further investigations and actions followed this study.

    Evidently, this is a compelling data science project from the 19th century. Why, then, is data science such a hot topic nowadays? The answer lurks in the name of the discipline itself (data science). It is popular again due to the vastly different data and concomitant complex problem space(s) embodied by Big Data (see also the sidebar Big Data Requires Data Scientists). In the past, data was relatively scarce and data management solutions were much more expensive. Today, we have a data deluge phenomenon that was aptly commented on by John Naisbitt: "We are drowning in information and starving for knowledge."

    Big Data Requires Data Scientists

    Big Data, covered in the next section, refers to data at a scale that is difficult to conceptualize. An analogy to help you understand why data scientists are required when designing a Big Data system is that of designing a building. If you want to design your own home, you might be able to draft your own floor plan (or find one on the Internet) and give it to the builder; you don't need to be an architect. Scale that up to a multistory apartment building, and you must be a certified architect, but you can rely on existing resources and tools to complete the design.

    Finally, if you’re hired to design the world’s tallest skyscraper or build apartments on an artificial island in the middle of a sea, then you not only must be a certified architect, but also must possess extraordinary knowledge and experience, apply unconventional methods, and devise unique tools. Designing a Big Data system is analogous to designing a skyscraper and requires data scientists with vast knowledge and experience.

    Big Data

    The notion of being big has multiple dimensions. In the realm of Big Data this is articulated as four Vs:

    Volume denotes the sheer amount of data. The assumption is that the vast quantity of data cannot fit on a single machine (not even on disk, let alone in memory), so the data must be distributed over dozens of networked machines. This brings in all sorts of issues related to distributed computing that don’t exist in the case of a single machine.

    Variety designates the property of data being organized in various ways. In a classical setup, we presume well-structured data, whose schema is properly documented. Usually such data resides in relational databases. With Big Data, we must also deal with unstructured and semi-structured forms. Nonetheless, all data eventually needs to be aligned and managed in a unified fashion.

    Velocity dictates the pace of data changes (arrival of new data, update of existing data, and removal of data). Besides processing data at rest, many times we must handle data changes in real time to avoid any data loss. This kind of data management is known as stream processing. Streams of data may arrive at a system previously trained on historical data, and this data combination (historical + real-time) is sometimes called actionable information.

    Veracity is about the trustworthiness of data. As we amalgamate disparate data sources, we must handle inaccurate, incomplete, and misapplied (sometimes adversarial) data.

    All these Vs require novel methods, algorithms, and technologies. Also, the required software systems become larger and more complex. These are the principal reasons why data science gets so much attention from both research communities and industry. The next example illustrates these dimensions.

    Big Data Example: MOOC Platforms

    A massive open online course (MOOC) is an online course offered via a MOOC platform to a large, worldwide community. Some of the most popular platforms are Coursera, edX, Udacity, Khan Academy, and Stanford Online. (Most courses are free, but some require payment.) Generating actionable insights on top of a MOOC platform belongs to the Big Data problem space. Table 1-1 shows how Dr. Snow’s cholera project differs from a data science project built on top of a MOOC data-generation platform in regard to the four Vs of Big Data.

    Table 1-1

    Differences Between Old and Modern Data Science Projects in Light of Big Data

    You may want to read How Video Production Affects Student Engagement: An Empirical Study of MOOC Videos (see reference [3]) as an excellent primer for deriving general wisdom from MOOC data (it explores optimal video length range for lectures). The researchers tracked 862 MOOC videos, 120,000+ students, and 6.9 million video-watching sessions on the edX platform. Now compare this MOOC-based project to an even larger system of systems. For example, the Large Hadron Collider (LHC) is the largest data production facility today. If a MOOC platform can be compared to designing a structure, then the LHC is surely an artificial palm island full of expensive houses. Each of the four experiments currently being conducted at the LHC facility produces thousands of gigabytes per second of data, which on a yearly basis results in around 15 petabytes of data. Another data monster is the Internet of Things (IoT), and you can consult reference [4] for more details.

    How to Learn Data Science

    This is a fundamental topic of data science, since constant learning on all levels (domain, technology, algorithms, programming languages, etc.), together with practice, is the hallmark of this profession. Learning data science means gaining knowledge and understanding of data science through study, instruction, and experience. There is an ancient Chinese proverb that emphasizes the advantage of procuring knowledge over sheer consumption: "Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime." This is exactly what we strive to achieve in data science, too; instead of purely devouring current facts, we must synthesize knowledge for the future.

    There are three interconnected core competency areas in data science: domain knowledge, mathematics (including probability theory and statistics), and software engineering. This triad can be conveniently illustrated with a Venn diagram, although myriad variants have been published since Drew Conway's original version from 2010 (you may read a funny article about these variants in reference [6]).² Most people are very strong in only one particular area, which is OK if they are solid in the other two. Recall that data science revolves around teamwork, which means that communication capabilities at multiple levels are crucial. For example, you must speak the language of the domain to communicate properly with important stakeholders and other team members throughout the project. To make this elaboration more concrete, I will give two examples: how to acquire domain knowledge and how to acquire programming skills (see also reference [5]).

    [Figure: Venn diagram of the three interconnected core competency areas (domain knowledge, mathematics, and software engineering)]

    Domain Knowledge Attainment—Example

    In this section, I will share my tactic to gain domain knowledge. I entered data science as a software engineer, which, as a profession, also embraces domain comprehension and mathematics. Nonetheless, data science requires a different type of acquaintance with these areas. Besides enabling effective communication, deep understanding of a domain is mandatory for the following tasks (all of them are related to data quality problems):

    Pruning of outliers (data points that are inconsistent with the rest of the dataset)

    Deleting incomplete observations

    Replacing parts of incomplete observations with estimates (for example, by a feature’s mean value)

    Removing duplicates (for example, by merging records)

    It is not enough to simply remove missing values. Sometimes, blind removal of data may completely distort the result. You must make a judicious decision based on domain knowledge before doing this kind of cleanup.
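    To make these cleanup tasks concrete, the following sketch shows how they might look with pandas (the DataFrame df, its columns, and the age threshold are hypothetical; the sketch only illustrates the mechanics, not the domain judgment that must precede them):

    import numpy as np
    import pandas as pd

    # Hypothetical dataset with typical quality problems.
    df = pd.DataFrame({
        "age":    [25, 31, np.nan, 200, 25],     # 200 is an implausible outlier
        "salary": [50000, np.nan, 58000, 61000, 50000]
    })
    # Pruning of outliers (domain knowledge: an age above 120 is implausible).
    df = df[df["age"].fillna(0) <= 120]
    # Replacing parts of incomplete observations with the feature's mean value.
    df["salary"] = df["salary"].fillna(df["salary"].mean())
    # Deleting observations that are still incomplete.
    df = df.dropna()
    # Removing exact duplicates.
    df = df.drop_duplicates()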

    Suppose that you are a rookie in the field of finance and banking. You are given the task of implementing a function to calculate the growth of money. The inputs are the starting deposit (principal investment) $$ {P}_0 $$ in a bank and the annual rate of interest r (assume it is a fraction between 0 and 1). The bank uses continuously compounded interest. You also notice a handy formula for this task, $$ P={P}_0{e}^{rt} $$, where t is the time in years. The final amount is P (see Exercise 1-2). Easy, right?

    You should resist the temptation to immediately commence writing code without understanding all the underlying terms and mechanics. Dissecting formulas like this is the best way to peek under the hood and comprehend part of the domain (at least, from my experience). Here are the domain-related terms mentioned in the text:

    Interest rate

    Annual interest rate

    Compound interest

    Continuously compounded interest

    The mathematical constant e in the final formula

    The interest rate is the percentage by which to increase the current amount. If we start with $$ {P}_0 $$, then we will end up with $$ {P}_1={P}_0+{P}_0r={P}_0\left(1+r\right) $$. The annual interest rate is what we would get after one year using the previous equation. The compound interest relies on the previously accrued amount. Therefore, after 2 years we would have $$ {P}_2={P}_1+{P}_1r={P}_1\left(1+r\right)={P}_0{\left(1+r\right)}^2 $$. Notice how $$ {P}_2 $$ indirectly depends upon $$ {P}_0 $$. If a bank applies this compounding interest multiple times per year (say, with frequency n), then we would use $$ \frac{r}{n} $$ as our individual interest rate and apply it subsequently n times. All in all, this would result in

    $$ {P}_1={P}_0{\left(1+\frac{r}{n}\right)}^n $$

    . Finally, the continuous variant of compounding is when we let n → ∞.

    This is the point where we must remind ourselves from calculus that

    $$ \underset{n\to \infty }{\lim }{\left(1+\frac{1}{n}\right)}^n=\underset{t\to 0}{\lim }{\left(1+t\right)}^{\frac{1}{t}}=e $$

    . Consequently, we have

    $$ \underset{n\to \infty }{\lim }{\left(1+\frac{r}{n}\right)}^n=\underset{n\to \infty }{\lim }{\left(1+\frac{1}{\frac{n}{r}}\right)}^n=\underset{n\to \infty }{\lim }{\left(1+\frac{1}{\frac{n}{r}}\right)}^{\frac{n}{r}r}={e}^r $$

    , after substituting

    $$ t=\frac{1}{\frac{n}{r}}\wedge t\to 0 $$

    . If we repeat this compounding over t years, then we get our term from the initially given formula.
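    A quick numeric check makes this limit tangible. The sketch below (my own illustration, not one of the book's listings, using arbitrary values for the principal and rate) compares periodic compounding over one year for a growing number of periods n with the continuous formula:

    import math

    P0, r = 100.0, 0.05   # illustrative principal and annual interest rate
    # Periodic compounding over one year with n compounding periods.
    for n in (1, 12, 365, 1_000_000):
        print(n, P0 * (1 + r / n) ** n)
    # Continuous compounding: the value the sequence above converges to.
    print("continuous", P0 * math.exp(r))

    As n grows, the periodic results approach the continuously compounded amount, in line with the limit derived above.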

    This technique of dissecting formulas and studying terminology is one that I always use when learning a new domain. Once you grasp all the concepts and vocabulary of a domain, then it is OK to just accept formulas. Until then, try to split up any novel (to you) descriptions from a domain into constituent parts and analyze them one by one.

    Note

    If you decide to use a support vector machine (SVM), then you should first read what the kernel method and kernel trick are. Similarly, before you try principal component analysis (PCA), read about eigenvectors, eigenvalues, and orthonormal basis. It is so easy to fool yourself that you’ve achieved something remarkable simply because a particular machine learning method has returned positive results.

    Programming Skills Attainment—Example

    Suppose that you have strong domain knowledge but not enough programming experience in Python. You may end up with a solution as shown in Listing 1-1 (for the same problem as in the previous section). Let’s assume that the function should process a full list of amounts over different time periods.

    import math

    def calculate_money_growth(p0, r, t):
        # List of final amounts.
        p = []
        for i in range(len(p0)):
            p.append(p0[i] * math.exp(r * t[i]))
        return p

    Listing 1-1

    Attempt to Implement a Function to Calculate the Growth of Money
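    The test described in the next paragraph boils down to a call like the following sketch (assuming calculate_money_growth from Listing 1-1 is already defined):

    p0 = [1, 2, 3]
    r = 0.5
    t = [1, 10, 100]
    print(calculate_money_growth(p0, r, t))
    # [1.6487212707001282, 296.8263182051532, 1.5554116585761217e+22]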

    The test would consist of calling the function with p0 = [1, 2, 3], r = 0.5, and t = [1, 10, 100]. After seeing the expected output of [1.6487212707001282, 296.8263182051532, 1.5554116585761217e+22], the task would be marked as completed. Of course, for this miniature dataset, all seems to be perfect. However, when working with massive amounts of data, this approach doesn’t scale. For the sake of completeness, Listing 1-2 shows the improved version utilizing NumPy. Notice the sleekness of the code and its expressive power; it speaks for itself. It also employs a basic defensive programming element.

    import numpy as np

    def calculate_money_growth(p0, r, t):
        assert p0.size == t.size
        return p0 * np.exp(r * t)

    Listing 1-2

    Optimized and Safer Version of Money Growth Calculator
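    Note that this version expects NumPy arrays rather than plain Python lists (a list has no size attribute, and multiplying a list by a float raises a TypeError). A minimal usage sketch, not part of the original listing, would be:

    import numpy as np

    p0 = np.array([1.0, 2.0, 3.0])
    t = np.array([1.0, 10.0, 100.0])
    print(calculate_money_growth(p0, 0.5, t))
    # [1.64872127e+00 2.96826318e+02 1.55541166e+22]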

    Basic programming knowledge is not enough. You must be acquainted with efficient and powerful data science frameworks and technologies available for Python. Those are the true enablers to tackle Big Data problems. We will see many such frameworks in action throughout this book.

    Note

    At the very least, you must be proficient in SciPy, a Python-based ecosystem for data science. Moreover, as code complexity grows, you will need to know the principles, rules, and techniques for creating maintainable solutions. For example, if you’ve never heard of defensive programming, then it is time to brush up on your software engineering proficiencies. What would have happened above with p0 and t having different lengths? Where is this checked?
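    To answer the question in the Note concretely: the assert statement in Listing 1-2 is the check. Here is a small sketch of what happens when that guard fires (again assuming calculate_money_growth from Listing 1-2 is in scope):

    import numpy as np

    p0 = np.array([1.0, 2.0, 3.0])
    t = np.array([1.0, 10.0])   # one element short
    try:
        calculate_money_growth(p0, 0.5, t)
    except AssertionError:
        print("p0 and t must have the same length")

    The naive version in Listing 1-1 has no such guard; it would either raise an IndexError (if t is shorter than p0) or silently ignore the surplus elements (if t is longer).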

    Overview of the Anaconda Ecosystem

    Anaconda Distribution is a free ecosystem of prepackaged libraries for scientific computing and data science in Python (it includes a sophisticated package and environment manager). You may download the latest version for your operating system by visiting https://www.anaconda.com/distribution . There is also a paid Enterprise edition, but we will work here with the free Community variant. At the time of this writing, the current version is 5.3, which comes with Python 3.7. You may want to first visit the documentation at https://docs.anaconda.com .

    There is also the Miniconda distribution, which contains only the bare minimum Python environment. We will use the full distribution, as we will need many bundled packages. Miniconda is good if you want to have better control over what gets installed, enabling you to save space.

    After a successful installation, start Anaconda Navigator, which is a dashboard for managing environments and launching applications. Click Spyder's Launch button, as shown in Figure 1-4.


    Figure 1-4

    Home page of Anaconda Navigator with the Spyder Launch button selected

    Spyder is a sophisticated scientific Python integrated development environment (IDE). It uses an internal IPython instance to execute code inside the editor and also to allow interactive computing. Figure 1-5 shows the layout of the user interface (UI).


    Figure 1-5

    Spyder UI with three main regions

    The Editor is located on the left side, the tabbed explorer component is in the upper-right pane (the variable explorer is shown in Figure 1-5), and the IPython console is in the lower-right pane, with expressions to run our fast money growth calculator. The runfile was autogenerated by Spyder after I clicked the Run button (large green arrow) on the toolbar.

    If you manage to run the money growth calculator (click the Open Folder button in the toolbar to select and load a file) or evaluate any expression in the IPython console, then all is properly set up. You can also try out the embedded debugger (see the Debug menu, which is the first blue button from the left) by running some code with a configured breakpoint.

    Managing Packages and Environments

    conda is the command-line tool for managing packages and environments in Anaconda. You should use it instead of pip (which manages standard Python packages) and virtualenv (which creates isolated Python environments). The following is the dump of an interactive session showing the first couple of packages in the base (root) environment (observe in Figure 1-4 that Spyder was started from this environment, as shown in the Applications on field):

    In [6]: packages = !conda list

    In [7]: packages

    Out[7]:

    ['# packages in environment at /Users/evarga/anaconda3:',

     '#',

     '# Name                    Version             Build  Channel',

     '_ipyw_jlab_nb_ext_conf    0.1.0              py37_0  ',

     'alabaster                 0.7.11             py37_0  ',

     'anaconda                  5.3.0              py37_0  ',

     'anaconda-client           1.7.2              py37_0  ',

     'anaconda-navigator        1.9.2              py37_0  ',

     'anaconda-project          0.8.2              py37_0  ',

     'appdirs                   1.4.3      py37h28b3542_0  ',

     'appnope                   0.1.0              py37_0  ',

     'appscript                 1.0.1      py37h1de35cc_1  ',

     'asn1crypto                0.24.0             py37_0  ',

    The first (bold) line shows an easy technique to run shell commands directly within an IPython session.
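    For completeness, creating and populating a dedicated environment typically involves a handful of conda commands such as the ones below (a sketch typed in a terminal; the environment name ds_book and the package selection are only examples):

    conda create --name ds_book python=3.7 numpy scipy pandas
    conda activate ds_book
    conda install scikit-learn
    conda env list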
