Python for Data Mining Quick Syntax Reference
About this ebook
Python for Data Mining Quick Syntax Reference covers each concept concisely, with many illustrative examples. You'll be introduced to several data mining packages, with examples of how to use each of them.
The first part covers core Python, including objects, lists, functions, modules, and error handling. The second part covers Python's most important data mining packages: NumPy and SciPy for mathematical functions and random data generation, pandas for dataframe management and data import, Matplotlib for drawing charts, and scikit-learn for machine learning.
What You'll Learn
- Install Python and choose a development environment
- Understand the basic concepts of object-oriented programming
- Import, open, and edit files
- Review the differences between Python 2.x and 3.x
This book is for programmers who are new to Python's data mining packages, or who have experience in other languages, and want a quick guide to Pythonic tools and techniques.
Python for Data Mining Quick Syntax Reference - Valentina Porcu
© Valentina Porcu 2018
Valentina Porcu, Python for Data Mining Quick Syntax Reference, https://doi.org/10.1007/978-1-4842-4113-4_1
1. Getting Started
Valentina Porcu, Nuoro, Italy
Python is one of the most important programming languages used in data science. In this chapter, you’ll learn how to install Python and review some of the integrated development environments (IDEs) used for data analysis. You’ll also learn how to set up a working directory on your computer.
Installing Python
Python2 and Python3 can be downloaded easily from https://www.python.org/downloads/ (Figure 1-1) and then installed. Note that if you are working on a Unix-like system such as macOS or Linux, Python is typically preinstalled. Simply type python in a terminal to load the program.
Figure 1-1
Python home page
From the python.org ( http://python.org/ ) website, click Downloads then select the appropriate version to use based on your operating system. Then, follow the on-screen instructions to install Python.
Editor and IDEs
There are many ways to use a programming language such as Python. To start, open a terminal and type the word python followed immediately by its version number, with no space before the number. For example, in Figure 1-2, I’ve typed python2.
Figure 1-2
Terminal with Python open
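Once the shell is open, it can help to confirm which version is actually running. The following is a minimal sketch using the standard sys module; the variable names (version_label, is_python3) are illustrative and do not come from the book.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major, minor = sys.version_info[0], sys.version_info[1]
version_label = "Python{}.{}".format(major, minor)
print(version_label)

# A simple flag for code that relies on Python3 behavior.
is_python3 = major >= 3
print("Running Python3: {}".format(is_python3))
```

The same check can be made from the terminal, before starting the shell, by typing python --version.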
Writing code this way may prove to be somewhat cumbersome, so we use text editors or IDEs to facilitate the process.
There are many editors, both free and commercial, that differ in their completeness, scalability, and ease of use. Some are simple and some are more advanced. The most used editors include Sublime Text, TextWrangler ( http://www.barebones.com/ ), Notepad++ ( http://notepad-plus-plus.org/download/v7.3.1.html ) (for Windows), and TextMate ( http://macromates.com/ ) (for Mac).
As for Python-specific IDEs, Wingware ( http://wingware.com/ ), Komodo ( http://www.activestate.com/komodo-ide ), PyCharm, and Emacs ( http://www.gnu.org/software/emacs/ ) are popular, but there are plenty of others. They provide tools that simplify work, such as autocompletion, auto-editing and auto-indentation, integrated documentation, syntax highlighting, and code folding (the ability to hide some pieces of code while you work on others), and they support debugging.
Spyder (which is included in Anaconda ( http://www.continuum.io/downloads )) and Jupyter ( http://jupyter.readthedocs.io/en/latest/ ), both available from www.anaconda.com , are the IDEs used most in data science, along with Canopy. A useful tool for Jupyter is nbviewer, which allows the exchange of Jupyter’s .ipynb files and is available at http://nbviewer.jupyter.org . nbviewer can also be linked to GitHub.
Anaconda is a very useful tool because it also features Jupyter; it can be downloaded from http://www.continuum/ . A partial list of resources installed with Anaconda (which contains more than 100 packages for data mining, math, data analysis, and algebra) is presented in Figure 1-3. You can view the complete list by opening a terminal window, as shown in Figure 1-3, and then typing:
conda list
Figure 1-3
Part of the resources installed with Anaconda
We can program with Python using one or more of these tools, depending on our habits and what we want to do. Spyder (Figure 1-4) and Jupyter (Figure 1-5) are very common for data mining. Both can be used and installed individually. For example, Jupyter can be tested using http://try.jupyter.org/. However, both Spyder and Jupyter are available after Anaconda is installed.
Figure 1-4
Spyder home screen
Figure 1-5
Example of open script on Jupyter IDE
Python code can be run directly from a computer terminal or saved as a .py file and then run from these other editors. The >>> prompt (displayed in Figure 1-6) tells us we are running Python code.
Figure 1-6
The command prompt in Python
To follow the examples presented in this book, I recommend you install Anaconda (Figure 1-7) from the Anaconda.com web site and use Jupyter. Because Anaconda automatically includes (and installs) a set of packages and modules that we will use later, we won’t have to install packages or modules separately thereafter; we’ll already have them loaded and ready to use.
Figure 1-7
Anaconda’s main screen
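To check that the packages used later in the book are actually available in the current environment, one option is to ask Python whether each module can be found. This is a rough sketch using the standard importlib.util.find_spec function (Python3 only); the PACKAGES list and the is_installed helper are illustrative names, and the list simply mirrors the packages named in the book description.

```python
import importlib.util

# Import names of the packages named in the book description.
# Note that scikit-learn is imported under the name "sklearn".
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in PACKAGES:
    status = "available" if is_installed(name) else "missing"
    print("{:<12} {}".format(name, status))
```

If a package is reported as missing, it has to be installed separately before the later examples will run.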
Differences between Python2 and Python3
Python was released in two different versions: Python2 and Python3. Python2 was born in 2000 (currently, the latest release is 2.7) and its support is expected to continue until 2020. It is the historical and most complete version.
Python3 was released in 2008 (current version is 3.6). Many libraries are available for Python3, but not all Python2 libraries have been ported to it.
The two versions are very similar but feature some differences. One example includes mathematical operations:
>>> 5/2
2
# Python2 performs integer division when both operands are integers, discarding the fractional part.
Listing 1-1
Mathematical Operations in Python 2.7
>>> 5/2
2.5
Listing 1-2
Mathematical Operations in Python 3.5.2
To get the exact result in Python2, we have to make at least one of the operands a decimal (float):
>>> 5.0/2
2.5
# or like this
>>> 5/2.0
2.5
# or specify we are talking about a decimal (float)
>>> float(5)/2
2.5
To keep code compatible across the two versions of Python, you can also import from a module called __future__, which allows you to use Python3 behavior in Python2:
>>> from __future__ import division
>>> 5/2
2.5
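For comparison, Python3 keeps both behaviors available under separate operators. The short sketch below assumes Python3 (or Python2 with the __future__ import shown above); the variable names are illustrative.

```python
# In Python3, / is always true division and // is floor division.
true_quotient = 5 / 2      # 2.5
floor_quotient = 5 // 2    # 2, the result Python2 gives for 5/2
remainder = 5 % 2          # 1

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -5 // 2   # -3

print(true_quotient, floor_quotient, remainder, negative_floor)
```

Using // whenever integer division is intended makes the code behave the same way in both versions.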
For a closer look at the differences between the two versions of Python, access this online resource ( http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html ).
Why choose one version of Python over the other? Python2 is the best-defined and most stable version, whereas Python3 represents the future of the language, although the two versions may not always coincide. In the first part of this book, I highlight the differences between the two versions. However, beginning with Chapter 7 and moving to the end of the book, we will use Python3.
Let’s start by setting up a work directory. This directory will house our files.
Work Directory
A work directory stores our scripts and our files. It is where Python automatically looks when we ask it to import a file or run a script. To set up a work directory, type the following in the Python shell:
>>> import os
>>> os.getcwd()
'~/mypc'
# to change the work directory, we use the following notation, inserting the new directory, as a quoted string, in parentheses
>>> os.chdir("/~/Python_script")
# then we determine whether it is correct
>>> os.getcwd()
'~/Python_script'
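The same steps can be written as a small script. One caveat: os.chdir() does not expand the ~ shorthand by itself, so a real home-relative path has to be expanded first with os.path.expanduser(). The sketch below is an illustration under that assumption; it uses a temporary folder as a stand-in for a real work directory so that it can run anywhere.

```python
import os
import tempfile

start_dir = os.getcwd()

# os.chdir() does not expand "~", so expand it explicitly when needed.
home_dir = os.path.expanduser("~")

# A temporary folder stands in for a real work directory here.
with tempfile.TemporaryDirectory() as work_dir:
    os.chdir(work_dir)
    changed = os.path.samefile(os.getcwd(), work_dir)
    print("Now working in: {}".format(os.getcwd()))
    os.chdir(start_dir)  # go back before the temporary folder is deleted

print("Home folder: {}".format(home_dir))
```

For a real project, the path passed to os.chdir() would be something like os.path.join(home_dir, "Python_script"), provided that folder exists.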
Now, when we want to import a file from our work directory, we simply type the name of the file followed by the extension, all surrounded by double quotation marks:
"file_name.extension"
For instance,
"dataframe_data_collection1.csv"
Python checks whether there is a file with that name inside that folder and imports it. The same thing happens when we save a file from Python: it is written to that folder automatically. To run a Python script, as we will see, we first have to move, directly from the terminal, to the folder where the script is located (the work directory or another one).
If we want to import a file that is not in the work directory but is elsewhere on our computer or on the Web, we do this by entering the full file address:
"complete_address/file_name.extension"
For instance,
"/~/dataframe_data1.csv"
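The difference between the two ways of naming a file can be sketched with the standard library alone. In the example below, the file example_data.csv and its contents are made up for illustration and are written to a temporary folder; the same file is then opened once by its full address and once by name only, after that folder has been made the work directory.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a small, made-up CSV file to read back.
    full_path = os.path.join(folder, "example_data.csv")
    with open(full_path, "w", newline="") as handle:
        csv.writer(handle).writerows([["name", "value"], ["a", "1"], ["b", "2"]])

    # 1. Full address: works from any work directory.
    with open(full_path, newline="") as handle:
        rows_by_full_path = list(csv.reader(handle))

    # 2. File name only: works once the folder is the work directory.
    previous_dir = os.getcwd()
    os.chdir(folder)
    try:
        with open("example_data.csv", newline="") as handle:
            rows_by_name = list(csv.reader(handle))
    finally:
        os.chdir(previous_dir)

print(rows_by_full_path == rows_by_name)
```

Both reads return the same rows; only the way the file is located differs.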
Now let’s make sure that you understand the difference between using the terminal and starting a session in our favorite programming language.
Using a Terminal
To run Python scripts, we first open a terminal window, as shown in Figure 1-8.
Figure 1-8
My terminal
As you can see, the dollar symbol ($) is displayed, not the Python shell symbol (>>>). To view a list of our folders and files, use the ls command (Figure 1-9).
Figure 1-9
List of resources on my computer
At this point, we can move to the Python_test folder by typing
cd Python_test
In that folder, I find my Python scripts—that is, the .py files I can run by typing
python test.py
test.py is the name of the script I am going to run.
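The book does not show what test.py contains, so the following is only a hypothetical example of a script that could be saved under that name and run as described. The build_greeting function is an illustrative name, not something from the book.

```python
# test.py (hypothetical contents, for illustration only)
import sys


def build_greeting(arguments):
    """Build a greeting from the command-line arguments, if any."""
    name = arguments[0] if arguments else "world"
    return "Hello, {}!".format(name)


# This part runs only when the file is executed as a script,
# for example with: python test.py
if __name__ == "__main__":
    print(build_greeting(sys.argv[1:]))
```

Any words typed after the script name in the terminal are passed to the script through sys.argv, so python test.py Python would greet "Python" instead of "world".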
Summary
In this chapter we learned how to install Python and I reviewed some of the various IDEs we can use for data analysis. We also examined Python2 and Python3, and learned how to set up a work directory and run scripts from a terminal.
© Valentina Porcu 2018
Valentina Porcu, Python for Data Mining Quick Syntax Reference, https://doi.org/10.1007/978-1-4842-4113-4_2
2. Introductory Notes
In this chapter we examine Python objects and operators, and learn how to write comments in our code. Including comments in your code is very important for two reasons. First, they serve as a reminder of our thought processes on the work we did weeks or months after we’ve created a script. Second, they help other programmers understand why we did what we did.
Objects in Python
In Python, any item is considered an object, a container to place data. Python objects include tuples, lists, sets, dictionaries, and containers. Python processing is based on objects.
Each object is distinguished by three properties:
1. A name
2. A type
3. An ID
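These three properties can be inspected directly with the built-in type() and id() functions. The sketch below uses illustrative names (numbers, alias, copy_of_numbers); the actual ID values differ from one run to the next, so only comparisons between them are shown.

```python
numbers = [1, 2, 3]              # "numbers" is the name of the object
alias = numbers                  # a second name for the same object
copy_of_numbers = list(numbers)  # a new object with equal contents

print(type(numbers))                       # the type: a list
print(id(numbers) == id(alias))            # same object, same ID
print(id(numbers) == id(copy_of_numbers))  # different objects, different IDs
print(numbers == copy_of_numbers)          # equal contents all the same
```

Two names can refer to one object, while two objects with equal contents still have distinct IDs.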
Object names consist of alphanumeric characters and underscores—in other words, all characters from