
Python High Performance - Second Edition
Ebook, 353 pages, 7 hours


About this ebook

About This Book
  • Identify the bottlenecks in your applications and solve them using the best profiling techniques
  • Write efficient numerical code in NumPy, Cython, and Pandas
  • Adapt your programs to run on multiple processors and machines with parallel programming
Who This Book Is For

The book is aimed at Python developers who want to improve the performance of their applications. Basic knowledge of Python is expected.

Language: English
Release date: May 24, 2017
ISBN: 9781787282438
Author

Gabriele Lanaro

Gabriele Lanaro is a PhD student in Chemistry at the University of British Columbia, in the field of Molecular Simulation. He writes high-performance Python code to analyze chemical systems in large-scale simulations. He is the creator of Chemlab, a high-performance visualization package for Python, and emacs-for-python, a collection of Emacs extensions that facilitate working with Python code in the Emacs text editor. This book builds on his experience in writing scientific Python code for his research and personal projects.



    Title Page

    Python High Performance

    Second Edition

    Build robust applications by implementing concurrent and distributed processing techniques

    Gabriele Lanaro

           BIRMINGHAM - MUMBAI

    Copyright

    Python High Performance

    Second Edition

    Copyright © 2017 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: December 2013

    Second edition: May 2017

    Production reference: 2250517

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham 

    B3 2PB, UK.

    ISBN 978-1-78728-289-6

    www.packtpub.com

    Credits

    About the Author

    Dr. Gabriele Lanaro has been conducting research on the formation and growth of crystals using medium- and large-scale computer simulations. In 2017, he obtained his PhD in theoretical chemistry. His interests span machine learning, numerical computing, visualization, and web technologies. He has a genuine passion for good software and is the author of the chemlab and chemview open source packages. In 2013, he authored the first edition of this book, "Python High Performance Programming".

    I'd like to acknowledge the support from Packt editors, including Vikas Tiwari. I would also like to thank my girlfriend, Harani, who had to tolerate the way-too-long writing nights, and friends who provided company and support throughout. Also, as always, I’d love to thank my parents for giving me the opportunity to pursue my ambitions.

    Lastly, I would like to thank Blenz coffee for powering the execution engine of this book through electricity and caffeine.

    About the Reviewer

    Will Brennan is a C++/Python developer based in London with previous experience in writing molecular dynamics simulations. He is currently working on high-performance image processing and machine learning applications. You can refer to his repositories at https://github.com/WillBrennan.

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Customer Feedback

    Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787282899.

    If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

    Table of Contents

    Customer Feedback

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    Benchmarking and Profiling

    Designing your application

    Writing tests and benchmarks

    Timing your benchmark

    Better tests and benchmarks with pytest-benchmark

    Finding bottlenecks with cProfile

    Profile line by line with line_profiler

    Optimizing our code

    The dis module

    Profiling memory usage with memory_profiler

    Summary

    Pure Python Optimizations

    Useful algorithms and data structures

    Lists and deques

    Dictionaries

    Building an in-memory search index using a hash map

    Sets

    Heaps

    Tries

    Caching and memoization

    Joblib

    Comprehensions and generators

    Summary

    Fast Array Operations with NumPy and Pandas

    Getting started with NumPy

    Creating arrays

    Accessing arrays

    Broadcasting

    Mathematical operations

    Calculating the norm

    Rewriting the particle simulator in NumPy

    Reaching optimal performance with numexpr

    Pandas

    Pandas fundamentals

    Indexing Series and DataFrame objects

    Database-style operations with Pandas

    Mapping

    Grouping, aggregations, and transforms

    Joining

    Summary

    C Performance with Cython

    Compiling Cython extensions

    Adding static types

    Variables

    Functions

    Classes

    Sharing declarations

    Working with arrays

    C arrays and pointers

    NumPy arrays

    Typed memoryviews

    Particle simulator in Cython

    Profiling Cython

    Using Cython with Jupyter

    Summary

    Exploring Compilers

    Numba

    First steps with Numba

    Type specializations

    Object mode versus native mode

    Numba and NumPy

    Universal functions with Numba

    Generalized universal functions

    JIT classes

    Limitations in Numba

    The PyPy project

    Setting up PyPy

    Running a particle simulator in PyPy

    Other interesting projects

    Summary

    Implementing Concurrency

    Asynchronous programming

    Waiting for I/O

    Concurrency

    Callbacks

    Futures

    Event loops

    The asyncio framework

    Coroutines

    Converting blocking code into non-blocking code

    Reactive programming

    Observables

    Useful operators

    Hot and cold observables

    Building a CPU monitor

    Summary

    Parallel Processing

    Introduction to parallel programming

    Graphic processing units

    Using multiple processes

    The Process and Pool classes

    The Executor interface

    Monte Carlo approximation of pi

    Synchronization and locks

    Parallel Cython with OpenMP

    Automatic parallelism

    Getting started with Theano

    Profiling Theano

    Tensorflow

    Running code on a GPU

    Summary

    Distributed Processing

    Introduction to distributed computing

    An introduction to MapReduce

    Dask

    Directed Acyclic Graphs

    Dask arrays

    Dask Bag and DataFrame

    Dask distributed

    Manual cluster setup

    Using PySpark

    Setting up Spark and PySpark

    Spark architecture

    Resilient Distributed Datasets

    Spark DataFrame

    Scientific computing with mpi4py

    Summary

    Designing for High Performance

    Choosing a suitable strategy

    Generic applications

    Numerical code

    Big data

    Organizing your source code

    Isolation, virtual environments, and containers

    Using conda environments

    Virtualization and Containers

    Creating docker images

    Continuous integration

    Summary

    Preface

    The Python programming language has seen a huge surge in popularity in recent years, thanks to its intuitive, fun syntax and its vast array of top-quality third-party libraries. Python has been the language of choice for many introductory and advanced university courses, as well as for numerically intense fields such as the sciences and engineering. It is also widely used in machine learning, system scripting, and web applications.

    The reference Python interpreter, CPython, is generally regarded as inefficient when compared to lower-level languages, such as C, C++, and Fortran. CPython's poor performance stems from the fact that program instructions are processed by an interpreter rather than being compiled to efficient machine code. While using an interpreter has several advantages, such as portability and the absence of a separate compilation step, it introduces an extra layer of indirection between the program and the machine, which makes execution less efficient.
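To make the interpreter's extra layer concrete: CPython first compiles each function to bytecode, and its interpreter loop then dispatches those instructions one at a time. The sketch below uses the standard `dis` module to display that bytecode. The function `add_and_scale` is a hypothetical toy example. Opcode names differ between Python versions, so no expected output is shown.

```python
import dis


def add_and_scale(x, y):
    """Hypothetical toy function used only to inspect its bytecode."""
    return (x + y) * 2


# Print the bytecode instructions that CPython's interpreter loop
# executes for add_and_scale; each instruction is dispatched by the
# interpreter rather than run directly as native machine code.
dis.dis(add_and_scale)

# The same instructions are also available programmatically.
opnames = [instr.opname for instr in dis.get_instructions(add_and_scale)]
print(opnames)
```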

    Over the years, many strategies have been developed to overcome CPython's performance shortcomings. This book brings these strategies together and will teach you how to consistently get strong performance out of your Python programs.

    This book will appeal to a broad audience, as it covers both the optimization of numerical and scientific code and strategies to improve the response times of web services and applications.

    The book can be read cover-to-cover; however, chapters are designed to be self-contained so that you can skip to a section of interest if you are already familiar with the previous topics.

    What this book covers

    Chapter 1, Benchmarking and Profiling, will teach you how to assess the performance of Python programs, along with practical strategies to identify and isolate the slow sections of your code.

    Chapter 2, Pure Python Optimizations, discusses how to improve your running times by orders of magnitude using the efficient data structures and algorithms available in the Python standard library and in pure-Python third-party modules.

    Chapter 3, Fast Array Operations with NumPy and Pandas, is a guide to the NumPy and Pandas packages. Mastery of these packages will allow you to implement fast numerical algorithms with an expressive, concise interface.

    Chapter 4, C Performance with Cython, is a tutorial on Cython, a language that uses a Python-compatible syntax to generate efficient C code.

    Chapter 5, Exploring Compilers, covers tools that can be used to compile Python to efficient machine code. The chapter will teach you how to use Numba, an optimizing compiler for Python functions, and PyPy, an alternative interpreter that can execute and optimize Python programs on the fly.

    Chapter 6, Implementing Concurrency, is a guide to asynchronous and reactive programming. We will learn about key terms and concepts, and demonstrate how to write clean, concurrent code using the asyncio and RxPy frameworks.

    Chapter 7, Parallel Processing, is an introduction to parallel programming on multi-core processors and GPUs. In this chapter, you will learn to achieve parallelism using the multiprocessing module and by expressing your code using Theano and Tensorflow.

    Chapter 8, Distributed Processing, extends the content of the preceding chapter by focusing on running parallel algorithms on distributed systems for large-scale problems and big data. This chapter will cover the Dask, PySpark, and mpi4py libraries.

    Chapter 9, Designing for High Performance, discusses general optimization strategies and best practices to develop, test, and deploy your high-performance Python applications.

    What you need for this book

    The software in this book is tested on Python version 3.5 and on Ubuntu version 16.04. However, the majority of the examples can also be run on the Windows and Mac OS X operating systems.

    The recommended way to install Python and the associated libraries is through the Anaconda distribution, which can be downloaded from https://www.continuum.io/downloads, for Linux, Windows, and Mac OS X.

    Who this book is for

    The book is aimed at Python developers who want to improve the performance of their applications; basic knowledge of Python is expected.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: To summarize, we will implement a method called ParticleSimulator.evolve_numpy and benchmark it against the pure Python version, renamed ParticleSimulator.evolve_python.

    A block of code is set as follows:

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    Any command-line input or output is written as follows:

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: On the right, clicking on the tab Callee Map will display a diagram of the function costs.

    Warnings or important notes appear in a box like this.

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important to us, as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    1. Log in or register on our website using your e-mail address and password.

    2. Hover the mouse pointer over the SUPPORT tab at the top.

    3. Click on Code Downloads & Errata.

    4. Enter the name of the book in the Search box.

    5. Select the book for which you're looking to download the code files.

    6. From the drop-down menu, choose where you purchased this book.

    7. Click on Code Download.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Python-High-Performance-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Downloading the color images of this book

    We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/PythonHighPerformanceSecondEdition_ColorImages.pdf.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report it to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.

    Benchmarking and Profiling

    Recognizing the slow parts of your program is the single most important task when it comes to speeding up your code. Luckily, in most cases, the code that causes the application to slow down is a very small fraction of the program. By locating those critical sections, you can focus on the parts that need improvement without wasting time on micro-optimization.

    Profiling is the technique that allows us to pinpoint the most resource-intensive spots in an application. A profiler is a program that runs an application and monitors how long each function takes to execute, thus detecting the functions in which your application spends most of its time.
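As a minimal preview of what a profiler reports, the sketch below runs a small program under the standard `cProfile` module and prints the most expensive calls with `pstats`. The functions `slow_square_sum`, `fast_helper`, and `main` are hypothetical stand-ins for a real application. The chapter itself covers cProfile in more depth.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Hypothetical hot spot: sums squares with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_helper(n):
    """Hypothetical cheap function, included for comparison."""
    return n + 1


def main():
    fast_helper(10)
    return slow_square_sum(200_000)


# Run main() under the deterministic cProfile profiler.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time so the most expensive call chains appear first,
# and keep only the top 10 entries of the report.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

In the printed table, `slow_square_sum` should account for nearly all of the time spent in `main`, which is exactly the kind of bottleneck a profiler is meant to surface.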

    Python provides several tools to help us find these bottlenecks and measure important performance metrics. In this chapter, we will learn how to use the standard cProfile module and the line_profiler third-party package. We will also learn how to profile an application's memory consumption through the memory_profiler tool. Another useful tool that we will cover is KCachegrind, which can be used to graphically display the data produced by various profilers.

    Benchmarks are small scripts used to assess the total execution time of your application. We will learn how to write benchmarks and how to accurately time your programs.
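A small benchmark can be sketched with the standard `timeit` module. The two list-building functions below are hypothetical candidates. The sketch first checks that they agree, then reports the best of several trials, which reduces noise from other processes running on the machine.

```python
import timeit


def build_list_loop(n):
    """Hypothetical candidate 1: append in an explicit loop."""
    result = []
    for i in range(n):
        result.append(i * 2)
    return result


def build_list_comprehension(n):
    """Hypothetical candidate 2: the same result via a list comprehension."""
    return [i * 2 for i in range(n)]


# Make sure both versions agree before comparing their speed.
assert build_list_loop(1000) == build_list_comprehension(1000)

# timeit.repeat runs the callable `number` times per trial and repeats
# the trial `repeat` times; the minimum trial is the least noisy estimate.
NUMBER = 50
timings = {}
for func in (build_list_loop, build_list_comprehension):
    trials = timeit.repeat(lambda: func(10_000), number=NUMBER, repeat=5)
    timings[func.__name__] = min(trials) / NUMBER  # seconds per call

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Absolute numbers depend on the machine and Python version, so compare the candidates relative to each other rather than against fixed thresholds.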

    The list of topics we will cover in this chapter is as follows:

    General principles of high performance programming

    Writing tests and benchmarks

    The Unix time command

    The Python timeit module

    Testing and benchmarking with pytest

    Profiling your application

    The cProfile standard tool

    Interpreting profiling results with KCachegrind

    line_profiler and memory_profiler tools

    Disassembling Python code through the dis module

    Designing your application

    When designing a performance-intensive program, the very first step is to write your code without bothering with small optimizations:

    Premature optimization is the root of all evil.

    Donald Knuth

    In the early development stages, the design of the program can change quickly and may require large rewrites and reorganizations of the code base. By testing different prototypes without the burden of optimization, you are free to devote your time and energy to ensuring that the program produces correct results and that the design is flexible. After all, who needs an application that runs fast but gives the wrong answer?

    The mantras that you should remember when optimizing your code are as follows:

    Make it run: We have to get the software in a working state, and ensure that it produces the correct results. This exploratory phase serves to better understand the application and to spot major design issues in the early stages.

    Make it right: We want to ensure that the design of the program is solid. Refactoring should be done before attempting any performance optimization. This really helps separate the application into independent and cohesive units that are easier to maintain.

    Make it fast: Once our program is working and is well structured, we can focus on performance optimization. We may also want to optimize memory usage if that constitutes an issue.
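One way to follow this order in practice is to keep the straightforward, trusted version as a reference and check every optimized rewrite against it before measuring speed. The sketch below assumes this approach. Both functions are hypothetical, and the check uses plain `assert` statements rather than any particular test framework.

```python
import random


def pairwise_max_gap_reference(values):
    """Make it run/right: obvious O(n^2) version that is easy to trust."""
    best = 0
    for a in values:
        for b in values:
            best = max(best, abs(a - b))
    return best


def pairwise_max_gap_fast(values):
    """Make it fast: the largest pairwise gap is simply max - min, O(n)."""
    if not values:
        return 0
    return max(values) - min(values)


def check_against_reference(trials=200):
    """Compare the optimized version with the reference on random inputs."""
    rng = random.Random(42)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        values = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 30))]
        expected = pairwise_max_gap_reference(values)
        actual = pairwise_max_gap_fast(values)
        assert actual == expected, (values, expected, actual)
    return True


print(check_against_reference())  # prints True
```

Only once the check passes does it make sense to benchmark the fast version, because a speedup is meaningless if the answer changed.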

    In this section, we will write and profile a particle simulator test application. The simulator is
