Distributed Computing with Python
Ebook · 284 pages · 2 hours


About this ebook

About This Book
  • You'll learn to write data processing programs in Python that are highly available, reliable, and fault tolerant
  • Make use of Amazon Web Services along with Python to establish a powerful remote computation system
  • Train Python to handle data-intensive and resource-hungry applications
Who This Book Is For

This book is for Python developers who have developed Python programs for data processing and now want to learn how to write fast, efficient programs that perform CPU-intensive data processing tasks.

Language: English
Release date: Apr 12, 2016
ISBN: 9781785887048

    Book preview

    Distributed Computing with Python - Francesco Pierfederici

    Table of Contents

    Distributed Computing with Python

    Credits

    About the Author

    About the Reviewer

    www.PacktPub.com

    eBooks, discount offers, and more

    Why subscribe?

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. An Introduction to Parallel and Distributed Computing

    Parallel computing

    Distributed computing

    Shared memory versus distributed memory

    Amdahl's law

    The mixed paradigm

    Summary

    2. Asynchronous Programming

    Coroutines

    An asynchronous example

    Summary

    3. Parallelism in Python

    Multiple threads

    Multiple processes

    Multiprocess queues

    Closing thoughts

    Summary

    4. Distributed Applications – with Celery

    Establishing a multimachine environment

    Installing Celery

    Testing the installation

    A tour of Celery

    More complex Celery applications

    Celery in production

    Celery alternatives – Python-RQ

    Celery alternatives – Pyro

    Summary

    5. Python in the Cloud

    Cloud computing and AWS

    Creating an AWS account

    Creating an EC2 instance

    Storing data in Amazon S3

    Amazon Elastic Beanstalk

    Creating a private cloud

    Summary

    6. Python on an HPC Cluster

    Your typical HPC cluster

    Job schedulers

    Running a Python job using HTCondor

    Running a Python job using PBS

    Debugging

    Summary

    7. Testing and Debugging Distributed Applications

    The big picture

    Common problems – clocks and time

    Common problems – software environments

    Common problems – permissions and environments

    Common problems – the availability of hardware resources

    Challenges – the development environment

    A useful strategy – logging everything

    A useful strategy – simulating components

    Summary

    8. The Road Ahead

    The first two chapters

    The tools

    The cloud and the HPC world

    Debugging and monitoring

    Where to go next

    Index

    Distributed Computing with Python

    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: April 2016

    Production reference: 1060416

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78588-969-1

    www.packtpub.com

    Credits

    Author

    Francesco Pierfederici

    Reviewer

    James King

    Commissioning Editor

    Veena Pagare

    Acquisition Editor

    Aaron Lazar

    Content Development Editor

    Parshva Sheth

    Technical Editor

    Abhishek R. Kotian

    Copy Editor

    Neha Vyas

    Project Coordinator

    Nikhil Nair

    Proofreader

    Safis Editing

    Indexer

    Rekha Nair

    Graphics

    Disha Haria

    Production Coordinator

    Melwyn Dsa

    Cover Work

    Melwyn Dsa

    About the Author

    Francesco Pierfederici is a software engineer who loves Python. He has been working in the fields of astronomy, biology, and numerical weather forecasting for the last 20 years.

    He has built large distributed systems that make use of tens of thousands of cores at a time and run on some of the fastest supercomputers in the world. He has also written many applications of dubious usefulness that are nonetheless great fun. Mostly, he just likes to build things.

    I would like to thank my wife, Alicia, for her unreasonable patience during the gestation of this book. I would also like to thank Parshva Sheth and Aaron Lazar at Packt Publishing and the technical reviewer, James King, who were all instrumental in making this a better book.

    About the Reviewer

    James King is a software developer with a broad range of experience in distributed systems. He is a contributor to many open source projects including OpenStack and Mozilla Firefox. He enjoys mathematics, horsing around with his kids, games, and art.

    www.PacktPub.com

    eBooks, discount offers, and more

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Preface

    Parallel and distributed computing is a fascinating subject that, only a few years ago, was the preserve of developers at a handful of large companies and national labs. Things have changed dramatically in the last decade or so, and now everybody can build small- and medium-scale distributed applications in a variety of programming languages including, of course, our favorite one: Python.

    This book is a very practical guide for Python programmers who are starting to build their own distributed systems. It starts off by illustrating the bare minimum theoretical concepts needed to understand parallel and distributed computing in order to lay the basic foundations required for the rest of the (more practical) chapters.

    It then looks at some first examples of parallelism using nothing more than modules from the Python standard library. The next step is to move beyond the confines of a single computer and start using more and more nodes. This is accomplished using a number of third-party libraries, including Celery and Pyro.

    The remaining chapters investigate a few deployment options for our distributed applications. The cloud and classic High Performance Computing (HPC) clusters, together with their strengths and challenges, take center stage.

    Finally, the thorny issues of monitoring, logging, profiling, and debugging are touched upon.

    All in all, this is very much a hands-on book, teaching you how to use some of the most common frameworks and methodologies to build parallel and distributed systems in Python.

    What this book covers

    Chapter 1, An Introduction to Parallel and Distributed Computing, takes you through the basic theoretical foundations of parallel and distributed computing.

    Chapter 2, Asynchronous Programming, describes the two main programming styles used in distributed applications: synchronous and asynchronous programming.

    Chapter 3, Parallelism in Python, shows you how to do more than one thing at the same time in your Python code, using nothing more than the Python standard library.

    Chapter 4, Distributed Applications – with Celery, teaches you how to build simple distributed applications using Celery and some of its competitors: Python-RQ and Pyro.

    Chapter 5, Python in the Cloud, shows how you can deploy your Python applications on the cloud using Amazon Web Services.

    Chapter 6, Python on an HPC Cluster, shows how to deploy your Python applications on a classic HPC cluster, typical of many universities and national labs.

    Chapter 7, Testing and Debugging Distributed Applications, talks about the challenges of testing, profiling, and debugging distributed applications in Python.

    Chapter 8, The Road Ahead, looks at what you have learned so far and which directions interested readers could take to push their development of distributed systems further.
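To give a flavor of the standard-library parallelism that the Chapter 3 summary above refers to, here is a minimal sketch using the concurrent.futures module. It is not taken from the book: the names square and parallel_squares are hypothetical, and a thread pool is just one of the options the standard library offers.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Return n squared; a hypothetical stand-in for real work."""
    return n * n


def parallel_squares(numbers, max_workers=4):
    """Apply square() to every number using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() returns results in the same order as the inputs.
        return list(pool.map(square, numbers))


if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
```

For CPU-bound work, swapping ThreadPoolExecutor for ProcessPoolExecutor from the same module is the usual way to sidestep the global interpreter lock; the rest of the sketch would stay the same.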

    What you need for this book

    The following software and hardware are recommended:

    Python 3.5 or later

    A laptop or desktop computer running Linux or Mac OS X

    Ideally, some extra computers or some extra virtual machines to test your distributed applications

    All software mentioned in this book is free of charge and can be downloaded from the Internet with the exception of PBS Pro, which is commercial. Most of the PBS Pro functionality, however, is available in its close sibling Torque, which is open source.

    Who this book is for

    This book is for developers who already know Python and want to learn how to parallelize their code and/or write distributed systems. While a Unix environment is assumed, most if not all of the examples should also work on Windows systems.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: Import the concurrent.futures module.

    A block of code is set as follows:

    class Foo:
        def __init__(self):
            """Docstring"""
            self.bar = 42
            # A comment
            return

    Any command-line input or output is written as follows:

    bookuser@hostname$ python3.5 ./foo.py

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Clicking the Next button moves you to the next screen.

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    1. Log in to or register on our website using your e-mail address and password.

    2. Hover the mouse pointer over the SUPPORT tab at the top.

    3. Click on Code Downloads & Errata.

    4. Enter the name of the book in the Search box.

    5. Select the book for which you're looking to download the code files.

    6. From the drop-down menu, choose where you purchased this book.

    7. Click on Code Download.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

    Chapter 1. An Introduction to Parallel and Distributed Computing

    The first modern digital computer was invented in the late 30s and early 40s (that is, arguably, the Z1 from Konrad Zuse in 1936), probably before most of the readers of this book—let alone
