Getting Started with Beautiful Soup
About this ebook
Getting Started with Beautiful Soup is great for anybody interested in website scraping and extracting information. A basic knowledge of Python, HTML tags, and CSS is required to follow the material.
Getting Started with Beautiful Soup - Vineeth G. Nair
Table of Contents
Getting Started with Beautiful Soup
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Installing Beautiful Soup
Installing Beautiful Soup
Installing Beautiful Soup in Linux
Installing Beautiful Soup using package manager
Installing Beautiful Soup using pip or easy_install
Installing Beautiful Soup using pip
Installing Beautiful Soup using easy_install
Installing Beautiful Soup in Windows
Verifying Python path in Windows
Installing Beautiful Soup using setup.py
Using Beautiful Soup without installation
Verifying the installation
Quick reference
Summary
2. Creating a BeautifulSoup Object
Creating a BeautifulSoup object
Creating a BeautifulSoup object from a string
Creating a BeautifulSoup object from a file-like object
Creating a BeautifulSoup object for XML parsing
Understanding the features argument
Tag
Accessing the Tag object from BeautifulSoup
Name of the Tag object
Attributes of a Tag object
The NavigableString object
Quick reference
Summary
3. Search Using Beautiful Soup
Searching in Beautiful Soup
Searching with find()
Finding the first producer
Explaining find()
Searching for tags
Searching for text
Searching based on regular expressions
Searching based on attribute values of a tag
Finding the first primary consumer
Searching based on custom attributes
Searching based on the CSS class
Searching using functions defined
Applying searching methods in combination
Searching with find_all()
Finding all tertiary consumers
Understanding parameters used with find_all()
Searching for Tags in relation
Searching for the parent tags
Searching for siblings
Searching for next
Searching for previous
Using search methods to scrape information from a web page
Quick reference
Summary
4. Navigation Using Beautiful Soup
Navigation using Beautiful Soup
Navigating down
Using the name of the child tag
Using predefined attributes
The .contents attribute
The .children attribute
The .descendants attribute
Special attributes for navigating down
The .string attribute
The .strings attribute
Navigating up
The .parent attribute
The .parents attribute
Navigating sideways to the siblings
The .next_sibling attribute
The .previous_sibling attribute
Navigating to the previous and next objects parsed
Quick reference
Summary
5. Modifying Content Using Beautiful Soup
Modifying Tag using Beautiful Soup
Modifying the name property of Tag
Modifying the attribute values of Tag
Updating the existing attribute value of Tag
Adding new attribute values to Tag
Deleting the tag attributes
Adding a new tag
Adding a new producer using new_tag() and append()
Creating a new tag using new_tag()
Adding a new tag using append()
Adding a new div tag to the li tag using insert()
Modifying string contents
Using .string to modify the string content
Adding strings using .append(), insert(), and new_string()
Deleting tags from the HTML document
Deleting the producer using decompose()
Deleting the producer using extract()
Deleting the contents of a tag using Beautiful Soup
Special functions to modify content
Quick reference
Summary
6. Encoding Support in Beautiful Soup
Encoding in Beautiful Soup
Understanding the original encoding of the HTML document
Specifying the encoding of the HTML document
Output encoding
Quick reference
Summary
7. Output in Beautiful Soup
Formatted printing
Unformatted printing
Output formatters in Beautiful Soup
The minimal formatter
The html formatter
The None formatter
The function formatter
Using get_text()
Quick reference
Summary
8. Creating a Web Scraper
Getting book details from PacktPub.com
Finding pages with a list of books
Finding book details
Getting selling prices from Amazon
Getting the selling price from Barnes and Noble
Summary
Index
Getting Started with Beautiful Soup
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2014
Production Reference: 1170114
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-955-4
www.packtpub.com
Cover Image by Mohamed Raoof (<raoofpmajeed@gmail.com>)
Credits
Author
Vineeth G. Nair
Reviewers
John J. Czaplewski
Christian S. Perone
Zhang Xiang
Acquisition Editor
Nikhil Karkal
Senior Commissioning Editor
Kunal Parikh
Commissioning Editor
Manasi Pandire
Technical Editors
Novina Kewalramani
Pooja Nair
Copy Editor
Janbal Dharmaraj
Project Coordinator
Jomin Varghese
Proofreader
Maria Gould
Indexer
Hemangini Bari
Graphics
Sheetal Aute
Abhinash Sahu
Production Coordinator
Adonia Jones
Cover Work
Adonia Jones
About the Author
Vineeth G. Nair completed his bachelor's degree in Computer Science and Engineering at Model Engineering College, Cochin, Kerala. He is currently working with Oracle India Pvt. Ltd. as a Senior Applications Engineer.
He developed an interest in Python during his college days and began working as a freelance programmer. This led him to work on several web scraping projects using Beautiful Soup, which helped him gain a fair level of mastery of the technology and a good reputation in the freelance arena. He can be reached at <vineethgnair.mec@gmail.com>. You can visit his website at www.kochi-coders.com.
My sincere thanks to Leonard Richardson, the primary author of Beautiful Soup. I would like to thank my friends and family for their great support and encouragement in writing this book. My special thanks to Vijitha S. Menon for always keeping my spirits up, providing valuable comments, and showing me the best ways to improve this book. My sincere thanks to all the reviewers for their suggestions, corrections, and points of improvement.
I extend my gratitude to the team at Packt Publishing, who helped me make this book happen.
About the Reviewers
John J. Czaplewski is a Madison, Wisconsin-based mapper and web developer who specializes in web-based mapping, GIS, and data manipulation and visualization. He attended the University of Wisconsin – Madison, where he received his BA in Political Science and a graduate certificate in GIS. He is currently a Programmer Analyst for the UW-Madison Department of Geoscience working on data visualization, database, and web application development. When not sitting behind a computer, he enjoys rock climbing, cycling, hiking, traveling, cartography, languages, and nearly anything technology related.
Christian S. Perone is an experienced Pythonista, open source collaborator, and the project leader of Pyevolve, a very popular evolutionary computation framework chosen to be part of OpenMDAO, which is an effort by the NASA Glenn Research Center. He has been a programmer for 12 years, using a variety of languages including C, C++, Java, and Python. He has contributed to many open source projects and loves web scraping, open data, web development, machine learning, and evolutionary computation. Currently, he lives in Porto Alegre, Brazil.
Zhang Xiang is an engineer working for the Sina Corporation.
I'd like to thank my girlfriend, who supports me all the time.
www.PacktPub.com
Support files, eBooks, discount offers and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at