Practical Machine Learning for Streaming Data with Python: Design, Develop, and Validate Online Learning Models
About this ebook
You'll start with an introduction to streaming data, the challenges associated with it, some of its real-world business applications, and various windowing techniques. You'll then examine incremental and online learning algorithms and the concept of model evaluation with streaming data, and get introduced to the Scikit-Multiflow framework in Python. This is followed by a review of change detection/concept drift detection algorithms and their implementation on various datasets using Scikit-Multiflow.
The book also introduces supervised and unsupervised algorithms for streaming data and their implementation on various datasets using Python. It concludes by briefly covering other open-source tools available for streaming data such as Spark, MOA (Massive Online Analysis), Kafka, and more.
What You'll Learn
- Understand machine learning with streaming data concepts
- Review incremental and online learning
- Develop models for detecting concept drift
- Explore techniques for classification, regression, and ensemble learning in streaming data contexts
- Apply best practices for debugging and validating machine learning models in a streaming data context
- Get introduced to other open-source frameworks for handling streaming data
Who This Book Is For
Machine learning engineers and data science professionals
Practical Machine Learning for Streaming Data with Python - Sayan Putatunda
© Sayan Putatunda 2021
S. Putatunda, Practical Machine Learning for Streaming Data with Python, https://doi.org/10.1007/978-1-4842-6867-4_1
1. An Introduction to Streaming Data
Sayan Putatunda, Bangalore, India
This chapter introduces you to streaming data, its various challenges, some of its real-world business applications, various windowing techniques, and the concepts of incremental and online learning algorithms. The chapter also introduces the scikit-multiflow framework in Python and some streaming data generators.
Please note that other frameworks are available in Python for machine learning with streaming data. The scikit-multiflow framework is used in this book because I strongly believe that this package (given its wide range of implemented techniques and great documentation) is a good starting point for Python users to pick up online/incremental machine learning techniques for streaming data.
Streaming Data
The world has witnessed a data deluge in recent years. There has been a huge increase in the volume of data generated from various sources. The major data sources are the Internet, log data, sensor data, emails, RFID, POS transaction data, and so forth. The data gathered from these sources can be categorized into structured, semi-structured, and unstructured data. Recent technological advances are a major reason for this data explosion, making data storage cheaper and the continuous collection of data possible. Almost all companies in sectors such as retail, social media, and IT are facing a data explosion. They are trying to figure out ways to process and analyze the massive data that they are generating and gain actionable insights.
Big Data
Most of us have heard of the term big data. The McKinsey Global Institute defines big data as “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze” [1]. Big data’s basic characteristics can be defined with four words that begin with the letter V: volume, velocity, variety, and veracity. The “big” in big data is not only about the size or volume of the data; the other Vs also need to be considered.
Advancements in technology have made the continuous collection of data possible. We are inundated with data from various daily transactions (e.g., POS transactions in retail outlets like Walmart, Target, etc.), sensor data, web data, social media data, stock prices, search queries, clickstream data, and so forth. These are all sources of high-velocity data; that is, streaming data. Other sources of continuous data streams include operational monitoring, online advertising, mobile data, and the Internet of Things (IoT).
Streaming data, or data streams, is an infinite and continuous flow of data from a source that arrives at very high speed. Streaming data is thus a subset of big data that addresses its velocity aspect. Three characteristics distinguish streaming data from static data: it is loosely structured, always on, and always flowing [2]. Unlike static data, the structure of streaming data is not rigidly defined. It is always on; that is, data is always available because new data is continuously generated.
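The "infinite and continuous" nature of a stream can be illustrated with a minimal Python sketch. It assumes a hypothetical sensor source: the `sensor_stream` name and the simulated readings are illustrative only and do not come from any library. Because the generator never terminates, a consumer can only ever look at a finite slice of it.

```python
import itertools
import random


def sensor_stream(seed=0):
    """Hypothetical unbounded source: yields (timestamp, reading) forever."""
    rng = random.Random(seed)
    for t in itertools.count():
        # Simulated reading: a constant level plus Gaussian noise (an assumption).
        yield t, 20.0 + rng.gauss(0.0, 1.0)


# The stream has no end, so take only the first five items.
first_five = list(itertools.islice(sensor_stream(), 5))
for t, reading in first_five:
    print(t, round(reading, 2))
```

Calling `list(sensor_stream())` without `islice` would never return, which is the practical difference from a static dataset.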
As an example of how streaming data is harnessed in business, let’s look at online advertising. Social media giants like Facebook collect user behavior data in real time. Facebook has more than two billion active users every month [4], which gives you an idea of the scale of the data being collected.
Facebook earns most of its revenue by showing ads to its users. Many advertisers are onboarded onto the platform. Facebook uses user behavior data to select ads that are relevant to its users (i.e., ads that have the maximum chance of interaction in the form of clicks or purchases). It also performs other important steps in online advertising (such as placing a bid in an auction). All of this happens in near real time (i.e., whenever you load your News Feed page on Facebook or interact with other apps, such as Messenger or Instagram).
The Need to Process and Analyze Streaming Data
Streaming data enables real-time or near-real-time analytics. Because the data is produced continuously and never ends, storing all of it and then running analytics on it (as in batch processing) is not feasible. Streaming data must be analyzed on the fly to gain insights in real time or near real time.
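As a rough illustration of analyzing data on the fly, the sketch below keeps a running mean and variance that are updated one observation at a time, so no past data needs to be stored. It uses Welford's method, a standard incremental technique chosen here for illustration and not taken from the text above; the `RunningStats` class name and the sample values are assumptions.

```python
class RunningStats:
    """Incrementally tracks count, mean, and sample variance (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until at least two observations arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
# Stand-in for an unbounded stream: each value is seen once and then discarded.
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.n, stats.mean, stats.variance)
```

A batch computation would need all eight values in memory at once; here only three numbers are kept, no matter how long the stream runs.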
Figure 1-1 shows the streaming data analytics process flow. High-velocity data from various sources (sensor data, stock-tick data, etc.) are ingested, processed, and analyzed by a streaming data analytics engine in real time or near real time. The output generated is displayed via dashboards, apps, or any other