Practical Machine Learning for Streaming Data with Python: Design, Develop, and Validate Online Learning Models
Ebook · 147 pages · 1 hour

About this ebook

Design, develop, and validate machine learning models with streaming data using the Scikit-Multiflow framework. This book is a quick start guide for data scientists and machine learning engineers looking to implement machine learning models for streaming data with Python to generate real-time insights. 

You'll start with an introduction to streaming data, the challenges associated with it, some of its real-world business applications, and various windowing techniques. You'll then examine incremental and online learning algorithms and the concept of model evaluation with streaming data, and get introduced to the Scikit-Multiflow framework in Python. This is followed by a review of the various change detection/concept drift detection algorithms and their implementation on various datasets using Scikit-Multiflow.

Supervised and unsupervised algorithms for streaming data, and their implementation on various datasets using Python, are also covered. The book concludes by briefly covering other open-source tools available for streaming data, such as Spark, MOA (Massive Online Analysis), Kafka, and more.


What You'll Learn
  • Understand machine learning with streaming data concepts
  • Review incremental and online learning
  • Develop models for detecting concept drift
  • Explore techniques for classification, regression, and ensemble learning in streaming data contexts
  • Apply best practices for debugging and validating machine learning models in streaming data contexts
  • Get introduced to other open-source frameworks for handling streaming data
Who This Book Is For
Machine learning engineers and data science professionals
Language: English
Publisher: Apress
Release date: Apr 9, 2021
ISBN: 9781484268674

    Book preview

    Practical Machine Learning for Streaming Data with Python - Sayan Putatunda

    © Sayan Putatunda 2021

    S. Putatunda, Practical Machine Learning for Streaming Data with Python, https://doi.org/10.1007/978-1-4842-6867-4_1

    1. An Introduction to Streaming Data

    Sayan Putatunda¹

    (1) Bangalore, India

    This chapter introduces you to streaming data, its various challenges, some of its real-world business applications, various windowing techniques, and the concepts of incremental and online learning algorithms. The chapter also introduces the scikit-multiflow framework in Python and some streaming data generators.

    Please note that other frameworks are available in Python for machine learning with streaming data. The scikit-multiflow framework is used in this book because I strongly believe that this package (given its wide range of implemented techniques and great documentation) is a good starting point for Python users to pick up online/incremental machine learning techniques for streaming data.

    Streaming Data

    The world has witnessed a data deluge in recent years. There has been a huge increase in the volume of data generated from various sources. The major data sources are the Internet, log data, sensor data, emails, RFID, POS transaction data, and so forth. The data gathered from these sources can be categorized into structured, semi-structured, and unstructured data. Recent technological advances are a major reason for this data explosion, making data storage cheaper and a continuous collection of data possible. Almost all companies in sectors such as retail, social media, and IT are facing a data explosion. They are trying to figure out ways to process and analyze the massive data that they are generating and gain actionable insights.

    Big Data

    Most of us have heard of the term big data. The McKinsey Global Institute defines big data as datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze [1]. Big data's basic characteristics can be defined with four words that begin with the letter V: volume, velocity, variety, and veracity. The big in big data is not only about the size or volume of the data; the other Vs also need to be considered.

    Advancements in technology have made the continuous collection of data possible. We are inundated with data from daily transactions (e.g., POS transactions in retail outlets like Walmart and Target), sensor data, web data, social media data, stock prices, search queries, clickstream data, and so forth. These are all sources of high-velocity data; that is, streaming data. Other sources of continuous data streams include operational monitoring, online advertising, mobile data, and the Internet of Things (IoT).

    Streaming data (or a data stream) is an infinite, continuous flow of data that arrives from a source at very high speed. Streaming data is thus the subset of big data that addresses the velocity aspect. It differs from static data in being loosely structured, always on, and always flowing [2]. Unlike static data, the structure of streaming data is not rigidly defined. It is always on; that is, data is always available because new data is continuously generated.
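
    The definition above can be illustrated with a Python generator. This is only a minimal sketch and does not come from the book: the name sensor_stream, the record fields, and the Gaussian readings are all hypothetical, chosen to show a source that never ends and a consumer that can only ever hold a finite slice of it.

```python
import itertools
import random
from typing import Iterator


def sensor_stream(seed: int = 0) -> Iterator[dict]:
    """Yield simulated readings forever (hypothetical data source).

    The generator has no terminating condition, mirroring the
    "infinite and continuous flow" of a data stream.
    """
    rng = random.Random(seed)
    for t in itertools.count():  # counts 0, 1, 2, ... without end
        yield {"t": t, "value": rng.gauss(20.0, 2.0)}


# A consumer never sees the whole stream, only what has arrived so far.
first_five = list(itertools.islice(sensor_stream(), 5))
```

    Calling list(sensor_stream()) without islice would never return, which is the practical meaning of "always on".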

    As an example of how streaming data is harnessed in business, let’s look at online advertising. Social media giants like Facebook collect user behavior data in real time. Facebook has more than two billion active users every month [4], which gives you an idea of the scale of the data being collected.

    Facebook earns most of its revenue by showing ads to its users. Many advertisers are onboarded onto the platform. Facebook uses user behavior data to select ads that are relevant to its users (i.e., ads that have the maximum chance of interaction in the form of clicks or purchases). It also performs other important steps in online advertising (such as placing a bid in an auction). All of this happens in near real time (i.e., whenever you load your News Feed page in Facebook or interact with other apps, such as Messenger or Instagram).

    The Need to Process and Analyze Streaming Data

    Streaming data enables real-time or near-real-time analytics. Because the data is generated continuously and never ends, storing all of it and then running analytics on it (as in batch processing) is not feasible. Streaming data must be analyzed on the fly to gain insights in real time or near real time.
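
    To make "on the fly" concrete, the following sketch keeps a running mean and variance that are updated once per arriving value, so no past data has to be stored. It uses Welford's online algorithm, which is a standard technique and not something taken from the book; the class name RunningStats and the sample readings are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online algorithm.

    Each value is seen exactly once and then discarded, so memory use
    stays constant no matter how long the stream runs.
    """

    def __init__(self) -> None:
        self.n = 0          # number of values seen so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two values have arrived.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [21.0, 19.5, 20.5, 22.0, 18.0]:  # stand-in for a live stream
    stats.update(reading)
```

    A batch computation would need every reading in memory before producing an answer, whereas here stats.mean and stats.variance are valid after each update.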

    Figure 1-1 shows the streaming data analytics process flow. High-velocity data from various sources (sensor data, stock-tick data, etc.) are ingested, processed, and analyzed by a streaming data analytics engine in real time or near real time. The output generated is displayed via dashboards, apps, or any other
