Practical Machine Learning with Spark: Uncover Apache Spark’s Scalable Performance with High-Quality Algorithms Across NLP, Computer Vision and ML

Ebook, 831 pages, 14 hours

Name: Practical Machine Learning with Spark: Uncover Apache Spark’s Scalable Performance with High-Quality Algorithms Across NLP, Computer Vision and ML
Author: Gourav Gupta
ISBN: 9789391392130

By Gourav Gupta, Dr. Manish Gupta and Dr. Inder Singh Gupta

Rating: 0 out of 5 stars

About this ebook

This book gives readers an up-to-date explanation of machine learning and a comprehensive yet straightforward understanding of the architectural techniques used to analyze data and derive predictive insights with Apache Spark.

The book walks readers through setting up Hadoop and Spark on-premises, in Docker, and on AWS. Readers will learn about Spark MLlib and how to use it in supervised and unsupervised machine learning scenarios. With the help of Spark, prominent technologies such as natural language processing and computer vision are evaluated and demonstrated in realistic settings. The book also discusses the fundamental components underlying natural language processing, computer vision, and machine learning, and shows how to incorporate these technologies into business processes using Apache Spark.

Towards the end of the book, readers will learn about several deep learning frameworks, such as TensorFlow and PyTorch, and how to run distributed processing of deep learning problems with Spark.

Intelligence (AI) & Semantics

Language: English

Publisher: BPB Online LLP

Release date: Apr 28, 2022

ISBN: 9789391392130

Author

Gourav Gupta


Skip carousel

What is ELT?
Techfastly
Article
What is ELT?
Apr 1, 2021
It stands for extract, load, and transform- the processes a data pipeline uses for replicating the data from a source system into a target system such as a cloud data warehouse. 1. Extraction is the first step in which data is copied from the source
6 min read
Why Is ELT Better For Cloud Data Warehousing?
Techfastly
Article
Why Is ELT Better For Cloud Data Warehousing?
Apr 1, 2021
2 min read
Build A Static Analysis Development Pipeline
Linux Format
Article
Build A Static Analysis Development Pipeline
Jul 27, 2021
9 min read
Tensor Flow 101
APC
Article
Tensor Flow 101
Jan 27, 2020
4 min read
Understanding ELT & ETL
Techfastly
Article
Understanding ELT & ETL
Apr 1, 2021
8 min read
Machine Learning – With Zero Programming
APC
Article
Machine Learning – With Zero Programming
Aug 12, 2019
6 min read
Scikit-Learn: The Ultimate Python Library
APC
Article
Scikit-Learn: The Ultimate Python Library
Jul 15, 2019
4 min read
Elasticsearch And Kibana Basics
Linux Format
Article
Elasticsearch And Kibana Basics
Dec 15, 2020
1 min read
AWS Vs Azure What’s The Difference?
PC Pro Magazine
Article
AWS Vs Azure What’s The Difference?
Sep 11, 2022
7 min read
Build A Search And Analytic Engine
Linux Format
Article
Build A Search And Analytic Engine
Mar 10, 2020
7 min read
KAFKA Build Utilities With The Kafka Server
Linux Format
Article
KAFKA Build Utilities With The Kafka Server
Jul 2, 2019
Nowadays, quite a few data architectures involve both a database and Apache Kafka, which is a distributed streaming platform and the subject of this tutorial. You can also find Kafka described as a publish-subscribe message system, which is a fancy w
7 min read
Artificial Empathy: The Last Step Of Humanizing Machines
Techfastly
Article
Artificial Empathy: The Last Step Of Humanizing Machines
Jul 1, 2021
1 min read
Fact-check And Verify Information
Post South Africa
Article
Fact-check And Verify Information
Mar 13, 2024
Q: What is AI? A: AI is the acronym for artificial intelligence (AI) and refers to the development of computer systems capable of performing tasks that typically require human intelligence, such as visual perception, speech recognition, decision-maki
3 min read
Tired Of AI Doomsday Tropes, Cohere CEO Says His Goal Is Technology That’s ‘Additive To Humanity’
AppleMagazine
Article
Tired Of AI Doomsday Tropes, Cohere CEO Says His Goal Is Technology That’s ‘Additive To Humanity’
Mar 29, 2024
4 min read
Tired Of AI Doomsday Tropes, Cohere CEO Says His Goal Is Technology That’s ‘Additive To Humanity’
TechLife News
Article
Tired Of AI Doomsday Tropes, Cohere CEO Says His Goal Is Technology That’s ‘Additive To Humanity’
Mar 30, 2024
4 min read
Why We Need To Fear The Risk Of AI Model Collapse
Evening Standard
Article
Why We Need To Fear The Risk Of AI Model Collapse
Dec 17, 2023
4 min read
Forward Thinkers
Vogue Australia
Article
Forward Thinkers
Oct 22, 2022
6 min read
You Won’t Believe How Well This Algorithm Spots Clickbait
Futurity
Article
You Won’t Believe How Well This Algorithm Spots Clickbait
Aug 29, 2019
3 min read
Ideas Lab
K-Zone
Article
Ideas Lab
Oct 10, 2021
Meet Rashina Hoda, a software engineering researcher who studies how software engineers develop the software products we all love! K-Z : Hi Rashina! What do you do in your role at Monash University? R: As Associate Professor of Software Engineeri
2 min read
Is AI Making Us A Dumber Species ?
Saturday Star
Article
Is AI Making Us A Dumber Species ?
Jul 1, 2023
6 min read
ChatGPT: Personal Tutor Or 'Cheat-bot'? The App That Could Revolutionise Asia's Learning
This Week in Asia
Article
ChatGPT: Personal Tutor Or 'Cheat-bot'? The App That Could Revolutionise Asia's Learning
Jan 7, 2023
Indian engineering student Pranav says he is a numbers guy whose study gets bogged down in the legwork of essay writing. Then in early December, he encountered ChatGPT - the viral intelligent chatbot which has emerged from San Francisco - and began t
5 min read
The Future Is Now
Palm Beach Illustrated
Article
The Future Is Now
Aug 19, 2019
5 min read
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
AppleMagazine
Article
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
Apr 28, 2023
4 min read
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
TechLife News
Article
Q&A: OPENAI CTO MIRA MURATI ON SHEPHERDING CHATGPT
Apr 29, 2023
4 min read
How To Make Sense From And With AI ?
The European Business Review
Article
How To Make Sense From And With AI ?
Sep 25, 2021
4 min read
What Do Academics Think?
The Big Issue Magazine
Article
What Do Academics Think?
May 19, 2023
3 min read
Opinion: Why Brain Decoding Is Not Mind Reading — And Why That Matters
STAT
Article
Opinion: Why Brain Decoding Is Not Mind Reading — And Why That Matters
Jun 8, 2023
1 min read
Tech Tutor Exponential Technologies Are Changing
Business Today
Article
Tech Tutor Exponential Technologies Are Changing
Mar 5, 2020
8 min read
Future Innovators
Vogue Australia
Article
Future Innovators
Nov 28, 2021
6 min read
Quantum Leap
Marketing
Article
Quantum Leap
Jul 11, 2019
6 min read

Related categories

Skip carousel

Reviews for Practical Machine Learning with Spark

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Practical Machine Learning with Spark - Gourav Gupta

CHAPTER 1

Introduction to Machine Learning

Field of study that gives computers the capability to learn without being explicitly programmed.

— Arthur Samuel

Introduction

Over the last two decades, Artificial Intelligence (AI) and its sub-branches, such as Machine Learning (ML), Statistical Modelling (SM), and Deep Learning (DL), have advanced steadily. These technologies power applications that improve people's lives and serve day-to-day needs in domains such as bioinformatics, radiology, agriculture, finance, astronomy, banking, healthcare, geo-informatics, seismology, and space exploration. ML extends what manual operations and machines can do by letting a machine learn automatically from historical experience. The main objective of this book is to teach readers the fundamentals, advances, and real-life applications of ML using a distributed framework. This chapter also traces the journey of AI and its taxonomy. The term AI refers to a system that imitates intelligent behavior by making sense of meaningful information, patterns, or inputs. For example, self-driving cars use AI, especially vision-based technology, to train a model that makes decisions from what it observes. A report shared by Gartner in 2019 suggests that Intelligent Systems (IS) and related verticals will become an epicenter of emerging technology in the coming years. The authors expect that many tedious problems will in future be addressed with the help of AI and ML. Across the globe, AI has become a subject of interest among researchers, data scientists, data analysts, industrial experts, and academicians who want to tackle difficult real-time problems. This chapter also covers the evolution of ML, the types of ML, and its emerging applications and future scope. In addition, it gives a brief discussion of DL in connection with AI applications.

Structure

In this chapter, we will discuss the following topics:

Evolution of machine learning

Fundamentals and definition of machine learning

Types of machine learning algorithms

Application of machine learning

Future of machine learning

Objectives

After studying this chapter, readers will be able to:

Learn about the history of machine learning.

Get an understanding of the modern definition of machine learning.

Understand the different types of machine learning and their algorithms.

Understand the application of machine learning in various fields.

Know the future scope of machine learning.

Evolution of Machine Learning

The origins of AI and ML are interconnected. Hence, to give readers a solid foundation, this section presents the history of both. However, the primary objective of this book is to make readers conversant with practical, real-time ML scenarios using Apache Spark.

The term ‘Machine Learning’ is credited to the American engineer Arthur Samuel, who introduced it in 1959. Starting in 1949 and continuing into the late 1960s, he did pioneering research on teaching a computer to make decisions on its own. In the 1950s, he developed a checkers-playing program for computers with limited memory; it used alpha-beta pruning together with a scoring function that measured each side's chances of winning. The program searched moves using the minimax strategy, and Samuel added further mechanisms, such as rote learning, to make it better. Thereafter, in 1957, Frank Rosenblatt of the Cornell Aeronautical Laboratory combined Donald Hebb’s model of a brain cell with Samuel’s machine learning ideas to design the perceptron, the first neural network for computers. The perceptron algorithm was first simulated on an IBM 704 and was later built into a dedicated machine named the Mark 1 Perceptron. It was intended for image recognition, but it had limitations in recognizing patterns such as faces.

In the 1960s, a new direction opened with the use of multiple layers in the neural network (NN), which provided enhanced capability to solve complex problems with better precision. This multi-layer theory opened up many new ways to improve neural network learning, including feedforward and backpropagation neural networks.

In 1967, the nearest neighbor algorithm came into existence for basic pattern recognition; it was also applied to finding more efficient routes for traveling salespersons. In 1970, the backpropagation algorithm was developed to adjust a network with hidden layers of neurons so as to minimize errors. This algorithm is used to train Deep Neural Networks (DNN).

During the 1970s and 1980s, AI researchers and computer scientists worked together on neural network research, while some researchers and engineers began to treat ML as a separate direction. By the early 1980s, ML and AI had taken separate paths. AI mainly focused on logical and knowledge-based approaches, while ML focused on neural network-based algorithms.

In the 1990s, ML gained momentum because of the large amounts of data made available through the Internet. In 1990, Robert Schapire developed the boosting algorithm, which reduces bias in supervised learning by boosting weak learners. A weak learner is a classifier that is only slightly correlated with the true classification; boosting combines a set of such simple models into a single strong learner that generates the final result. There are many types of boosting algorithms, such as AdaBoost, BrownBoost, LPBoost, MadaBoost, TotalBoost, XGBoost, LogitBoost, and AnyBoost. A detailed study of various types of boosting algorithms is presented later in this chapter.
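The idea of combining weak learners into one strong learner can be sketched in plain Python. The example below is a minimal AdaBoost-style illustration, not Schapire's original 1990 procedure and not code from this book. The one-dimensional toy data, the threshold "stumps" used as weak learners, and all function names are assumptions made for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` at or above the threshold, the opposite below."""
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps, re-weighting the examples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - error) / error)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correctly classified ones lose it.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Strong learner: sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


def training_accuracy(ensemble, xs, ys):
    hits = sum(1 for x, y in zip(xs, ys) if ensemble_predict(ensemble, x) == y)
    return hits / len(xs)


# Hypothetical toy data: no single threshold separates the two classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=3)
print(training_accuracy(single, xs, ys), training_accuracy(boosted, xs, ys))
```

On this toy data, one stump alone misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly.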

Next, in 1996, Deep Blue, a chess-playing computer developed by IBM, won its first game against the world champion Garry Kasparov. Deep Blue used custom-built Very Large-Scale Integration (VLSI) chips to execute the alpha-beta algorithm. In 1997, Jurgen Schmidhuber and Sepp Hochreiter designed the neural network model named Long Short-Term Memory (LSTM), which came to be used for speech recognition training. LSTM consists of cells with input and output gates and was used to mitigate the vanishing gradient problem. In 2006, face recognition algorithms were tested on 3D face scans, face images, and iris images, and they proved more accurate than earlier facial recognition algorithms.

In the same year, the Canadian computer scientist Geoffrey Hinton introduced the term Deep Learning (DL) and developed a fast, greedy unsupervised learning algorithm for distinguishing text and objects in digital images and videos.

In 2011, the deep learning artificial intelligence research team at Google, also known as Google Brain, developed a large-scale deep learning software system named DistBelief for learning and categorizing objects in much the same way as a person does. A year later, the Google X team ran ML algorithms on a cluster of 16,000 processor cores to automatically identify cats in digital images taken from YouTube videos.

In 2014, the Facebook research team came up with a facial recognition system known as DeepFace for recognizing human faces in digital images using DL. In 2015, Microsoft released a distributed ML toolkit for solving ML problems across multiple computers. In 2016, the Google DeepMind team developed AlphaGo to play Go, one of the most complex board games.

Next, in 2017, Google released TensorFlow version 1.0.0, Google Brain’s second-generation system, which can run on a single device using both the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) for general-purpose computing. More recently, in 2018 and 2019, Google released TensorFlow.js for ML in JavaScript, TensorFlow 2.0, and TensorFlow Graphics for DL in computer graphics.

Fundamentals and Definition of Machine Learning

This section builds a solid foundation in ML, from its initial definition to its modern definition, along with the basic terminology needed to grasp its fundamentals. As discussed previously, ML keeps expanding into automation-related jobs, so the authors pay extra attention here to the core concepts. It is also necessary to walk through the journey of ML: its importance, and the traditional and modern approaches to training, validating, and testing a model on a dataset. This book also updates readers on real-time challenges and the solutions used in intelligence- and analytics-based organizations.

Figure 1.1 depicts the branches of Artificial Intelligence, such as Machine Learning, Neural Network, and Deep Learning. ML in turn draws on different types of learning, such as Supervised Learning (SL), Semi-Supervised Learning (SSL), Unsupervised Learning (USL), and Reinforcement Learning (RL).

Figure 1.1: Artificial Intelligence with its derived technologies

In NN, a special collection of algorithms is used for training, validating, and testing on patterns or inputs by leveraging artificial neurons that work like the neurons of a human brain. For example, voice-to-text conversion uses NN as its backbone, and smart personal assistants such as Amazon Alexa, Apple Siri, and Google Home are well-known applications. DL, on the other hand, refers to networks that stack two or more hidden layers to process complex problems with high precision. DL is broadly like NN; the difference is that DL makes complex neural architectures easier to customize and cumbersome models easier to handle. These days, various DL and NN frameworks, such as Keras, Caffe, and TensorFlow, are available for getting started quickly.

In the following section, the reader will learn the basic terminology that is essential for understanding the concepts of ML:

Features or Attributes or Variables: These are the measurable characteristics of the data that are fed into the system for training and testing a model. ML algorithms use these features as inputs or outputs. For recognizing a human face, features such as gender, age, height, lip shape, face shape, and color serve as the decisive attributes.

Feature Vector or Tuple: A group of important features listed in a vector or tuple format for training a model.

Model: A specific representation learned from data using an ML algorithm. There are three types of models in ML: supervised, unsupervised, and reinforcement models. Building a model involves three important phases: training, validation, and testing.

Dataset: A set of information collected as rows or instances. The model needs a dataset for the training and testing phases; without a dataset or input database, it cannot be trained.

Dimension: A subset of features used to define a property of the data. Dimensions give detailed information about the data for better understanding.

Target (Label): The value that the trained model is meant to predict. In a gender classification problem based on face images, the label attached to each input would be man or woman.

Training Dataset / Validating Dataset: The initial dataset used to train, validate, and develop the model. The developed model is then applied to new data, which can be used to train it further.

Testing Dataset / Evaluation Dataset: The final dataset used to verify the model. Some authors also refer to it as the golden or reference dataset.

Prediction: The result or output that a trained model produces for the given inputs or patterns.

Performance Metrics: Measures used to assess the quality of a model's predictions, such as precision, recall, accuracy, and Intersection over Union (IoU).

Information: A collection of datasets, such as videos, texts, and images, that is interpreted and manipulated to prepare the training dataset and to extract something meaningful from it.

Unlabeled Data: The raw form of the data, which may consist of video streams, audio, images, and so on, in irregular patterns or in an unarranged manner.

Classifier: It assigns the predicted output to a class. For example, it can identify different animals such as cows, cats, and horses in an image.

Pattern: A pattern is a regularity in the features of a dataset or image. Patterns act as feature extractors through which similar objects or datasets can be identified.

Class: A class defines a group of objects that share a label. If an image contains both fruits and vegetables, its contents fall into two classes, one for vegetables and one for fruits.
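The performance metrics named in the terminology list above can be made concrete with a short calculation. The following is a minimal sketch in plain Python; the label lists, the bounding boxes, and the function names are illustrative assumptions, not material from this book.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


# Hypothetical targets and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(precision(tp, fp), recall(tp, fn), accuracy(tp, fp, fn, tn))
print(intersection_over_union((0, 0, 2, 2), (1, 1, 3, 3)))
```

Precision and recall look only at the positive class, accuracy counts every prediction, and IoU is typically used when the prediction is a region such as a bounding box.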

Having covered the basic terminology of ML, readers should next learn the basic processing flow of traditional programming and of ML algorithms. Figure 1.2 and Figure 1.3 represent the traditional programming approach and the machine learning approach.

Figure 1.2: Block diagram of the working of the traditional programming language (top) and machine learning (bottom)

In traditional programming, the programmer configures the machine so that a given input produces the desired output according to the logic of the algorithm. When a human being instructs a computer or any other programmable machine about what to do, a programming language is needed to express those instructions so that the machine can act on them. The algorithms then let the machine make decisions based on the coded logic or conditions.

On the other hand, in the ML approach, the computer learns from behaviors and historical patterns instead of being programmed for a specific task. This differs from the traditional approach, in which the computer does exactly what we tell it to do. Most programs are a series of instructions, which is why software is written to enforce strict boundaries around a task such as a banking transaction. In the traditional approach, the programmer must clearly define the limits within which the machine operates; for example, if a person tries to withdraw more money than the balance in the account, the transaction is cancelled. The programmer passes the banking program an explicit instruction of the form "if you see X, then do Y". In ML, the programmer does not write such detailed instructions; instead, the computer is given meaningful patterns from data, inputs, or key features, studies the problem, and decides what to do. This gives the computer the capability to adapt, evaluate, and learn, which is not much different from how a human learns.

Figure 1.2 shows how the traditional programming approach differs from the machine learning approach, which is depicted in Figure 1.3. The main difference is that in traditional programming, input data is fed together with program logic, which runs on the machine to produce the output. In ML, we feed the input data along with the expected output during training, and the machine creates its own program.
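The contrast between the two approaches can be sketched with the bank withdrawal example. In the first function, the rule is written by hand; in the second, a decision threshold is derived from past inputs and their known outputs. This is a deliberately simplified illustration: the transaction history, the single "amount divided by balance" feature, and every function name are assumptions, and the midpoint rule only works when approved and rejected examples do not overlap.

```python
def approve_withdrawal_rule(amount, balance):
    """Traditional programming: the logic 'if X then Y' is written explicitly."""
    return amount <= balance


def learn_threshold(examples):
    """Machine learning (toy version): derive the rule from labelled history.

    `examples` is a list of (amount / balance ratio, was_approved) pairs.
    The learned threshold is the midpoint between the largest approved ratio
    and the smallest rejected ratio.
    """
    approved = [ratio for ratio, ok in examples if ok]
    rejected = [ratio for ratio, ok in examples if not ok]
    return (max(approved) + min(rejected)) / 2.0


def approve_withdrawal_learned(amount, balance, threshold):
    """Apply the learned threshold to a new transaction."""
    return (amount / balance) <= threshold


# Hypothetical history: inputs together with their observed outputs.
history = [
    (0.2, True), (0.5, True), (0.9, True),
    (1.1, False), (1.5, False), (2.0, False),
]

threshold = learn_threshold(history)
print(threshold)
print(approve_withdrawal_rule(50, 100), approve_withdrawal_learned(50, 100, threshold))
print(approve_withdrawal_rule(150, 100), approve_withdrawal_learned(150, 100, threshold))
```

Both functions reach the same decisions here, but only the first required someone to state the rule; the second recovered it from examples.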

Let’s try to understand the term learning in simple language. A machine is said to learn if its performance at some task improves with its past experience of that task.

In this book, the words ‘learning’ and ‘machine learning’ mean the same thing, so do not be confused. A good learning setup should address the following points:

State clearly what the learner should learn and what the requirements for learning are.

Define clearly what type of data is needed, along with the sources of the data.

Define whether the learner should operate on the entire dataset.

In ML, building a model starts with iterating a statistical algorithm over the training dataset. This procedure produces a model fitted as closely as possible to the data so that it gives more accurate results. With each iteration, ML tries to improve the performance of the model by applying the known or refined patterns of historical experience.

Machine learning basically deals with two types of datasets. In the first type, the dataset is prepared manually, that is, both the input and the expected output are already available. In the second type, only the input data is available, and the user's interest is in predicting the expected output. The available dataset is divided into training and testing portions and passes through three phases: training, validation, and testing. However, there is no hard and fast rule for what percentage of the data is used for training, validation, or testing.

Let us see how machine learning works. It basically works in three phases as shown in Figure 1.3:

Figure 1.3: Workflow to develop ML model

Generally, three phases are involved in creating a full-fledged ML pipeline: training, testing, and executing. These steps are used to generate the outcome from the testing dataset. Before moving to the ML phases, we must know how best to prepare the dataset that is fed into the training and testing phases. Data scientists commonly recommend dividing the dataset in a 70:30 ratio: training is done on 70% of the dataset, and the rest is fed into the testing phase. First, we need to understand the quality of the dataset and apply the required manipulation and cleaning steps to make it more refined and better suited to the model. Then, the model is trained on the 70% portion using appropriate ML algorithms. The trained model is then applied to the remaining 30% to measure its precision and recall. In the last phase, once we know the precision of the trained model on the test dataset, the model is integrated into the ML pipeline to work as an automatic workflow. Table 1.1 shows the main difference between AI and ML:

Table 1.1: Difference between AI and ML
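The 70:30 division of a dataset described in the workflow above can be sketched in a few lines of plain Python. This is an illustrative helper, not the splitting utility used later in the book; the function name, the fixed seed, and the stand-in rows are assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 rows, represented here by their row ids.
rows = list(range(100))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before cutting matters when the rows are stored in some order, such as by date or by class, because an unshuffled cut would give the model a training set that does not resemble the test set.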

Types of Machine Learning

Machine Learning is a wide domain, and there are many types of ML, as shown in Figure 1.4. They are classified into broad categories based on the following criteria:

The first criterion is whether or not the model is trained with human supervision. On this basis, ML is divided into four types: Supervised Learning (SL), Unsupervised Learning (USL), Semi-Supervised Learning (SSL), and Reinforcement Learning (RL). Recently, ML experts have grouped these four types into two categories: Learning Problem (LP) and Hybrid Learning Problem (HLP). SL, USL, and RL fall under the Learning Problem category, whereas HLP involves SSL. SSL is further classified into Self-Supervised Learning (Self-SL) and Multi-Instance Learning (MIL).

The second criterion is whether the model learns from the training dataset incrementally, on an ad hoc basis and at any frequency. On this basis, ML is mainly divided into Online Learning (OL) and Batch Learning (BL). Some more types of ML also fall under this criterion; they are covered in Chapter 5, Supervised Learning with Spark and Chapter 6, Unsupervised Learning with Spark.

Figure 1.4: Taxonomy of Machine Learning
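The difference between Batch Learning and Online Learning can be illustrated with the simplest possible "model", a running average. In the sketch below, the batch version needs the whole dataset at once, while the online version updates its estimate one observation at a time. The heart-rate readings and all names are hypothetical, and a mean stands in for a real model purely for illustration.

```python
def batch_mean(values):
    """Batch learning: the estimate is computed from the full dataset at once."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: the estimate is refined as each new value arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Move the current estimate a fraction of the way toward the new value.
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical stream of readings.
readings = [72, 75, 71, 78, 74]

learner = OnlineMean()
for reading in readings:
    learner.update(reading)

print(batch_mean(readings), learner.mean)
```

Both approaches arrive at the same estimate here; the online version simply never needs to hold more than one reading in memory.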

Learning of Models Based on the First Criteria

In the following section, readers will start with the first criterion and take a bird's-eye view of all types of learning. As discussed earlier, LP is classified into three main types, that is, Supervised Learning, Unsupervised Learning, and Reinforcement Learning.

Supervised Learning (SL)

SL is used when there is a precise mapping between input and output data. The model is trained on a labelled dataset. During training, the algorithm identifies the relationship between the input and output variables so that it can predict the outcome for new data. This is task-oriented learning, in which the accuracy of the prediction depends heavily on the number of examples (the number of rows). The more examples we give, the better the model learns to predict accurate results. The most common everyday example of supervised learning is a spam filter. It is trained on many emails along with their class (spam or not spam), and it then learns how to classify new emails.
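The spam filter example can be sketched as a tiny supervised learner. The code below is a minimal word-count Naive Bayes classifier with add-one smoothing, written in plain Python; it is one possible way to build such a filter, not the method used by any real mail service. The four training messages, the labels "spam" and "ham", and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(messages):
    """Learn word counts per class from (text, label) training pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_messages)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled training data: each input comes with its class.
training_messages = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]

spam_model = train_naive_bayes(training_messages)
print(classify("claim free money", spam_model))
print(classify("monday project meeting", spam_model))
```

The labels supplied with the training messages are what make this supervised: the model never decides for itself what counts as spam, it only generalizes from the examples it was given.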

Supervised learning is divided into two types:

Regression-based Supervised Learning (continuous target values)

Classification-based Supervised Learning (discrete class labels)

Regression

Regression is a type of supervised learning in which the output has a continuous value. For example, Table 1.2 shows a dataset from real-time monitoring through a smart watch, which is used to predict the heartbeat and number of walking steps of a cricket player with respect to time. Here, time does not take discrete values; it is continuous within its range. In regression, the smaller the error, the greater the accuracy of the model.

Table 1.2: Real-time data received from a smart watch

Regression comprises many algorithms that predict a result from a model trained on known input and output patterns. In the upcoming chapters, readers will be exposed to these ML algorithms in depth. The main types of regression algorithms are as follows:

Linear Regression (LR)

Multi-Linear Regression (MLR)

Lasso Regression

Ridge Regression

Elastic-Net Regression

Generalized Linear Regression (GLR)

Isotonic Regression

Decision Tree Regression (DTR)

Random Forest Regression (RFR)

Gradient Boosting Tree Regression (GBTR)
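The first algorithm in the list above, linear regression, can be sketched for a single input feature using the ordinary least-squares formulas. The readings below are made up in the spirit of the smart watch example and are not the values from Table 1.2; the function names are likewise assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predictions and observed values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical readings: minutes of exercise versus heart rate (beats per minute).
minutes = [0, 1, 2, 3, 4]
heart_rate = [70, 72, 74, 76, 78]

slope, intercept = fit_line(minutes, heart_rate)
print(slope, intercept)
print(slope * 5 + intercept)  # predicted heart rate at minute 5
print(mean_squared_error(minutes, heart_rate, slope, intercept))
```

Because the output is a continuous number rather than a class, the model is judged by how small its error is, here the mean squared error, rather than by how many predictions are exactly right.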

Classification

In this type of supervised learning, the output is a defined label with a discrete value. The main task of classification is to predict the class to which an input belongs, and the model is evaluated on its accuracy. Classification is either binary or multi-class. In binary classification, the model predicts one of two outcomes, such as 0 or 1, or yes or no. In multi-class classification, the model predicts one of more than two classes. For example, Gmail sorts email into several categories such as social, promotions, updates, and so on. Classification also has many algorithms for prediction, which are listed as follows:

K-Nearest Neighbor (KNN)

Random Forest (RF)

Gradient Boosting (GB)

Support Vector Machine (SVM)

Naive Bayes Classifier

Logistic Regression

Multilayer Perceptron Classifier (MLPC)

One vs Rest Classifier / Multi-Classification Logistic Regression

Decision Tree Classification

Gradient Boosted Tree Classifier
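The first entry in the list above, K-Nearest Neighbor, is simple enough to sketch in plain Python: a new point receives the majority label of the k training points closest to it. The two-dimensional points, the class names "A" and "B", and the function name are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled points forming two well-separated groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))
print(knn_predict(train_points, train_labels, (6.5, 6.5)))
```

KNN has no separate training step: the labelled data itself is the model, which makes it easy to understand but expensive on large datasets, since every prediction scans all stored points.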

Unsupervised Learning (USL)

In USL, the machine tries to learn without a supervisor or explicit agent. The training dataset is unlabeled; hence, the machine must find the hidden structure in the unlabeled data by itself. For example, suppose an image contains a group of animals, such as cows, dogs, cats, and camels, that the model has never seen before. The machine has no idea of the features of these individual animals and cannot name them. With USL, however, it can still group them by considering their similarities, differences, and patterns. USL is categorized into two types:

Clustering

Clustering is a technique for placing similar objects or patterns in the same group based on key attributes and parameters from the dataset. There are many types of clustering algorithms, which are listed as follows. (Most of these are covered in detail in Chapter 5, Supervised Learning with Spark, and Chapter 6, Unsupervised Learning with Spark.)

K-Means

Bisecting K-means Algorithm (BKM)

Latent Dirichlet allocation (LDA)

Gaussian Mixture Model (GMM)
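K-Means, the first algorithm in the list above, alternates between two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The following is a minimal sketch in plain Python. The sample points, the hand-picked starting centroids (real implementations usually choose them randomly), and the function names are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math


def assign_clusters(points, centroids):
    """Group each point with the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return clusters


def update_centroids(clusters, old_centroids):
    """Move each centroid to the mean of its cluster (keep it if the cluster is empty)."""
    new_centroids = []
    for cluster, old in zip(clusters, old_centroids):
        if cluster:
            new_centroids.append(
                tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            )
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, initial_centroids, max_iterations=100):
    """Repeat assignment and update until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        clusters = assign_clusters(points, centroids)
        new_centroids = update_centroids(clusters, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign_clusters(points, centroids)


# Hypothetical unlabeled points; no class information is given to the algorithm.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

Note that the algorithm is told only how many groups to look for, never what the groups mean; interpreting the resulting clusters is left to the analyst.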

Table 1.3 shows the differences between supervised and unsupervised learning:

Table 1.3: Difference between Supervised and Unsupervised Learning

Reinforcement Learning (RL)

In RL, no explicit supervision is used; instead, a feedback system is provided that helps the machine learn and make decisions based on what it observes. Decisions and results come out of this self-learning loop. RL is often combined with NN, and a well-known example of RL is Google DeepMind's AlphaGo program.

Several types of RL algorithms exist, as follows:

Q-Learning

Temporal-Difference Learning (TDL)

Deep Adversarial - Metric Learning
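Q-Learning, the first entry in the list above, learns from feedback by repeatedly applying one update rule: the value of taking an action in a state is nudged toward the reward received plus the discounted value of the best action in the next state. The sketch below applies that rule on a made-up five-cell corridor with a reward at the right end. To keep the result reproducible, it sweeps over every state and action in a fixed order instead of sampling episodes with random exploration, which is a simplification; the environment, the parameter values, and all names are assumptions for illustration.

```python
LEFT, RIGHT = 0, 1
N_STATES = 5            # states 0..4; state 4 is the goal and ends the episode
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic corridor: return (next_state, reward)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(sweeps=200, alpha=0.5, gamma=0.9):
    """Apply the Q-learning update to every (state, action) pair, sweep after sweep."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for state in range(GOAL):
            for action in (LEFT, RIGHT):
                next_state, reward = step(state, action)
                # The goal is terminal, so nothing can be earned after reaching it.
                future = 0.0 if next_state == GOAL else max(q[next_state])
                target = reward + gamma * future
                q[state][action] += alpha * (target - q[state][action])
    return q


q_table = q_learning()
policy = ["right" if q[RIGHT] > q[LEFT] else "left" for q in q_table[:GOAL]]
print(policy)
print([round(q[RIGHT], 3) for q in q_table[:GOAL]])
```

No one tells the agent that moving right is correct; the preference emerges because the reward at the goal propagates backward through the table, discounted by `gamma` at each step.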

Hybrid Learning Problem (HLP)

As discussed earlier, HLP is classified into three main types, that is, Semi-Supervised Learning, Self-Supervised Learning, and Multi-Instance Learning.

Semi-Supervised Learning (SSL)

Labeling data is a lengthy and costly process. In this type of learning, algorithms use a small amount of labeled data to label the rest of the dataset automatically. Google Photos is a well-known example.

Self-Supervised Learning (Self-SL)

This type of learning uses unlabeled data for pre-processing (pretext) tasks, and the output is then fed to the intelligent framework for precise analytics. Data augmentation and image rotation in computer vision are examples that show the characteristics of self-supervised learning.
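The image rotation example can be sketched as follows: an unlabeled image is rotated by 0, 90, 180, and 270 degrees, and the number of quarter turns becomes a label that costs nothing to produce. A model trained to predict that label has to learn something about image content. The code below only generates such training pairs; the tiny 2x2 "image" of pixel values and the function names are assumptions for illustration.

```python
def rotate_90_clockwise(image):
    """Rotate a 2-D list of pixel values by a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_examples(image):
    """Return (rotated_image, quarter_turns) pairs for 0, 1, 2, and 3 turns."""
    examples = []
    current = [list(row) for row in image]
    for quarter_turns in range(4):
        examples.append((current, quarter_turns))
        current = rotate_90_clockwise(current)
    return examples


# Hypothetical unlabeled image; the labels are created by the rotation itself.
unlabeled_image = [[1, 2], [3, 4]]
examples = make_rotation_examples(unlabeled_image)
for rotated, label in examples:
    print(label, rotated)
```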

Multi-Instance Learning (MIL)
