Beginning MLOps with MLFlow: Deploy Models in AWS SageMaker, Google Cloud, and Microsoft Azure
Ebook, 369 pages, 2 hours


About this ebook

Integrate MLOps principles into existing or future projects using MLFlow, operationalize your models, and deploy them in AWS SageMaker, Google Cloud, and Microsoft Azure. This book guides you through the process of data analysis, model construction, and training.
The authors begin by introducing you to basic data analysis on a credit card data set and teach you how to analyze the features and their relationships to the target variable. You will learn how to build logistic regression models in scikit-learn and PySpark, and you will go through the process of hyperparameter tuning with a validation data set. You will explore three different deployment setups of machine learning models with varying levels of automation to help you better understand MLOps. MLFlow is covered and you will explore how to integrate MLOps into your existing code, allowing you to easily track metrics, parameters, graphs, and models. You will be guided through the process of deploying and querying your models with AWS SageMaker, Google Cloud, and Microsoft Azure. And you will learn how to integrate your MLOps setups using Databricks.



What You Will Learn

  • Perform basic data analysis and construct models in scikit-learn and PySpark
  • Train, test, and validate your models (hyperparameter tuning)
  • Know what MLOps is and what an ideal MLOps setup looks like
  • Easily integrate MLFlow into your existing or future projects
  • Deploy your models and perform predictions with them on the cloud


Who This Book Is For
Data scientists and machine learning engineers who want to learn MLOps and know how to operationalize their models
Language: English
Publisher: Apress
Release date: Dec 7, 2020
ISBN: 9781484265499

    Book preview

    Beginning MLOps with MLFlow - Sridhar Alla

    © Sridhar Alla, Suman Kalyan Adari 2021

    S. Alla, S. K. Adari, Beginning MLOps with MLFlow, https://doi.org/10.1007/978-1-4842-6549-9_1

    1. Getting Started: Data Analysis

    Sridhar Alla¹   and Suman Kalyan Adari²

    (1) Delran, NJ, USA

    (2) Tampa, FL, USA

    In this chapter, we will go over the premise of the problem we are attempting to solve with the machine learning solution we want to operationalize. We will also begin data analysis and feature engineering of our data set.

    Introduction and Premise

    Welcome to Beginning MLOps with MLFlow! In this book, we will take an example problem, develop a machine learning solution to it, and operationalize the model on AWS SageMaker, Microsoft Azure, Google Cloud, and DataRobot. The problem we will be looking at is performing anomaly detection on a credit card data set. In this chapter, we will explore this data set and show its overall structure while explaining a few techniques for analyzing the data. The data set can be found at www.kaggle.com/mlg-ulb/creditcardfraud.

    If you are already familiar with how to analyze data and build machine learning models, feel free to grab the data set and skip ahead to Chapter 3 to jump right into MLOps.

    Otherwise, we will first go over the general process by which machine learning solutions are created. The process goes something like this:

    1.

    Identification of the problem: First of all, you need to have an idea of what the problem is, what can be done about it, what has been done about it, and why it is a problem worth solving.

    Here’s an example of a problem: an invasive snake species harmful to the local environment has infested a region. This species is highly venomous and looks very similar to a harmless species of snake native to this same environment. Furthermore, the invasive species is destructive to the local environment and is outcompeting the local species.

    In response, the local government has issued a statement encouraging citizens to go out and kill the venomous, invasive species on sight, but citizens have been killing the native species as well because the two are so easy to confuse.

    What can be done about this? A possible solution is to use the power of machine learning and build an application to help citizens identify the snake species. What has been done about it? Perhaps someone released an app that does a poor job at distinguishing the two species, which doesn’t help remedy the current situation. Perhaps fliers have been given out, but it can be hard to identify every member of a species correctly based on just one picture.

    Why is it a problem worth solving? The native species is important to the local environment. Killing the wrong species can end up exacerbating the situation and lead to the invasive species claiming the environment over the native species. And so building a computer vision-based application that can discern between the various snake species (and especially the two species relevant to the problem) could be a great way to help citizens get rid of the right snake species.

    2.

    Collection of data: After you’ve identified the problem, you want to collect the relevant data. In the context of the snake species classification problem, you want to find images of various snake species in your region. The geographic scope depends on how large a scale your project will operate on. Is it going to identify any snake in the world? Just snakes in Florida?

    If you can afford to do so, the more data you collect, the better the potential training outcomes will be. More training examples can introduce increased variety to your model, making it better in the long run. Deep learning models scale in performance with large volumes of data, so keep that in mind as well.

    3.

    Data analysis: Once you’ve collected all the raw data, you want to clean it up, process it, and format it in a way that allows you to analyze the data better.

    For images, this could be something like applying an algorithm to crop out unnecessary parts of the image to focus solely on the snake. Additionally, maybe you want to center-crop the image to remove extra visual information from the data sample. Either way, raw image data is rarely in good enough condition to be used directly; it almost always requires processing to get the relevant data you want.

    For unstructured data like images, formatting the data so it can be analyzed could mean creating a directory for each snake species containing the relevant images. From there, you can look at the image count for each snake species class and determine whether you need to retrieve more samples for a particular species.

    For structured data, say the credit card data set, processing the raw data can mean something like getting rid of any entries with null values in them. Formatting the data so you can analyze it better can involve dimensionality-reduction techniques such as principal component analysis (PCA). Note: Most of the features in the credit card data set have actually already been processed with PCA, in part to preserve the privacy of the users the data was extracted from.

    As for the analysis, you can construct multiple graphs of different features to get an idea of the overall distribution and how the features look plotted against each other. This way, you can see any significant relationships between certain features that you might keep in mind when creating your training data.

    There are also tools you can use to find out which features have the greatest influence on the label, such as phi-k correlation. Seeing the correlation values between the individual features and the target label gives you a deeper understanding of the relationships within the data set. If needed, you can also drop features that aren’t very influential. In this step, you really want to get a solid understanding of your data so you can apply a model architecture that is most suitable for it.
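
    To make this concrete, here is a minimal pandas sketch (not from the book) of the cleaning and correlation ideas above, using plain Pearson correlation rather than phi-k (phi-k comes from the separate phik package); the data path and the 0.01 cutoff are illustrative, and Class is the fraud label column in the Kaggle credit card data set:

    import pandas as pd

    # Load the credit card data set and drop rows with null values.
    df = pd.read_csv("data/creditcard.csv")
    df = df.dropna()

    # Pearson correlation of each feature against the target column "Class"
    # (1 = fraudulent transaction, 0 = normal), sorted by absolute strength.
    correlations = df.corr()["Class"].drop("Class")
    print(correlations.abs().sort_values(ascending=False).head(10))

    # Features with near-zero correlation are candidates for dropping
    # (the 0.01 cutoff is purely illustrative).
    weak_features = correlations[correlations.abs() < 0.01].index
    df_reduced = df.drop(columns=weak_features)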

    4.

    Feature engineering and data processing: Now you can use the knowledge you gained from analyzing the various features and their relationships to one another to construct new features from combinations of existing ones. The Titanic data set, for instance, lends itself well to feature engineering: you can take information such as class, age, fare, number of siblings, number of parents, and so on to create as many new features as you can think up.

    Feature engineering is really about giving your model a deeper context so it can learn the task better. You don’t necessarily want to create random features for the sake of it, but something that’s potentially relevant like number of female relatives, for example. (Since females were more likely to survive the sinking of the Titanic, could it be possible that if a person had more female relatives, they were less likely to survive as preference was given to their female relatives instead?)

    The next step after feature engineering is data processing, which covers all of the preparation needed before the data is passed into the model. In the context of the snake species image data, this could involve normalizing all the pixel values to be between 0 and 1 as well as batching the data into groups.

    This step also usually creates several subsets of your initial data: a training data set, a testing data set, and a validation data set. We will go into more detail on the purpose of each of these data sets later. For now, a training data set contains the data you want the model to learn from, the testing data set contains data you want to evaluate the model’s performance on, and the validation data set is used to either select a model or help tune a model’s hyperparameters to draw out a better performance.
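
    As a small illustration of both steps, the following sketch (not from the book) engineers a family-size feature on a Titanic-style data frame and then splits the data into training, validation, and testing subsets with scikit-learn; the file name is illustrative, and the SibSp, Parch, and Survived columns follow the Kaggle Titanic data:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Titanic-style data frame; "SibSp" (siblings/spouses aboard) and
    # "Parch" (parents/children aboard) are columns on the Kaggle Titanic data.
    titanic = pd.read_csv("titanic.csv")  # file name is illustrative

    # Feature engineering: combine existing columns into a new feature.
    titanic["FamilySize"] = titanic["SibSp"] + titanic["Parch"] + 1

    # Data processing: split into training (70%), validation (15%),
    # and testing (15%) subsets.
    features = titanic.drop(columns=["Survived"])
    labels = titanic["Survived"]
    X_train, X_rest, y_train, y_rest = train_test_split(
        features, labels, test_size=0.3, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=42)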

    5.

    Build the model: Now that the data processing is done, this step is all about selecting the proper architecture and building the model. For the snake species image data, a good choice would be to use a convolutional neural network (CNN) because they work very well for any tasks involving images. From there, it is up to you to define the specific architecture of the model with respect to its layer composition.
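
    As an illustrative sketch only (the book itself builds scikit-learn and PySpark models), a minimal CNN of this kind could be defined with TensorFlow/Keras; the image size, layer sizes, and number of classes below are assumptions:

    from tensorflow.keras import layers, models

    # A minimal CNN for 128x128 RGB snake images classified into
    # num_classes species (all sizes here are illustrative).
    num_classes = 5
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])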

    6.

    Training, evaluating, and validating: When you’re training your CNN model, you usually pass in batches of data until the entire data set makes a full pass through the model. From the results of this forward pass, calculations are made going backward across the network, in what’s called the backward pass, that tell the model how to adjust its weights. The training process is essentially where the model learns how to perform the task, and it gets better the more examples it sees.

    After the training process, either the evaluation step or the validation step can come next. As long as the testing set and validation set are drawn from different portions of the data (the validation set can be derived from the training set, while the testing set is derived from the original data), the model is technically seeing new data in both the evaluation and validation processes. The model never learns anything from the evaluation data, so you can test your model at any time.

    Model evaluation is where the model’s performance metrics such as accuracy, precision, recall, and so on are measured on a data set that it has never seen before. We will go into more detail on the evaluation step once it becomes more relevant in Chapter 2.
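
    For reference, computing such metrics with scikit-learn might look like the following minimal sketch, where clf, X_test, and y_test are assumed to come from the earlier training and splitting steps:

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Evaluate a fitted scikit-learn classifier on the held-out test set
    # (clf, X_test, and y_test are assumed from earlier steps).
    y_pred = clf.predict(X_test)
    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall   :", recall_score(y_test, y_pred))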

    Depending on the context, the exact purpose of validation can differ, along with the question of whether or not evaluation should be performed first after training. Let’s define several sample scenarios where you would use validation:

    Selecting a model architecture: Given several candidate model types or architectures, you can use k-fold cross-validation, for example, to quickly train and evaluate each of them on partitions of the validation set to get an idea of how they perform. This way, you can see which architecture is performing best, pick it, and continue with the rest of the process.

    Selecting the best model: Given several trained models, you can use something like k-fold cross-validation to quickly evaluate each model on the validation data and get an idea of which ones perform best.

    Tuning hyperparameters: Quickly train and test a model with different hyperparameter setups to get an idea of which configurations work better. You can start with a broad range of hyperparameters and then use the results to narrow the range until you reach a configuration you are satisfied with. Deep learning models, for example, can have many hyperparameters, so validation-based tuning works especially well in deep learning settings. Just beware of diminishing returns: beyond a certain precision in the hyperparameter settings, you will not see much of an additional performance boost. (A minimal cross-validation sketch appears after this list.)

    Indication of high variance: This validation data is slightly different from the other three examples. In the case of neural networks, this data is derived from a small split of the training data. After one full pass of the training data, the model evaluates on this validation data to calculate metrics such as loss and accuracy.

    If your training accuracy is high and training loss is low, but the validation accuracy is low and the validation loss is high, that’s an indication that your model suffers from high variance. What this means is that your model has not learned to generalize to new data, since the validation data in this case consists of data it has never seen before. In other words, your model is overfitting: it just isn’t recreating on new data the kind of performance it gets on the training data.

    If your model has poor training accuracy and high training loss, then your model suffers from high bias, meaning it isn’t learning how to perform the task correctly on the training data at all.

    This little validation split during the training process can give you an early indication of when overfitting is occurring.
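
    Here is the cross-validation sketch referenced above: a minimal, illustrative example (not from the book) that tunes the regularization strength of a logistic regression model with scikit-learn’s GridSearchCV, where X_train and y_train are assumed from an earlier split and the parameter grid and scoring choice are arbitrary:

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # Tune the regularization strength C of a logistic regression model
    # using 5-fold cross-validation on the training data.
    param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid, cv=5, scoring="recall")
    search.fit(X_train, y_train)
    print("best C:", search.best_params_)
    print("best cross-validated recall:", search.best_score_)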

    7.

    Predicting: Once the model has been trained, evaluated, and validated, it is ready to make predictions. In the context of the snake species detector, this step involves passing in images of the snake in question to get a prediction back. For example, if the model is supposed to detect the snake, draw a box around it, and label it (an object detection task), it will do so and display the results in real time in the application.

    If it just classifies the snake in the picture, the user simply sends their photo of a snake to the model (via the application) to get a species classification prediction along with perhaps a probability confidence score.
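
    As a rough sketch of this classification scenario (reusing the illustrative Keras model from earlier), prediction on a single photo might look like the following; load_and_preprocess is a hypothetical helper that resizes the image and scales its pixel values:

    import numpy as np

    # Classify one new photo with the illustrative CNN defined earlier.
    # load_and_preprocess is a hypothetical helper that returns an array
    # of shape (128, 128, 3) with values scaled to the 0-1 range.
    image = load_and_preprocess("user_photo.jpg")
    probabilities = model.predict(np.expand_dims(image, axis=0))[0]
    predicted_class = int(np.argmax(probabilities))
    confidence = float(probabilities[predicted_class])
    print("predicted species:", predicted_class, "confidence:", confidence)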

    Hopefully now you have a better idea of what goes on when creating machine learning solutions.

    With all that in mind, let’s get started on the example, where you will use the credit card data set to build simple anomaly detection models.

    Credit Card Data Set

    Before you perform any data analysis, you need to first collect your data. Once again, the data set can be found at the following link: www.kaggle.com/mlg-ulb/creditcardfraud.

    Following the link, you should see something like the following in Figure 1-1.

    Figure 1-1: Kaggle website page on the credit card data

    From here, you want to download the data set by clicking the Download (144 MB) button next to New Notebook. It should take you to a sign-in page if you’re not already signed in, but you should be able to download the data set after that.

    Once the zip file finishes downloading, simply extract it somewhere to reveal the credit card data set. Now let’s open up Jupyter and explore this data set. Before you start this step, let’s go over the exact packages and their versions:

    Python 3.6.5

    numpy 1.18.3

    pandas 0.24.2

    matplotlib 3.2.1

    To check your package versions, you can run a command like

    pip show package_name

    Alternatively, you can run the following code to display the version in the notebook itself:

    import module_name

    print(module_name.__version__)

    In this case, module_name is the name of the package you’re importing, such as numpy.
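
    For example, to check numpy:

    import numpy as np
    print(np.__version__)  # the setup listed above uses 1.18.3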

    Loading the Data Set

    Let’s begin! First, open a new notebook and import all of the dependencies and set global parameters for this notebook:

    %matplotlib inline
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from pylab import rcParams
    rcParams['figure.figsize'] = 14, 8

    Refer to Figure 1-2.

    Figure 1-2: Jupyter notebook cell with some import statements as well as a global parameter definition for the size of all matplotlib plots

    Now that you have imported the necessary libraries, you can load the data set. In this case, the data folder exists in the same directory as the notebook file and contains the creditcard.csv file. Here is the code:

    data_path = "data/creditcard.csv"
    df = pd.read_csv(data_path)

    Refer to Figure 1-3.

    Figure 1-3: Defining the data path to the credit card data set .csv file, reading its contents, and creating a pandas data frame object

    Now that the data frame has been loaded, let’s take a look at its contents:

    df.head()

    Refer to Figure 1-4.

    Figure 1-4: Calling the head() function on the data frame to display the first five rows of the data frame

    If you are not familiar with the df.head(n) function, it essentially prints the first n rows of the data frame. If you do not pass any arguments, as in the figure above, the function defaults to a value of five, printing the first five rows.

    Feel free to play around with that function as well as use the scroll bar to explore the rest of the features.

    Now, let’s look at some basic statistical values relating to the values in this data frame:

    df.describe()

    Refer to Figure 1-5.
