Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R

Ebook · 534 pages · 3 hours

About this ebook

Bridge the gap between a high-level understanding of how an algorithm works and knowing the nuts and bolts needed to tune your models better. This book will give you the confidence and skills needed to develop all the major machine learning models. In Pro Machine Learning Algorithms, you will first develop each algorithm in Excel so that you get a practical understanding of all the levers that can be tuned in a model, before implementing the models in Python/R.
You will cover all the major algorithms of supervised and unsupervised learning, including linear/logistic regression, k-means clustering, PCA, recommender systems, decision trees, random forests, GBM, and neural networks. You will also be exposed to the latest in deep learning through CNNs, RNNs, and word2vec for text mining. You will learn not only the algorithms but also the concepts of feature engineering needed to maximize the performance of a model. You will see the theory along with case studies, such as sentiment classification, fraud detection, recommender systems, and image recognition, so that you get the best of both theory and practice for the vast majority of the machine learning algorithms used in industry. Along with learning the algorithms, you will also be exposed to running machine learning models on all the major cloud service providers.
You are expected to have minimal knowledge of statistics and software programming, and by the end of this book you should be able to work on a machine learning project with confidence.
What You Will Learn
  • Get an in-depth understanding of all the major machine learning and deep learning algorithms 
  • Fully appreciate the pitfalls to avoid while building models
  • Implement machine learning algorithms in the cloud 
  • Follow a hands-on approach through case studies for each algorithm
  • Gain the tricks of ensemble learning to build more accurate models
  • Discover the basics of programming in R/Python and the Keras framework for deep learning
Who This Book Is For
Business analysts/IT professionals who want to transition into data science roles, and data scientists who want to solidify their knowledge of machine learning.


Language: English
Publisher: Apress
Release date: Jun 30, 2018
ISBN: 9781484235645

    Book preview

    Pro Machine Learning Algorithms - V Kishore Ayyadevara

    © V Kishore Ayyadevara 2018

    V Kishore Ayyadevara, Pro Machine Learning Algorithms, https://doi.org/10.1007/978-1-4842-3564-5_1

    1. Basics of Machine Learning

    V Kishore Ayyadevara¹

    (1) Hyderabad, Andhra Pradesh, India

    Machine learning can be broadly classified into supervised and unsupervised learning. By definition, the term supervised means that the machine (the system) learns with the help of something, typically labeled training data.

    Training data (or a dataset) is the basis on which the system learns to infer. An example of this process is to show the system a set of images of cats and dogs with the corresponding labels of the images (the labels say whether the image is of a cat or a dog) and let the system decipher the features of cats and dogs.

    Similarly, unsupervised learning is the process of grouping data into similar categories. An example of this is to input into the system a set of images of dogs and cats without mentioning which image belongs to which category and let the system group the two types of images into different buckets based on the similarity of images.

    In this chapter, we will go through the following:

    The difference between regression and classification

    The need for training, validation, and testing data

    The different measures of accuracy

    Regression and Classification

    Let’s assume that we are forecasting the number of units of Coke that will be sold in summer in a certain region. The value lies within a certain range, say 1 million to 1.2 million units per week. Typically, regression is the way to forecast such continuous variables.

    Classification or prediction, on the other hand, predicts events that have a few distinct outcomes, for example whether a day will be sunny or rainy.

    Linear regression is a typical example of a technique to forecast continuous variables, whereas logistic regression is a typical technique to predict discrete variables. There are a host of other techniques, including decision trees, random forests, GBM, neural networks, and more, that can help predict both continuous and discrete outcomes.
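
    To make the distinction concrete, here is a minimal sketch (not from the book) that fits a linear regression to a continuous target and a logistic regression to a discrete target; the synthetic data and the numbers in it are assumptions chosen purely for illustration, and scikit-learn and NumPy are assumed to be installed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(200, 1) * 10                      # a single input feature

# Regression: forecast a continuous outcome (e.g., units sold per week)
y_cont = 1_000_000 + 20_000 * X.ravel() + rng.normal(0, 5_000, 200)
reg = LinearRegression().fit(X, y_cont)
print("forecast at X=5:", reg.predict([[5]])[0])

# Classification: predict a discrete outcome (e.g., sunny vs. rainy)
y_disc = (X.ravel() + rng.normal(0, 1, 200) > 5).astype(int)
clf = LogisticRegression().fit(X, y_disc)
print("P(class=1) at X=5:", clf.predict_proba([[5]])[0, 1])
```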

    Training and Testing Data

    Typically, in regression, we deal with the problem of generalization/overfitting. Overfitting arises when the model is so complex that it fits all the training data points perfectly, resulting in the minimum possible error rate on the data it was trained on. A typical example of an overfitted dataset looks like Figure 1-1.

    Figure 1-1. An overfitted dataset

    From the dataset in the figure, you can see that the straight line does not fit all the data points perfectly, whereas the curved line fits the points perfectly—hence the curve has minimal error on the data points on which it is trained.

    However, the straight line has a better chance of generalizing to a new dataset than the curve does. So, in practice, regression/classification is a trade-off between the generalizability of the model and the complexity of the model.

    The lower the generalizability of the model, the higher the error rate will be on unseen data points.

    This phenomenon can be observed in Figure 1-2. As the complexity of the model increases, the error rate on unseen data points keeps decreasing up to a point, after which it starts increasing again. The error rate on the training dataset, however, keeps decreasing as model complexity increases, eventually leading to overfitting.

    Figure 1-2. Error rate in unseen data points

    The unseen data points are the points that are not used in training the model, but are used in testing the accuracy of the model, and so are called testing data or test data.
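
    A minimal sketch (not from the book) of this behavior: a straight line and a high-degree polynomial are fitted to the same noisy data, and training error is compared with error on held-out points. The data and the polynomial degrees are made up for illustration, assuming only NumPy:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 20)
x_train, y_train = x[:15], y[:15]      # points the model is fitted on
x_test, y_test = x[15:], y[15:]        # unseen points

for degree in (1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)                    # fit a polynomial
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
# The degree-10 fit usually has much lower training error but higher error on the
# unseen points than the straight line, mirroring Figure 1-2.
```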

    The Need for Validation Dataset

    The major problem with having only a fixed training and testing dataset is that the test dataset might be very similar to the training dataset, whereas a new (future) dataset might not be. If a future dataset is not similar to the training dataset, the model’s accuracy on the future dataset may be very low.

    An intuition for this problem is typically seen in data science competitions and hackathons like Kaggle (www.kaggle.com). The public leaderboard is not always the same as the private leaderboard. Typically, the competition organizer does not tell the users which rows of the test dataset belong to the public leaderboard and which belong to the private leaderboard: a randomly selected subset of the test dataset goes to the public leaderboard, and the rest goes to the private leaderboard.

    One can think of the private leaderboard as a test dataset for which the accuracy is not known to the user, whereas with the public leaderboard the user is told the accuracy of the model.

    Potentially, people overfit on the basis of the public leaderboard, and the private leaderboard might be a slightly different dataset that is not highly representative of the public leaderboard’s dataset.

    The problem can be seen in Figure 1-3.

    ../images/463052_1_En_1_Chapter/463052_1_En_1_Fig3_HTML.jpg

    Figure 1-3

    The problem illustrated

    In this case, you can see that a user moved down from rank 17 to rank 47 when the public and private leaderboards are compared. Cross-validation is a technique that helps avoid this problem. Let’s go through how it works in detail.

    If we have only a training and a testing dataset, then, given that the testing dataset must remain unseen by the model, we would not be in a position to choose the combination of hyper-parameters that maximizes the model’s accuracy on unseen data (a hyper-parameter can be thought of as a knob we turn to improve a model’s accuracy) unless we use a third dataset. The validation dataset is this third dataset; it is used to see how accurate the model is as the hyper-parameters are changed. Typically, of all the data points in a dataset, 60% are used for training, 20% for validation, and the remaining 20% for testing.
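
    A minimal sketch (not from the book) of such a 60/20/20 split, assuming scikit-learn; X and y stand in for any feature matrix and target vector:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # placeholder feature matrix
y = np.arange(100)                  # placeholder target

# Carve out 20% for the test set first, then split the remainder 75/25,
# which yields 60% / 20% / 20% of the original data.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```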

    Another way to see the need for a validation dataset goes like this: assume that you are building a model to predict whether a customer is likely to churn in the next two months. Most of the dataset will be used to train the model, and the rest can be used to test the model. But most of the techniques we will deal with in subsequent chapters involve hyper-parameters.

    As we keep changing the hyper-parameters, the accuracy of a model varies by quite a bit, but unless there is another dataset, we cannot ascertain whether accuracy is improving. Here’s why:

    1. We cannot test a model’s accuracy on the dataset on which it was trained.

    2. We cannot use the test dataset’s accuracy to finalize the ideal hyper-parameters, because, practically, the test dataset must remain unseen by the model.

    Hence, the need for a third dataset: the validation dataset.
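
    For example, here is a minimal sketch (not from the book) of this workflow. A hyper-parameter (the maximum depth of a decision tree, chosen purely for illustration) is tuned on the validation set, and the test set is touched only once at the end; scikit-learn and its built-in breast-cancer dataset are assumed:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_depth, best_val_acc = None, 0.0
for depth in (2, 4, 6, 8, 10):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    val_acc = model.score(X_val, y_val)          # accuracy on the validation set
    if val_acc > best_val_acc:
        best_depth, best_val_acc = depth, val_acc

# The test set is used only once, after the hyper-parameter has been chosen
final_model = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("chosen max_depth:", best_depth, "test accuracy:", final_model.score(X_test, y_test))
```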

    Measures of Accuracy

    In a typical linear regression (where continuous values are predicted), there are a couple of ways of measuring the error of a model. Typically, error is measured on the testing dataset, because measuring error on the training dataset (the dataset a model is built on) is misleading: the model has already seen those data points, so training accuracy alone tells us nothing about accuracy on a future dataset. That’s why error is always measured on a dataset that was not used to build the model.

    Absolute Error

    Absolute error is defined as the absolute value of the difference between the forecasted value and the actual value. Let’s imagine a scenario with two data points whose actual values add up to 200: the model over-forecasts one by 20 and under-forecasts the other by 20.

    In this scenario, we might incorrectly conclude that the overall error is 0 (because one error is +20 and the other is –20). If we assume that the overall error of the model is 0, we miss the fact that the model is not working well on individual data points.

    To avoid positive and negative errors cancelling each other out and producing a misleadingly small overall error, we consider the absolute error of the model, which in this case is 20 + 20 = 40, and the absolute error rate is 40 / 200 = 20%.
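
    A minimal sketch (not from the book) of this calculation; the two individual actual values of 100 each are assumed for illustration, chosen only to be consistent with the errors of +20 and -20 and the total of 200 described above:

```python
import numpy as np

actual = np.array([100, 100])     # assumed actual values (they sum to 200)
forecast = np.array([120, 80])    # errors of +20 and -20

signed_error = np.sum(forecast - actual)               # 0: the errors cancel out
absolute_error = np.sum(np.abs(forecast - actual))     # 40
absolute_error_rate = absolute_error / np.sum(actual)  # 40 / 200 = 20%
print(signed_error, absolute_error, absolute_error_rate)
```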

    Root Mean Square Error

    Another approach to handling the inconsistent signs of errors is to square them (the square of a negative number is a positive number). In the scenario above, the errors of +20 and –20 become squared errors of 400 each.

    Now the overall squared error is 800, and the root mean squared error (RMSE) is the square root of 800 / 2, which is 20.
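
    A minimal sketch (not from the book) of the RMSE calculation for the same assumed scenario:

```python
import numpy as np

actual = np.array([100, 100])     # same assumed values as above
forecast = np.array([120, 80])

squared_errors = (forecast - actual) ** 2     # [400, 400]; overall squared error = 800
rmse = np.sqrt(np.mean(squared_errors))       # sqrt(800 / 2) = 20
print(squared_errors.sum(), rmse)
```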

    Confusion Matrix

    Absolute error and RMSE are applicable while predicting continuous variables. However, predicting an event with discrete outcomes is a different process. Discrete event prediction happens in terms of probability—the result of the model is a probability that a certain event happens. In such cases, even though absolute error and RMSE can theoretically be used, there are other relevant metrics.

    A confusion matrix counts the instances in which the model’s predicted outcome agrees or disagrees with the actual outcome, giving four counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). From these counts, the following metrics are derived:

    Sensitivity (true positive rate, or recall) = true positives / total positives = TP / (TP + FN)

    Specificity (true negative rate) = true negatives / total negatives = TN / (TN + FP)

    Precision (positive predictive value) = TP / (TP + FP)

    Recall = TP / (TP + FN)

    Accuracy = (TP + TN) / (TP + FN + FP + TN)

    F1 score = 2TP / (2TP + FP + FN)
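
    A minimal sketch (not from the book) computing these metrics from assumed actual and predicted labels, using scikit-learn’s confusion_matrix:

```python
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # assumed ground-truth labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # assumed model predictions

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
sensitivity = tp / (tp + fn)                  # recall / true positive rate
specificity = tn / (tn + fp)                  # true negative rate
precision   = tp / (tp + fp)                  # positive predictive value
accuracy    = (tp + tn) / (tp + tn + fp + fn)
f1          = 2 * tp / (2 * tp + fp + fn)
print(sensitivity, specificity, precision, accuracy, f1)
```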

    AUC Value and ROC Curve

    Let’s say you are consulting for an operations team that manually reviews e-commerce transactions to check whether they are fraudulent.

    The cost associated with such a process is the manpower required to review all the transactions.

    The benefit associated with the cost is the number of fraudulent transactions that are preempted because of the manual review.

    The overall profit associated with this setup is the money saved by preventing fraud minus the cost of the manual review.

    In such a scenario, a model can come in handy as follows: we could build a model that assigns each transaction a score, the probability that the transaction is fraudulent. Transactions with a very low probability of being fraudulent then need not be reviewed manually. The benefit of the model is thus to reduce the number of transactions that need to be reviewed, thereby reducing the human resources needed and the cost associated with the reviews. However, because the low-probability transactions are not reviewed, some fraud among them may go uncaptured.

    In that scenario, a model could be helpful if it improves the overall profit by reducing the number of transactions to be reviewed (which, hopefully, are the transactions that are less likely to be fraud transactions).

    The steps we would follow in calculating the area under the curve (AUC) are as follows:

    1. Score each transaction to calculate its probability of being fraudulent. (The scoring is based on a predictive model; more details on this in Chapter 3.)

    2. Order the transactions in descending order of probability.

    If the model is good, there should be very few non-fraudulent data points at the top of the ordered dataset and very few fraudulent data points at the bottom. The AUC value penalizes the model for having such anomalies in the ordering.

    For now, let’s assume a total of 1,000,000 transactions are to be reviewed, and based on history, on average 1% of the total transactions are fraudulent.

    The x-axis of the receiver operating characteristic (ROC) curve is the cumulative number of points (transactions) considered.

    The y-axis is the cumulative number of fraudulent transactions captured.

    Once we order the dataset, intuitively the high-probability transactions should be the fraudulent ones and the low-probability transactions the non-fraudulent ones. The cumulative number of frauds captured rises quickly over the initial transactions and, after a certain point, saturates, because reviewing further transactions adds few additional frauds.

    The graph of cumulative transactions reviewed on the x-axis and cumulative frauds captured on the y-axis would look like Figure 1-4.

    Figure 1-4. Cumulative frauds captured when using a model

    In this scenario, we have a total of 10,000 fraudulent transactions out of a total of 1,000,000 transactions. That’s an average fraud rate of 1%: one out of every 100 transactions is fraudulent.

    If we do not have any model and instead review transactions in a random order, the number of frauds captured increases only slowly, as shown in Figure 1-5.

    Figure 1-5. Cumulative frauds captured when transactions are randomly sampled

    In Figure 1-5, you can see that the line divides the total dataset into two roughly equal parts; the area under the line is equal to 0.5 times the total area. For convenience, if we assume that the total area of the plot is 1 unit, then the total area under the line generated by the random-guess model is 0.5.

    A comparison of the cumulative frauds captured based on the predictive model and random guess would be as shown in Figure 1-6.

    Figure 1-6. Comparison of cumulative frauds

    Note that the area under the curve (AUC) generated by the predictive model is greater than 0.5 in this instance.

    Thus, the higher the AUC, the better the predictive power of the model.
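
    Here is a minimal sketch (not from the book) that builds the cumulative-frauds-captured curve described above from simulated scores and labels and compares the area under it with the roughly 0.5 obtained from a random ordering. All numbers here are made up for illustration; only NumPy is assumed:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 100_000
is_fraud = rng.rand(n) < 0.01                 # roughly 1% of transactions are fraudulent
score = is_fraud * 0.6 + rng.rand(n)          # a noisy score that tends to be higher for frauds

order = np.argsort(-score)                    # review in descending order of score
captured = np.cumsum(is_fraud[order]) / is_fraud.sum()   # y-axis: share of frauds captured
reviewed = np.arange(1, n + 1) / n                        # x-axis: share of transactions reviewed

# Riemann-sum approximation of the area under the cumulative-capture curve
area_model = captured.mean()
print("area under the model's curve: %.3f (a random ordering gives about 0.5)" % area_model)
```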

    Unsupervised Learning

    So far we have looked at supervised learning, where there is a dependent variable (the variable we are trying to predict) and an independent variable (the variable(s) we use to predict the dependent variable value).

    However, in some scenarios, we would only have the independent variables—for example, in cases where we have to group customers based on certain characteristics. Unsupervised learning techniques come in handy in those cases.

    There are two major types of unsupervised techniques:

    Clustering-based approach

    Principal components analysis (PCA)

    Clustering is an approach in which rows are grouped, and PCA is an approach in which columns are grouped. We can think of clustering as being useful for assigning a given customer to one group or another (because each customer typically represents a row in the dataset), whereas PCA is useful for grouping columns (alternatively, reducing the dimensionality/variables of the data).

    Besides helping to segment customers, clustering can also be a powerful pre-processing step in our model-building process (you’ll read more about that in Chapter 11). PCA can help speed up the model-building process by reducing the number of dimensions, thereby reducing the number of parameters to estimate.
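
    A minimal sketch (not from the book) of the two approaches: k-means groups rows (customers) and PCA reduces columns (dimensions). The data is random placeholder data, and scikit-learn is assumed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(500, 10)                        # 500 customers (rows), 10 attributes (columns)

# Clustering groups rows: each customer is assigned to one of three segments
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("segment of the first customer:", clusters[0])

# PCA groups/reduces columns: 10 attributes are compressed into 3 components
X_reduced = PCA(n_components=3).fit_transform(X)
print("shape after PCA:", X_reduced.shape)   # (500, 3)
```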

    In this book, we will be dealing with a majority of supervised and unsupervised algorithms as follows:

    1. We first hand-code them in Excel.

    2. We then implement them in R.

    3. We then implement them in Python.

    The basics of Excel, R and Python are outlined in the appendix.

    Typical Approach Towards Building a Model

    In the previous section, we walked through the cost-benefit analysis of an operations team using a predictive model in a real-world scenario. In this section, we’ll look at some of the points you should consider while building predictive models.

    Where Is the Data Fetched From?

    Typically, data is available in database tables, CSV files, or text files. In a database, different tables may capture different information. For example, in order to understand fraudulent transactions, we would likely join a transactions table with a customer demographics table to derive insights from the data.

    Which Data Needs to Be Fetched?

    The output of a prediction exercise is only as good as the inputs that go into the model. The key to getting the input right is to better understand the drivers/characteristics of what we are trying to predict, in our case the characteristics of a fraudulent transaction.

    Here is where a data scientist typically dons the hat of a management consultant. They research the factors that might be driving the event they are trying to predict. They could do that by reaching out to the people who are working in the front line—for example, the fraud risk investigators who are manually reviewing the transactions—to understand the key factors that they look at while investigating a transaction.

    Pre-processing the Data

    The input data does not always come in clean. There may be multiple issues that need to be handled before building a model (a short code sketch illustrating these steps follows the list):

    Missing values in data: Missing values in data exist when a variable (data point) is not recorded or when joins across different tables result in a nonexistent value.

    Missing values can be imputed in a few ways. The simplest is to replace the missing value with the average/median of the column. Another way is to impute the missing value based on the rest of the variables available for that transaction, for example by identifying the K-nearest neighbors (more on this in Chapter 13).

    Outliers in data: Outliers in the input variables result in inefficient optimization in regression-based techniques (Chapter 2 talks more about the effect of outliers). Typically, outliers are handled by capping variables at a certain percentile value (the 95th, for example).

    Transformation of variables: The variable transformations available are as follows:

    Scaling a variable: For techniques based on gradient descent, scaling the variables generally results in faster optimization.

    Log/squared transformation: Log or squared transformations come in handy when an input variable has a non-linear relationship with the dependent variable.
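
    As promised above, here is a minimal sketch (not from the book) of these pre-processing steps on a small made-up DataFrame, assuming pandas and NumPy. It walks through median imputation, capping at the 95th percentile, scaling, and a log transformation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"amount": [20.0, 35.0, np.nan, 50.0, 10_000.0],
                   "age":    [25.0, 40.0, 31.0, np.nan, 52.0]})

# Missing values: replace with the column median (a KNN-based imputation is
# another option; see Chapter 13)
df = df.fillna(df.median())

# Outliers: cap each column at its 95th percentile
df = df.clip(upper=df.quantile(0.95), axis=1)

# Scaling: bring each column to zero mean and unit variance
df_scaled = (df - df.mean()) / df.std()

# Log transformation: useful when a variable has a non-linear relationship
# with the dependent variable (log1p also handles zero values safely)
df["log_amount"] = np.log1p(df["amount"])
print(df, df_scaled, sep="\n\n")
```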

    Feature Interaction

    Consider the Titanic dataset, in which a male passenger’s chances of survival were higher when he was also young. A typical regression-based technique would not take such a feature interaction into account, whereas a tree-based technique would. Feature interaction is the process of creating new variables based on a combination of variables. Note that, more often than not, feature interactions are identified by understanding the business (the event that we are trying to predict) better.
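
    A minimal sketch (not from the book) of such an interaction feature on a made-up DataFrame, assuming pandas:

```python
import pandas as pd

df = pd.DataFrame({"gender": ["male", "male", "female", "male"],
                   "age":    [8, 45, 30, 12]})

# Interaction feature: the passenger is both male and a child
df["is_young_male"] = ((df["gender"] == "male") & (df["age"] < 15)).astype(int)
print(df)
```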

    Feature Generation

    Feature generation is the process of deriving additional features from the dataset. For example, a feature for predicting fraudulent transactions could be the time elapsed since the same customer’s previous transaction. Such features are not available straightaway; they can only be derived by understanding the problem we are trying to solve.
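
    A minimal sketch (not from the book) of this kind of derived feature on a made-up transactions table, assuming pandas:

```python
import pandas as pd

txns = pd.DataFrame({
    "customer_id": [1, 1, 2, 1, 2],
    "timestamp": pd.to_datetime(["2018-01-01 10:00", "2018-01-01 10:05",
                                 "2018-01-01 11:00", "2018-01-02 09:00",
                                 "2018-01-03 08:30"]),
})

txns = txns.sort_values(["customer_id", "timestamp"])
# Derived feature: seconds since the same customer's previous transaction
txns["secs_since_last_txn"] = (
    txns.groupby("customer_id")["timestamp"].diff().dt.total_seconds()
)
print(txns)
```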

    Building the Models

    Once the data is in place and the pre-processing steps are done, building a predictive model would
