Practical Predictive Analytics
About this ebook
- A unique book that centers on developing six key practical skills needed to build and implement predictive analytics
- Apply the principles and techniques of predictive analytics to effectively interpret big data
- Solve real-world analytical problems with the help of practical case studies and real-world scenarios taken from the world of healthcare, marketing, and other business domains
This book is for those with a mathematical/statistics background who wish to understand the concepts, techniques, and implementation of predictive analytics to resolve complex analytical issues. Basic familiarity with the R programming language is expected.
Practical Predictive Analytics - Ralph Winters
Practical Predictive Analytics
Back to the future with R, Spark, and more!
Ralph Winters
BIRMINGHAM - MUMBAI
Practical Predictive Analytics
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: June 2017
Production reference: 1300617
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78588-618-8
www.packtpub.com
Credits
About the Author
Ralph Winters started his career as a database researcher for a music performing rights organization (he composed as well!), then branched out into healthcare survey research, finally landing in the analytics and information technology world. He has provided his statistical and analytics expertise to many large Fortune 500 companies in the financial, direct marketing, insurance, healthcare, and pharmaceutical industries. He has worked on many diverse types of predictive analytics projects involving customer retention, anti-money laundering, voice-of-the-customer text mining analytics, and healthcare risk and customer choice models.
He is currently a data architect for a healthcare services company, working in the data and advanced analytics group. He enjoys working collaboratively with a smart team of business analysts, technologists, and actuaries, as well as with other data scientists.
Ralph considers himself a practical person. In addition to authoring Practical Predictive Analytics for Packt Publishing, he has contributed two tutorials illustrating the use of predictive analytics in medicine and healthcare to Practical Predictive Analytics and Decisioning Systems for Medicine (Miner et al., Elsevier, September 2014), and presented Practical Text Mining with SQL using Relational Databases at the 2013 11th Annual Text and Social Analytics Summit in Cambridge, MA.
Ralph resides in New Jersey with his loving wife Katherine, amazing daughters Claire and Anna, and his four-legged friends, Bubba and Phoebe, who can be unpredictable.
Ralph's website can be found at ralphwinters.com.
About the Reviewers
Armando Fandango serves as chief technology officer of REAL Inc., building AI-based products and platforms for making smart connections between brands, agencies, publishers, and audiences. Armando founded NeuraSights with the goal of creating insights from small and big data using neural networks and machine learning. Previously, as chief data scientist and chief technology officer (CTO) for Epic Engineering and Consulting Group LLC, Armando worked with government agencies and large private organizations to build smart products by incorporating machine learning, big data engineering, enterprise data repositories, and enterprise dashboards. Armando has led data science and engineering teams as head of data for Sonobi Inc., driving big data and predictive analytics technology and strategy for JetStream, Sonobi's AdTech platform. Armando has managed high-performance computing (HPC) consulting and infrastructure for the Advanced Research Computing Centre at UCF. Armando has also been advising high-tech startups Quantfarm, Cortxia Foundation, and Studyrite as an advisory board member and AI expert. Armando has authored a book titled Python Data Analysis - Second Edition and has published research in international journals and conferences.
Alberto Boschetti is a data scientist with strong expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he daily faces challenges spanning natural language processing (NLP), machine learning, and distributed processing. He is very passionate about his job and always tries to stay up to date with the latest developments in data science technologies by attending meetups, conferences, and other events. He is the author of Python Data Science Essentials, Regression Analysis with Python, and Large Scale Machine Learning with Python, all published by Packt.
www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785886185.
If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Table of Contents
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Getting Started with Predictive Analytics
Predictive analytics are in so many industries
Predictive Analytics in marketing
Predictive Analytics in healthcare
Predictive Analytics in other industries
Skills and roles that are important in Predictive Analytics
Related job skills and terms
Predictive analytics software
Open source software
Closed source software
Peaceful coexistence
Other helpful tools
Past the basics
Data analytics/research
Data engineering
Management
Team data science
Two different ways to look at predictive analytics
R
CRAN
R installation
Alternate ways of exploring R
How is a predictive analytics project organized?
Setting up your project and subfolders
GUIs
Getting started with RStudio
Rearranging the layout to correspond with the examples
Brief description of some important panes
Creating a new project
The R console
The source window
Creating a new script
Our first predictive model
Code description
Saving the script
Your second script
Code description
The predict function
Examining the prediction errors
R packages
The stargazer package
Installing stargazer package
Code description
Saving your work
References
Summary
The Modeling Process
Advantages of a structured approach
Ways in which structured methodologies can help
Analytic process methodologies
CRISP-DM and SEMMA
CRISP-DM and SEMMA chart
Agile processes
Six sigma and root cause
To sample or not to sample?
Using all of the data
Comparing a sample to the population
An analytics methodology outline – specific steps
Step 1 business understanding
Communicating business goals – the feedback loop
Internal data
External data
Tools of the trade
Process understanding
Data lineage
Data dictionaries
SQL
Example – Using SQL to get sales by region
Charts and plots
Spreadsheets
Simulation
Example – simulating if a customer contact will yield a sale
Example – simulating customer service calls
Step 2 data understanding
Levels of measurement
Nominal data
Ordinal data
Interval data
Ratio data
Converting from the different levels of measurement
Dependent and independent variables
Transformed variables
Single variable analysis
Summary statistics
Bivariate analysis
Types of questions that bivariate analysis can answer
Quantitative with quantitative variables
Code example
Nominal with nominal variables
Cross-tabulations
Mosaic plots
Nominal with quantitative variables
Point biserial correlation
Step 3 data preparation
Step 4 modeling
Description of specific models
Poisson (counts)
Logistic regression
Support vector machines (SVM)
Decision trees
Random forests
Example - comparing single decision trees to a random forest
An age decision tree
An alternative decision tree
The random forest model
Random forest versus decision trees
Variable importance plots
Dimension reduction techniques
Principal components
Clustering
Time series models
Naive Bayes classifier
Text mining techniques
Step 5 evaluation
Model validation
Area under the curve
Computing an ROC curve using the titanic dataset
In sample/out of sample tests, walk forward tests
Training/test/validation datasets
Time series validation
Benchmark against best champion model
Expert opinions: man against machine
Meta-analysis
Dart board method
Step 6 deployment
Model scoring
References
Notes
Summary
Inputting and Exploring Data
Data input
Text file Input
The read.table function
Database tables
Spreadsheet files
XML and JSON data
Generating your own data
Tips for dealing with large files
Data munging and wrangling
Joining data
Using the sqldf function
Housekeeping and loading of necessary packages
Generating the data
Examining the metadata
Merging data using Inner and Outer joins
Identifying members with multiple purchases
Eliminating duplicate records
Exploring the hospital dataset
Output from the str(df) function
Output from the View function
The colnames function
The summary function
Sending the output to an HTML file
Open the file in the browser
Plotting the distributions
Visual plotting of the variables
Breaking out summaries by groups
Standardizing data
Changing a variable to another type
Appending the variables to the existing dataframe
Extracting a subset
Transposing a dataframe
Dummy variable coding
Binning – numeric and character
Binning character data
Missing values
Setting up the missing values test dataset
The various types of missing data
Missing Completely at Random (MCAR)
Testing for MCAR
Missing at Random (MAR)
Not Missing at Random (NMAR)
Correcting for missing values
Listwise deletion
Imputation methods
Imputing missing values using the 'mice' package
Running a regression with imputed values
Imputing categorical variables
Outliers
Why outliers are important
Detecting outliers
Transforming the data
Tracking down the cause of the outliers
Ways to deal with outliers
Example – setting the outliers to NA
Multivariate outliers
Data transformations
Generating the test data
The Box-Cox Transform
Variable reduction/variable importance
Principal Components Analysis (PCA)
Where is PCA used?
A PCA example – US Arrests
All subsets regression
An example – airquality
Adjusted R-square plot
Variable importance
Variable influence plot
References
Summary
Introduction to Regression Algorithms
Supervised versus unsupervised learning models
Supervised learning models
Unsupervised learning models
Regression techniques
Advantages of regression
Generalized linear models
Linear regression using GLM
Logistic regression
The odds ratio
The logistic regression coefficients
Example - using logistic regression in health care to predict pain thresholds
Reading the data
Obtaining some basic counts
Saving your data
Fitting a GLM model
Examining the residuals
Residual plots
Added variable plots
Outliers in the regression
P-values and effect size
P-values and effect sizes
Variable selection
Interactions
Goodness of fit statistics
McFadden statistic
Confidence intervals and Wald statistics
Basic regression diagnostic plots
Description of the plots
An interactive game – guessing if the residuals are random
Goodness of fit – Hosmer-Lemeshow test
Goodness of fit example on the PainGLM data
Regularization
An example – ElasticNet
Choosing a correct lambda
Printing out the possible coefficients based on lambda
Summary
Introduction to Decision Trees, Clustering, and SVM
Decision tree algorithms
Advantages of decision trees
Disadvantages of decision trees
Basic decision tree concepts
Growing the tree
Impurity
Controlling the growth of the tree
Types of decision tree algorithms
Examining the target variable
Using formula notation in an rpart model
Interpretation of the plot
Printing a text version of the decision tree
The ctree algorithm
Pruning
Other options to render decision trees
Cluster analysis
Clustering is used in diverse industries
What is a cluster?
Types of clustering
Partitional clustering
K-means clustering
The k-means algorithm
Measuring distance between clusters
Clustering example using k-means
Cluster elbow plot
Extracting the cluster assignments
Graphically displaying the clusters
Cluster plots
Generating the cluster plot
Hierarchical clustering
Examining some examples from cluster 1
Examining some examples from cluster 2
Examining some examples from cluster 3
Support vector machines
Simple illustration of a mapping function
Analyzing consumer complaints data using SVM
Converting unstructured to structured data
References
Summary
Using Survival Analysis to Predict and Analyze Customer Churn
What is survival analysis?
Time-dependent data
Censoring
Left censoring
Right censoring
Our customer satisfaction dataset
Generating the data using probability functions
Creating the churn and no churn dataframes
Creating and verifying the new simulated variables
Recombining the churner and non-churners
Creating matrix plots
Partitioning into training and test data
Setting the stage by creating survival objects
Examining survival curves
Better plots
Contrasting survival curves
Testing for the gender difference between survival curves
Testing for the educational differences between survival curves
Plotting the customer satisfaction and number of service call curves
Improving the education survival curve by adding gender
Transforming service calls to a binary variable
Testing the difference between customers who called and those who did not
Cox regression modeling
Our first model
Examining the cox regression output
Proportional hazards test
Proportional hazard plots
Obtaining the cox survival curves
Plotting the curve
Partial regression plots
Examining subset survival curves
Comparing gender differences
Comparing customer satisfaction differences
Validating the model
Computing baseline estimates
Running the predict() function
Predicting the outcome at time 6
Determining concordance
Time-based variables
Changing the data to reflect the second survey
How survSplit works
Adjusting records to simulate an intervention
Running the time-based model
Comparing the models
Variable selection
Incorporating interaction terms
Displaying the formulas sublist
Comparing AIC among the candidate models
Summary
Using Market Basket Analysis as a Recommender Engine
What is market basket analysis?
Examining the groceries transaction file
Format of the groceries transaction files
The sample market basket
Association rule algorithms
Antecedents and descendants
Evaluating the accuracy of a rule
Support
Calculating support
Examples
Confidence
Lift
Evaluating lift
Preparing the raw data file for analysis
Reading the transaction file
capture.output function
Analyzing the input file
Analyzing the invoice dates
Plotting the dates
Scrubbing and cleaning the data
Removing unneeded character spaces
Simplifying the descriptions
Removing colors automatically
The colors() function
Cleaning up the colors
Filtering out single item transactions
Looking at the distributions
Merging the results back into the original data
Compressing descriptions using camelcase
Custom function to map to camelcase
Extracting the last word
Creating the test and training datasets
Saving the results
Loading the analytics file
Determining the consequent rules
Replacing missing values
Making the final subset
Creating the market basket transaction file
Method one – Coercing a dataframe to a transaction file
Inspecting the transaction file
Obtaining the topN purchased items
Finding the association rules
Examining the rules summary
Examining the rules quality and observing the highest support
Confidence and lift measures
Filtering a large number of rules
Generating many rules
Plotting many rules
Method two – Creating a physical transactions file
Reading the transaction file back in
Plotting the rules
Creating subsets of the rules
Text clustering
Converting to a document term matrix
Removing sparse terms
Finding frequent terms
K-means clustering of terms
Examining cluster 1
Examining cluster 2
Examining cluster 3
Examining cluster 4
Examining cluster 5
Predicting cluster assignments
Using flexclust to predict cluster assignment
Running k-means to generate the clusters
Creating the test DTM
Running the apriori algorithm on the clusters
Summarizing the metrics
References
Summary
Exploring Health Care Enrollment Data as a Time Series
Time series data
Exploring time series data
Health insurance coverage dataset
Housekeeping
Read the data in
Subsetting the columns
Description of the data
Target time series variable
Saving the data
Determining all of the subset groups
Merging the aggregate data back into the original data
Checking the time intervals
Picking out the top groups in terms of average population size
Plotting the data using lattice
Plotting the data using ggplot
Sending output to an external file
Examining the output
Detecting linear trends
Automating the regressions
Ranking the coefficients
Merging scores back into the original dataframe
Plotting the data with the trend lines
Plotting all the categories on one graph
Adding labels
Performing some automated forecasting using the ets function
Converting the dataframe to a time series object
Smoothing the data using moving averages
Simple moving average
Computing the SMA using a function
Verifying the SMA calculation
Exponential moving average
Computing the EMA using a function
Selecting a smoothing factor
Using the ets function
Forecasting using ALL AGES
Plotting the predicted and actual values
The forecast (fit) method
Plotting future values with confidence bands
Modifying the model to include a trend component
Running the ets function iteratively over all of the categories
Accuracy measures produced by onestep
Comparing the test and training data for the UNDER 18 YEARS group
Accuracy measures
References
Summary
Introduction to Spark Using R
About Spark
Spark environments
Cluster computing
Parallel computing
SparkR
Dataframes
Building our first Spark dataframe
Simulation
Importing the sample notebook
Notebook format
Creating a new notebook
Becoming large by starting small
The Pima Indians diabetes dataset
Running the code
Running the initialization code
Extracting the Pima Indians diabetes dataset
Examining the output
Output from the str() function
Output from the summary() function
Comparing outcomes
Checking for missing values
Imputing the missing values
Checking the imputations (reader exercise)
Missing values complete!
Calculating the correlation matrices
Calculating the column means
Simulating the data
Which correlations to use?
Checking the object type
Simulating the negative cases
Concatenating the positive and negative cases into a single Spark dataframe
Running summary statistics
Saving your work
Summary
Exploring Large Datasets Using Spark
Performing some exploratory analysis on positives
Displaying the contents of a Spark dataframe
Graphing using native graph features
Running pairwise correlations directly on a Spark dataframe
Cleaning up and caching the table in memory
Some useful Spark functions to explore your data
Count and groupby
Covariance and correlation functions
Creating new columns
Constructing a cross-tab
Contrasting histograms
Plotting using ggplot
Spark SQL
Registering tables
Issuing SQL through the R interface
Using SQL to examine potential outliers
Creating some aggregates
Picking out some potential outliers using a third query
Changing to the SQL API
SQL – computing a new column using the Case statement
Evaluating outcomes based upon the Age segment
Computing mean values for all of the variables
Exporting data from Spark back into R
Running local R packages
Using the pairs function (available in the base package)
Generating a correlation plot
Some tips for using Spark
Summary
Spark Machine Learning - Regression and Cluster Models
About this chapter/what you will learn
Reading the data
Running a summary of the dataframe and saving the object
Splitting the data into train and test datasets
Generating the training datasets
Generating the test dataset
A note on parallel processing
Introducing errors into the test data set
Generating a histogram of the distribution
Generating the new test data with errors
Spark machine learning using logistic regression
Examining the output
Regularization Models
Predicting outcomes
Plotting the results
Running predictions for the test data
Combining the training and test dataset
Exposing the three tables to SQL
Validating the regression results
Calculating goodness of fit measures
Confusion matrix
Confusion matrix for test group
Distribution of average errors by group
Plotting the data
Pseudo R-square
Root-mean-square error (RMSE)
Plotting outside of Spark
Collecting a sample of the results
Examining the distributions by outcome
Registering some additional tables
Creating some global views
User exercise
Cluster analysis
Preparing the data for analysis
Reading the data from the global views
Inputting the previously computed means and standard deviations
Joining the means and standard deviations with the training data
Joining the means and standard deviations with the test data
Normalizing the data
Displaying the output
Running the k-means model
Fitting the model to the training data
Fitting the model to the test data
Graphically display cluster assignment
Plotting via the Pairs function
Characterizing the clusters by their mean values
Calculating mean values for the test data
Summary
Spark Models – Rule-Based Learning
Loading the stop and frisk dataset
Importing the CSV file to databricks
Reading the table
Running the first cell
Reading the entire file into memory
Transforming some variables to integers
Discovering the important features
Eliminating some factors with a large number of levels
Test and train datasets
Examining the binned data
Running the OneR model
Interpreting the output
Constructing new variables
Running the prediction on the test sample
Another OneR example
The rules section
Constructing a decision tree using Rpart
First collect the sample
Decision tree using Rpart
Plot the tree
Running an alternative model in Python
Running a Python Decision Tree
Reading the Stop and Frisk table
Indexing the classification features
Mapping to an RDD
Specifying the decision tree model
Producing a larger tree
Visual trees
Comparing train and test decision trees
Summary
Preface
This is a different kind of predictive analytics book. My original intention was to introduce predictive analytics techniques targeted towards legacy analytics folks, using open source tools.
However, I soon realized that there were certain aspects of legacy analytics tools that could benefit the new generation of data scientists. Having worked a large part of my career in enterprise data solutions, I was interested in writing about some different kinds of topics, such as analytics methodologies, agile, metadata, SQL analytics, and reproducible research, which are often neglected in data science/predictive analytics books but are still critical to the success of an analytics project.
I also wanted to write about some underrepresented analytics techniques that extend beyond standard regression and classification tasks, such as using survival analysis to predict customer churn, and using market basket analysis as a recommendation engine.
Since there is a lot of movement towards cloud-based solutions, I thought it was important to cover cloud-based analytics (big data) as well, so I included several chapters on developing predictive analytics solutions within a Spark environment.
Whatever your orientation is, a key point of this book is collaboration, and I hope that regardless of your definition of data science, predictive analytics, big data, or even a benign term such as forecasting, you will find something here that suits your needs.
Furthermore, I wanted to pay homage to the domain expert as part of the data science team. Often these analysts are not given fancy titles, but these business analysts can make the difference between a successful analytics project and one that falls flat on its face. Hopefully, some of the topics I discuss will strike a chord with them and get them more interested in some of the technical concepts of predictive analytics.
When I was asked by Packt to write a book about predictive analytics, I first wondered what would be a good open source language to bridge the gap between legacy analytics and today's data scientist world. I thought about this considerably, since each language brings its own nuances in terms of how solutions to problems are expressed. However, I decided ultimately not to sweat the details, since predictive analytics concepts are not language-dependent, and the choice of language often is determined by personal preference as well as what is in use within the company in which you work.
I chose the R language because my background is in statistics, and I felt that R had good statistical rigor, now has reasonable integration with proprietary software such as SAS, and also integrates well with relational database systems and web protocols. It also has an excellent plotting and visualization system, and, along with its many good user-contributed packages, covers most statistical and predictive analytics tasks.
Regarding statistics, I suggest that you learn as much statistics as you can. Knowing statistics can help you separate good models from bad, and help you identify many problems in bad data just by understanding basic concepts such as measures of central tendencies (mean, median, mode), hypothesis testing, p-values, and effect sizes. It will also help you shy away from merely running a package in an automated way, and help you look a little at what is under the hood.
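As a tiny illustration of why these basics matter, a few lines of R (using made-up illustrative values, not data from the book) show how the mean and median can tell very different stories about the same sample:

```r
# Hypothetical skewed sample (e.g., incomes in $000s) -- illustrative values only
income <- c(30, 32, 35, 38, 40, 41, 45, 250)

mean(income)    # 63.875 -- pulled upward by the single outlier
median(income)  # 39     -- a more robust measure of the center
```

Noticing a large gap like this between the mean and median is often the first clue that a distribution is skewed or that bad data has crept in.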
One downside to R is that it processes data in memory, so memory can limit the size of the datasets you can analyze on a single PC. For the datasets we use in this book, there should be no problem running R on a single PC. If you are interested in analyzing big data, I spend several chapters discussing R and Spark within a cloud environment, in which you can process very large datasets that are distributed across many different computers.
Speaking of the datasets used in this book, I did not want to use the same datasets that you see analyzed repeatedly. Some of these datasets are excellent for demonstrating techniques, but I wanted some alternatives. However, I did not see many that I thought would be useful for this book: some were from unknown sources, some needed formal permission to use, and some lacked a good data dictionary. So, for many chapters, I ended up generating my own data using simulation techniques in R. I believe that was a good choice, since it enabled me to introduce some data-generating techniques that you can use in your own work.
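As a taste of the kind of data simulation described above, here is a minimal sketch in base R; the variable names and distributions are purely illustrative, not taken from the book's datasets:

```r
# Simulate a small customer dataset using base R random-number functions
set.seed(42)                                             # reproducible results
n <- 1000
customers <- data.frame(
  age    = round(rnorm(n, mean = 45, sd = 12)),          # roughly normal ages
  region = sample(c("North", "South", "East", "West"),   # categorical variable
                  n, replace = TRUE),
  spend  = round(rlnorm(n, meanlog = 4, sdlog = 0.5), 2) # right-skewed spending
)
summary(customers)
```

Because the seed is fixed, the simulated data is reproducible, which is handy when you want readers to be able to verify results exactly.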
The data I used covers a good spectrum of marketing, retail, and healthcare applications. I also would have liked to include some financial predictive analytics use cases but ran out of time. Maybe I will leave that for another book!
What this book covers
Chapter 1, Getting Started with Predictive Analytics, begins with a little bit of history of how predictive analytics developed. We then discuss some different roles of predictive analytics practitioners, and describe the industries in which they work. Ways to organize predictive analytics projects on a PC are discussed next, the R language is introduced, and we end the chapter with a short example of a predictive model.
Chapter 2, The Modeling Process, discusses how the development of predictive models can be organized into a series of stages, each with different goals, such as exploration and problem definition, leading to the actual development of a predictive model. We discuss two important analytics methodologies, CRISP-DM and SEMMA. Code examples are sprinkled throughout the chapter to demonstrate some of the ideas central to the methodologies, so that, hopefully, you will never be bored.
Chapter 3, Inputting and Exploring Data, introduces various ways that you can bring your own input data into R. We also discuss various data preparation techniques using standard SQL functions as well as analogous methods using the R dplyr package. Have no data to input? No problem. We will show you how to generate your own human-like data using the R package wakefield.
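As a preview of the data-generation idea mentioned above, here is a minimal sketch using the wakefield package (assumed to be installed via `install.packages("wakefield")`); the chosen variables are illustrative:

```r
# Generate human-like sample data with wakefield's r_data_frame()
library(wakefield)
set.seed(123)           # make the generated data reproducible
df <- r_data_frame(
  n = 100,              # number of rows
  id,                   # unique identifier
  age,                  # simulated ages
  sex,                  # Male/Female factor
  income                # simulated incomes
)
head(df)
```

Each bare name inside `r_data_frame()` is a wakefield variable function, so adding a new column is usually just a matter of adding one more name to the call.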
Chapter 4, Introduction to Regression Algorithms, begins with a discussion of supervised versus unsupervised algorithms. The rest of the chapter concentrates on regression algorithms, which represent the supervised algorithm category. You will learn about interpreting regression output such as model coefficients and residual plots. There is even an interactive game that tests whether you can determine if a series of residuals is random or not.
Chapter 5, Introduction to Decision Trees, Clustering, and SVM, concentrates on three other core predictive algorithms that have widespread use and, along with regression, can be used to solve many, if not most, of your predictive analytics problems. The last algorithm discussed, the Support Vector Machine (SVM), is often used with high-dimensional data, such as unstructured text, so we will accompany this example with some text mining techniques applied to customer complaint comments.
Chapter 6, Using Survival Analysis to Predict and Analyze Customer Churn, discusses a specific modeling technique known as survival analysis and follows a hypothetical customer marketing satisfaction and retention example. We will also delve more deeply into simulating customer choice using some sampling functions available in R.
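In the spirit of the sampling functions mentioned above, here is an illustrative use of base R's `sample()` to simulate customer retention outcomes with unequal probabilities; the 80/20 split is an assumption for the sketch, not a figure from the book:

```r
# Simulate customer churn outcomes with a weighted coin flip
set.seed(7)
status <- sample(c("retained", "churned"),
                 size = 500, replace = TRUE,
                 prob = c(0.8, 0.2))   # assumed 80% retention rate
prop.table(table(status))              # observed retained/churned proportions
```

With `replace = TRUE` and a `prob` vector, `sample()` acts as a simple categorical random generator, which is often all you need to mock up customer-choice data.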
Chapter 7, Using Market Basket Analysis as a Recommender Engine, introduces the concept of association rules and market basket analysis, and steps you through some techniques that can predict future purchases based upon various combinations of previous purchases from an online retail store. It also introduces some text analytics techniques coupled with some cluster analysis that places various customers into different segments. You will learn some additional data cleaning techniques, and learn how to generate some interesting association plots.
Chapter 8, Exploring Health Care Enrollment Data as a Time Series, introduces time series analytics. Healthcare enrollment data from the CMS website is first explored. Then we move on to defining some basic time series concepts such as simple and exponential moving averages. Finally, we work with the R forecast package which, as its name implies, helps you to perform some time series forecasting.
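The moving-average concepts above can be sketched briefly with the forecast package; the enrollment series here is simulated rather than the CMS data, so treat it as a stand-in:

```r
# Simple and exponential moving averages on a simulated monthly series
library(forecast)
set.seed(1)
enroll <- ts(100 + cumsum(rnorm(48)),   # random-walk "enrollment" counts
             frequency = 12, start = c(2012, 1))
sma <- ma(enroll, order = 3)            # centered 3-month simple moving average
fit <- ses(enroll, h = 6)               # simple exponential smoothing, 6-month forecast
summary(fit)
```

`ma()` smooths the series for exploration, while `ses()` produces an actual forecast object with prediction intervals you can plot.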
Chapter 9, Introduction to Spark Using R, introduces SparkR, which is an environment for accessing large Spark clusters using R. No local version of R needs to be installed. It also introduces Databricks, which is a cloud-based environment for running R (as well as Python, SQL, and other languages) against Spark-based big data. This chapter also demonstrates techniques for transforming small datasets into larger Spark clusters, using the Pima Indians Diabetes database as a reference.
Chapter 10, Exploring Large Datasets Using Spark, shows how to perform some exploratory data analysis using a combination of SparkR and Spark SQL on the Pima Indians Diabetes data loaded into Spark. We will learn the basics of exploring Spark data using some Spark-specific commands that allow us to filter, group, summarize, and visualize our Spark data.
Chapter 11, Spark Machine Learning – Regression and Cluster Models, covers machine learning by first illustrating a logistic regression model that has been built using a Spark cluster. We will learn how to split Spark data into training and test data in Spark, run a logistic regression model, and then evaluate its performance.
Chapter 12, Spark Models - Rules-Based Learning, teaches you how to run decision tree models in Spark using the Stop and Frisk dataset. You will learn how to overcome some of the algorithmic limitations of the Spark MLlib environment by extracting some cluster samples to your local machine and then running some non-Spark algorithms that you are already familiar with. This chapter will also introduce you to a new rule-based algorithm, OneR, and will demonstrate how you can mix different languages together in Spark, such as mixing R, SQL, and even Python code in the same notebook using the %magic directive.
What you need for this book
This is neither an introductory predictive analytics book, nor an introductory book for learning R or Spark. Some knowledge of base R data manipulation techniques is expected. Some prior knowledge of predictive analytics is useful. As mentioned earlier, knowledge of basic statistical concepts such as hypothesis testing, correlation, means, standard deviations, and p-values will also help you navigate this book.
Who this book is for
This book is for those who have already had an introduction to R, and are looking to learn how to develop enterprise predictive analytics solutions. Additionally, traditional business analysts and managers who wish to extend their skills into predictive analytics using open source R may find the book useful. Existing predictive analytic practitioners who know another language, or those who wish to learn about analytics using Spark, will also find the chapters on Spark and R beneficial.
Conventions
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows:
Save all output to the /PracticalPredictiveAnalytics/Outputs directory.
A block of code is set as follows:
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Any command-line, (including commands at the R console) input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Clicking the Next button moves you to the next screen.
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Practical-Predictive-Analytics. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/PracticalPredictiveAnalytics_ColorImages.pdf.
Errata
Although we have taken every care to ensure