Ebook, 582 pages, 4 hours

Machine Learning with SAS Viya

Name: Machine Learning with SAS Viya
Author: SAS Institute Inc.
ISBN: 9781951685379

About this ebook

Master machine learning with SAS Viya!

Machine learning can feel intimidating for new practitioners. Machine Learning with SAS Viya provides everything you need to get started with machine learning in SAS Viya, including decision trees, neural networks, and support vector machines. The book covers the analytics life cycle from data preparation and discovery to deployment. Working with open-source code? Machine Learning with SAS Viya has you covered: step-by-step instructions show how to use SAS Model Manager tools with open source. SAS Model Studio features are highlighted to show how to carry out machine learning in SAS Viya. Demonstrations, practice tasks, and quizzes help sharpen your skills.

In this book, you will learn about:

Supervised and unsupervised machine learning
Data preparation and dealing with missing and unstructured data
Model building and selection
Improving and optimizing models
Model deployment and monitoring performance

Intelligence (AI) & Semantics

Language: English

Publisher: SAS Institute

Release date: May 29, 2020

ISBN: 9781951685379

Author

SAS Institute Inc.

SAS is the leader in analytics, from data science to AI and machine learning. Build skills to help you land some of today's most sought-after positions, such as data scientist and business analyst, with books developed and written by SAS experts.

Related to Machine Learning with SAS Viya

Related ebooks

SAS Viya: The Python Perspective
Ebook
by Kevin D. Smith
Rating: 0 out of 5 stars
0 ratings
Insightful Data Visualization with SAS Viya
Ebook
by Falko Schulz
Rating: 0 out of 5 stars
0 ratings
An Introduction to SAS Visual Analytics: How to Explore Numbers, Design Reports, and Gain Insight into Your Data
Ebook
by Tricia Aanderud
Rating: 5 out of 5 stars
5/5
End-to-End Data Science with SAS: A Hands-On Programming Guide
Ebook
by James Gearheart
Rating: 0 out of 5 stars
0 ratings
PROC REPORT by Example: Techniques for Building Professional Reports Using SAS
Ebook
by Lisa Fine
Rating: 0 out of 5 stars
0 ratings
Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications, Third Edition
Ebook
by Kattamuri S. Sarma
Rating: 0 out of 5 stars
0 ratings
Segmentation Analytics with SAS Viya: An Approach to Clustering and Visualization
Ebook
by Randall S. Collica
Rating: 0 out of 5 stars
0 ratings
Applied Data Mining for Forecasting Using SAS
Ebook
by Tim Rey
Rating: 0 out of 5 stars
0 ratings
SAS Viya: The R Perspective
Ebook
by Yue Qi
Rating: 0 out of 5 stars
0 ratings
PROC SQL: Beyond the Basics Using SAS, Third Edition
Ebook
by Kirk Paul Lafler
Rating: 0 out of 5 stars
0 ratings
Smart Data Discovery Using SAS Viya: Powerful Techniques for Deeper Insights
Ebook
by Felix Liao
Rating: 0 out of 5 stars
0 ratings
SAS Programming for Enterprise Guide Users, Second Edition
Ebook
by Neil Constable
Rating: 0 out of 5 stars
0 ratings
SAS Statistics Data Analysis Certification Questions: Unofficial SAS Data analysis Certification and Interview Questions
Ebook
by Equity Press
Rating: 5 out of 5 stars
5/5
Introduction to Statistical and Machine Learning Methods for Data Science
Ebook
by Carlos Andre Reis Pinheiro
Rating: 0 out of 5 stars
0 ratings
Practical and Efficient SAS Programming: The Insider's Guide
Ebook
by Martha Messineo
Rating: 0 out of 5 stars
0 ratings
SAS Certified Professional Prep Guide: Advanced Programming Using SAS 9.4
Ebook
by SAS Institute
Rating: 1 out of 5 stars
1/5
Applying Data Science: Business Case Studies Using SAS
Ebook
by Gerhard Svolba
Rating: 0 out of 5 stars
0 ratings
Fundamentals of Programming in SAS: A Case Studies Approach
Ebook
by James Blum
Rating: 0 out of 5 stars
0 ratings
Interactive Reports in SAS® Visual Analytics: Advanced Features and Customization
Ebook
by Nicole Ball
Rating: 0 out of 5 stars
0 ratings
Deep Learning for Numerical Applications with SAS
Ebook
by Henry Bequet
Rating: 0 out of 5 stars
0 ratings
The Data Detective's Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data
Ebook
by Kim Chantala
Rating: 0 out of 5 stars
0 ratings
Hands-On Azure Data Platform: Building Scalable Enterprise-Grade Relational and Non-Relational database Systems with Azure Data Services
Ebook
by Sagar Lad
Rating: 0 out of 5 stars
0 ratings
SAS Certification Prep Guide: Statistical Business Analysis Using SAS9
Ebook
by Joni N. Shreve, PhD
Rating: 0 out of 5 stars
0 ratings
SAS Certified Specialist Prep Guide: Base Programming Using SAS 9.4
Ebook
by SAS Institute
Rating: 4 out of 5 stars
4/5
The SAS Programmer's PROC REPORT Handbook: ODS Companion
Ebook
by Jane Eslinger
Rating: 0 out of 5 stars
0 ratings
SAS Programming in the Pharmaceutical Industry, Second Edition
Ebook
by Jack Shostak
Rating: 5 out of 5 stars
5/5
SAS Statistics by Example
Ebook
by Ron Cody
Rating: 5 out of 5 stars
5/5
Business Analytics with SAS Studio: Deliver Business Intelligence by Combining SQL Processing, Insightful Visualizations, and Various Data Mining Techniques
Ebook
by Rajinder Kr. Chitoria
Rating: 0 out of 5 stars
0 ratings
SAS Administration from the Ground Up: Running the SAS9 Platform in a Metadata Server Environment
Ebook
by Anja Fischer
Rating: 5 out of 5 stars
5/5
Mastering the SAS DS2 Procedure: Advanced Data-Wrangling Techniques, Second Edition
Ebook
by Mark Jordan
Rating: 0 out of 5 stars
0 ratings

Intelligence (AI) & Semantics For You

Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
by Cea West
Rating: 4 out of 5 stars
4/5
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
Ebook
by Steven Cooper
Rating: 4 out of 5 stars
4/5
Artificial Intelligence: A Guide for Thinking Humans
Ebook
by Melanie Mitchell
Rating: 4 out of 5 stars
4/5
2084: Artificial Intelligence and the Future of Humanity
Ebook
by John C Lennox
Rating: 4 out of 5 stars
4/5
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
Ebook
by Cea West
Rating: 5 out of 5 stars
5/5
Summary of Building a Second Brain: by Tiago Forte - A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential - A Comprehensive Summary
Ebook
by Alexander Cooper
Rating: 1 out of 5 stars
1/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
by Arthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Summary of Super-Intelligence From Nick Bostrom
Ebook
by Summary Station
Rating: 5 out of 5 stars
5/5
ChatGPT for Beginners: How to Make Money Online and 10x Your Productivity Using ChatGPT Even if You’re an Absolute Beginner (The Complete Up-to-Date ChatGPT Guide)
Ebook
by Matthew Hayes
Rating: 0 out of 5 stars
0 ratings
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
Ebook
by David Mayer
Rating: 0 out of 5 stars
0 ratings
101 Midjourney Prompt Secrets
Ebook
by Marcus Byrne
Rating: 3 out of 5 stars
3/5
ChatGPT For Fiction Writing: AI for Authors
Ebook
by Nova Leigh
Rating: 5 out of 5 stars
5/5
The Secrets of ChatGPT Prompt Engineering for Non-Developers
Ebook
by Cea West
Rating: 5 out of 5 stars
5/5
Our Final Invention: Artificial Intelligence and the End of the Human Era
Ebook
by James Barrat
Rating: 4 out of 5 stars
4/5
Dark Aeon: Transhumanism and the War Against Humanity
Ebook
by Joe Allen
Rating: 5 out of 5 stars
5/5
Chat-GPT Income Ideas: Pioneering Monetization Concepts Utilizing Conversational AI for Profitable Ventures
Ebook
by The Passive Income Strategist
Rating: 4 out of 5 stars
4/5
Midjourney Mastery - The Ultimate Handbook of Prompts
Ebook
by Andreea Todinca
Rating: 5 out of 5 stars
5/5
Discovery Writing with ChatGPT: AI-Powered Storytelling: Three Story Method, #6
Ebook
by J. Thorn
Rating: 0 out of 5 stars
0 ratings
Impromptu: Amplifying Our Humanity Through AI
Ebook
by Reid Hoffman
Rating: 5 out of 5 stars
5/5
What Makes Us Human: An Artificial Intelligence Answers Life's Biggest Questions
Ebook
by Jasmine Wang
Rating: 5 out of 5 stars
5/5
ChatGPT For Dummies
Ebook
by Pam Baker
Rating: 0 out of 5 stars
0 ratings
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
Ebook
by Hadelin de Ponteves
Rating: 0 out of 5 stars
0 ratings
The Algorithm of the Universe (A New Perspective to Cognitive AI)
Ebook
by Ancient Philosophy
Rating: 5 out of 5 stars
5/5
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
Ebook
by Maximus Wilson
Rating: 0 out of 5 stars
0 ratings
AI for Educators
Ebook
by Matt Miller
Rating: 5 out of 5 stars
5/5
Ways of Being: Animals, Plants, Machines: The Search for a Planetary Intelligence
Ebook
by James Bridle
Rating: 4 out of 5 stars
4/5
Rise of Generative AI and ChatGPT: Understand how Generative AI and ChatGPT are transforming and reshaping the business world (English Edition)
Ebook
by Utpal Chakraborty
Rating: 0 out of 5 stars
0 ratings
The Business Case for AI: A Leader's Guide to AI Strategies, Best Practices & Real-World Applications
Ebook
by Kavita Ganesan
Rating: 0 out of 5 stars
0 ratings
THE CHATGPT MILLIONAIRE'S HANDBOOK: UNLOCKING WEALTH THROUGH AI AUTOMATION
Ebook
by Logan Rivers
Rating: 5 out of 5 stars
5/5
ChatGPT Money Machine 2024 - The Ultimate Chatbot Cheat Sheet to Go From Clueless Noob to Prompt Prodigy Fast! Complete AI Beginner’s Course to Catch the GPT Gold Rush Before It Leaves You Behind
Ebook
by Alec Rowe
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
by DataFramed
0 ratings
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
Podcast episode
by Invest Like the Best with Patrick O'Shaughnessy
0 ratings
Using Data To Illuminate The Intentionally Opaque Insurance Industry: The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.
Podcast episode
by Data Engineering Podcast
0 ratings
Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI: The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, production-like data available for developing and testing your software, analytics, and machine learning projects. In this episode Adam Kamor explores the factors that make this such a complex problem to solve, the approach that he and his team have taken to turn it into a reliable product, and how you can start using it to replace your own collection of scripts.
Podcast episode
by Data Engineering Podcast
0 ratings
Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary: Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. In this episode Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.
Podcast episode
by Data Engineering Podcast
0 ratings
An Overview Of The State Of Data Orchestration In An Increasingly Complex Data Ecosystem: Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and its application to help inform its implementation in your environment.
Podcast episode
by Data Engineering Podcast
0 ratings
Surveying The Market Of Database Products: Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection.
Podcast episode
by Data Engineering Podcast
0 ratings
Building Applications With Data As Code On The DataOS: The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics lifecycle through code, while providing automatic knowledge graphs and data discovery. In this episode Srujan Akula explains how the system is implemented and how you can start using it today with your existing data systems.
Podcast episode
by Data Engineering Podcast
0 ratings
Building An Internal Database As A Service Platform At Cloudflare: Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low latency and high uptime services at global scale. This is an interesting and insightful look at pragmatic engineering for reliability and scale.
Podcast episode
by Data Engineering Podcast
0 ratings
Defining A Strategy For Your Data Products: The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. This has led to the broad adoption of data products being the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products.
Podcast episode
by Data Engineering Podcast
0 ratings
Eliminate The Overhead In Your Data Integration With The Open Source dlt Library: Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.
Podcast episode
by Data Engineering Podcast
0 ratings
Harnessing Generative AI For Creating Educational Content With Illumidesk: Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data driven experience for learners.
Podcast episode
by Data Engineering Podcast
0 ratings
Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. In order to enable this use case, while maintaining a single point of access, the semantic layer has evolved as a technological solution to the problem. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.
Podcast episode
by Data Engineering Podcast
0 ratings
Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling: For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin of Airflow and Superset fame shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.
Podcast episode
by Data Engineering Podcast
0 ratings
Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service: A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow. In this episode Tevje Olin explains how the platform is implemented, the features that it provides to reduce the amount of effort required to keep your pipelines running, and how you can start using it in your own team.
Podcast episode
Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service: A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow. In this episode Tevje Olin explains how the platform is implemented, the features that it provides to reduce the amount of effort required to keep your pipelines running, and how you can start using it in your own team.
byData Engineering Podcast
0 ratings
0% found this document useful
Adding An Easy Mode For The Modern Data Stack With 5X: The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.
Podcast episode
Adding An Easy Mode For The Modern Data Stack With 5X: The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.
byData Engineering Podcast
0 ratings
0% found this document useful
Data Sharing Across Business And Platform Boundaries: Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.
Podcast episode
Data Sharing Across Business And Platform Boundaries: Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.
byData Engineering Podcast
0 ratings
0% found this document useful
Unpacking The Seven Principles Of Modern Data Pipelines: Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data. In this episode Ariel Pohoryles explains what they are and how they work together to increase your chances of success.
Podcast episode
Unpacking The Seven Principles Of Modern Data Pipelines: Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data. In this episode Ariel Pohoryles explains what they are and how they work together to increase your chances of success.
byData Engineering Podcast
0 ratings
0% found this document useful
End-to-End Data Science to Drive Business Decisions at LinkedIn with Burcu Baran - TWiML Talk #256: In this episode of our Strata Data conference series, we’re joined by Burcu Baran, Senior Data Scientist at LinkedIn. At Strata, Burcu, along with a few members of her team, delivered the presentation “Using the full spectrum of data science to...
Podcast episode
End-to-End Data Science to Drive Business Decisions at LinkedIn with Burcu Baran - TWiML Talk #256: In this episode of our Strata Data conference series, we’re joined by Burcu Baran, Senior Data Scientist at LinkedIn. At Strata, Burcu, along with a few members of her team, delivered the presentation “Using the full spectrum of data science to...
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
Machine in Production = Data Engineering + ML + Software Engineering // Satish Chandra Gupta // MLOps Coffee Sessions #16
Podcast episode
Machine in Production = Data Engineering + ML + Software Engineering // Satish Chandra Gupta // MLOps Coffee Sessions #16
byMLOps.community
0 ratings
0% found this document useful
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable: Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.
Podcast episode
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable: Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.
byData Engineering Podcast
0 ratings
0% found this document useful
Machine Learning in Performance with Gopal Brugalette: Managing the performance of complex systems requires more than simply running load tests. You need to perform a careful analysis of test results and production metrics. The sheer amount of data generated makes analysis a challenge that is often left...
Podcast episode
Machine Learning in Performance with Gopal Brugalette: Managing the performance of complex systems requires more than simply running load tests. You need to perform a careful analysis of test results and production metrics. The sheer amount of data generated makes analysis a challenge that is often left...
byTestGuild Devops Toolchain Podcast
0 ratings
0% found this document useful
Use Your Data Warehouse To Power Your Product Analytics With NetSpring: With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that are being sold. Product analytics has grown into its own category and brought with it several services with generational differences in how they approach the problem. NetSpring is a warehouse-native product analytics service that allows you to gain powerful insights into your customers and their needs by combining your event streams with the rest of your business data. In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow.
Podcast episode
Use Your Data Warehouse To Power Your Product Analytics With NetSpring: With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that are being sold. Product analytics has grown into its own category and brought with it several services with generational differences in how they approach the problem. NetSpring is a warehouse-native product analytics service that allows you to gain powerful insights into your customers and their needs by combining your event streams with the rest of your business data. In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow.
byData Engineering Podcast
0 ratings
0% found this document useful
A "SaaS" Look Ahead for 2020
Podcast episode
A "SaaS" Look Ahead for 2020
byThe Cloudcast
100%
100% found this document useful
Trustworthy Data for Machine Learning // Chad Sanderson // MLOps Meetup #93
Podcast episode
Trustworthy Data for Machine Learning // Chad Sanderson // MLOps Meetup #93
byMLOps.community
0 ratings
0% found this document useful
Building Linked Data Products With JSON-LD: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.
Podcast episode
Building Linked Data Products With JSON-LD: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.
byData Engineering Podcast
0 ratings
0% found this document useful
Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine: Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.
Podcast episode
Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine: Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.
byData Engineering Podcast
0 ratings
0% found this document useful
Build Better Tests For Your dbt Projects With Datafold And data-diff: Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be stable and wrong, but then it isn't reliable. Confidence in your data is achieved through constant validation and testing. Datafold has invested a lot of time into integrating with the workflow of dbt projects to add early verification that the changes you are making are correct. In this episode Gleb Mezhanskiy shares some valuable advice and insights into how you can build reliable and well-tested data assets with dbt and data-diff.
Podcast episode
Build Better Tests For Your dbt Projects With Datafold And data-diff: Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be stable and wrong, but then it isn't reliable. Confidence in your data is achieved through constant validation and testing. Datafold has invested a lot of time into integrating with the workflow of dbt projects to add early verification that the changes you are making are correct. In this episode Gleb Mezhanskiy shares some valuable advice and insights into how you can build reliable and well-tested data assets with dbt and data-diff.
byData Engineering Podcast
0 ratings
0% found this document useful
How to Build a Website — The Show For Beginners: In this episode of Syntax, Scott and Wes talk about the basics of building a website — how to get started for beginners! Freshbooks - Sponsor Get a 30 day free trial of Freshbooks at and put SYNTAX in the “How did you hear about us?”...
Podcast episode
How to Build a Website — The Show For Beginners: In this episode of Syntax, Scott and Wes talk about the basics of building a website — how to get started for beginners! Freshbooks - Sponsor Get a 30 day free trial of Freshbooks at and put SYNTAX in the “How did you hear about us?”...
bySyntax - Tasty Web Development Treats
0 ratings
0% found this document useful
How Column-Aware Development Tooling Yields Better Data Models: Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. By incorporating column-level lineage in the data modeling process it encourages a more robust and well-informed design. In this episode Satish Jayanthi explores the benefits of incorporating column-aware tooling in the data modeling process.
Podcast episode
How Column-Aware Development Tooling Yields Better Data Models: Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. By incorporating column-level lineage in the data modeling process it encourages a more robust and well-informed design. In this episode Satish Jayanthi explores the benefits of incorporating column-aware tooling in the data modeling process.
byData Engineering Podcast
0 ratings
0% found this document useful

Skip carousel

Machine Learning with SAS Viya - SAS Institute Inc.

Preface

What Is Machine Learning?

Machine learning is a branch of artificial intelligence (AI) that automates the building of models that learn from data, identify patterns, and predict future results—with minimal human intervention.

Machine learning is not all science fiction. Common examples in use today include self-driving cars, online recommenders that suggest movies you might like on Netflix or products on Amazon, sentiment detection on Twitter, and real-time credit card fraud detection.

Statistical Modeling Versus Machine Learning

Like statistical modeling, machine learning aims to understand the structure of the data. In statistics, you fit well-understood theoretical distributions to the data. A statistical model therefore rests on mathematically proven theory, but that theory requires the data to meet certain strong assumptions. Machine learning developed from the ability to use computers to probe the data for structure without a prior theory of what that structure looks like. The test for a machine learning model is its validation error on new data, not a theoretical test of a null hypothesis. Because machine learning often uses an iterative approach to learn from data, the learning can be easily automated. Passes are run through the data until a robust pattern is found.
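The idea of judging a model by its error on data it has not seen can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the model is a simple least-squares line, and the names fit_line and mean_squared_error are hypothetical helpers, not part of any SAS interface.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_squared_error(model, points):
    """Average squared prediction error of the fitted line on the given points."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)

# Synthetic data (hypothetical): y = 2 + 3x plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 + 3.0 * x + rng.gauss(0, 1)))

# Hold out 30% of the rows; the model never sees them during fitting.
train, valid = data[:140], data[140:]
model = fit_line(train)
train_error = mean_squared_error(model, train)
valid_error = mean_squared_error(model, valid)
print(f"training MSE: {train_error:.3f}, validation MSE: {valid_error:.3f}")
```

The validation error, not the training error, is the number that indicates how the model is likely to behave on new data.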

Algorithms

Building representative machine learning models that generalize well to new data requires careful consideration of both the training data and the assumptions behind the various training algorithms. It is important to choose the right algorithm for both the data that you will be modeling and the business problem that you are trying to solve. For example, if you are building a model to detect tumors, it would be important to choose a model with high sensitivity (that is, one that misses few actual tumors), because it is most important not to miss any possible tumors. On the other hand, if you were building a model to predict who best to send an offer to in a marketing campaign with a limited budget, you would want the model that is best at predicting rank, that is, at identifying the top 100 or so customers most likely to use the offer. Chapter 2 discusses different measures of model performance, and when to use them, in more detail.
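The two business problems above call for different measures. The following sketch contrasts them on a tiny made-up data set. The metric names (sensitivity, accuracy, top_k_capture) and the data are assumptions chosen for illustration; they are not the book's definitions, which appear in Chapter 2.

```python
def sensitivity(actual, predicted):
    """Share of actual positives that the model flags as positive."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return true_pos / sum(actual)

def accuracy(actual, predicted):
    """Share of all cases that the model labels correctly."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

def top_k_capture(actual, scores, k):
    """Share of all positives found among the k highest-scored cases."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    return sum(a for _, a in ranked[:k]) / sum(actual)

# Hypothetical data: 2 positives out of 10 cases.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A model that never predicts a positive still looks accurate
# but misses every positive case.
all_negative = [0] * 10
print(accuracy(actual, all_negative))      # 0.8
print(sensitivity(actual, all_negative))   # 0.0

# With a budget for only k = 2 contacts, what matters is how many
# positives land at the top of the ranking.
scores = [0.9, 0.2, 0.8, 0.1, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35]
print(top_k_capture(actual, scores, k=2))  # 0.5
```

A model can score well on one measure and poorly on another, which is why the measure should follow from the business problem.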

While many machine learning algorithms have been around for a long time, advances in computing power and parallel processing now make it possible to automatically apply complex mathematical calculations to big data faster and faster, which makes these algorithms far more useful.

Most industries working with large amounts of data recognize the value in machine learning technology to gain insights and automate decisioning. Common application areas include:

● Fraud

● Targeted Marketing

● Financial Risk

● Churn

Fraud

Fraud detection methods attempt to detect or impede illegal activity that involves financial transactions. Anomaly detection is one of the ways to detect fraud. You look to predict an event that occurs rarely and identify patterns in the data that do not conform to expected behavior, such as an abnormally high purchase made on a credit card.
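As a rough illustration of anomaly detection, the sketch below flags purchase amounts that sit far from the typical amount. It uses a modified z-score based on the median and the median absolute deviation, which is one common technique and an assumption here, not the method the book prescribes. The amounts and the threshold of 3.5 are hypothetical.

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    """Return amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    deviations = [abs(a - median) for a in amounts]
    mad = statistics.median(deviations)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - median) / mad > threshold]

# Hypothetical card purchases; one is abnormally high.
purchases = [23.5, 41.0, 18.2, 35.9, 27.4, 30.1, 22.8, 1250.0, 29.9, 33.3]
print(flag_anomalies(purchases))  # [1250.0]
```

The median-based score is used instead of the mean and standard deviation because a single extreme purchase would otherwise inflate the very statistics used to detect it.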

Targeted Marketing

Targeted marketing is another common application area. Most companies rely on some form of direct marketing to acquire new customers and generate additional revenue from existing customers. Predictive modeling generally accomplishes this by helping companies answer crucial questions such as: Who should I contact? What should I offer? When should I make the offer? How should I make the offer?

Financial Risk

Financial risk management models attempt to predict monetary events such as credit defaults, loan prepayments, and insurance claims. Banks use multiple models to meet a variety of regulations (such as CCAR and Basel III). With increased scrutiny on model risk, bankers must establish a model risk management program for regulatory compliance and business benefits. Models are useful, and bankers have come to rely on them for certain applications, some of which expose the bank to significant risks. Predictive models fall into this category. Examples include loan approval using credit scoring, and hedging models that use swaps and options to manage the balance sheet while protecting liquidity and determining capital adequacy.

Churn

Customer churn is one of the main problems in many businesses. Churn, or attrition, is the turnover of customers of a product or users of a service. Studies have shown that attracting new customers is much more expensive than retaining existing ones. Consequently, companies focus on developing accurate and reliable predictive models to identify customers who are likely to churn soon.

What Is SAS Viya?

SAS Viya is an open, cloud-enabled, analytic run-time environment with a number of supporting services, including SAS Cloud Analytic Services (CAS). CAS is the in-memory engine on the SAS Platform.

Run-time environment refers to the combination of hardware and software in which data management and analytics occur.

CAS is designed to run in a single-machine symmetric multiprocessing (SMP) or multi-machine massively parallel processing (MPP) configuration. CAS supports multiple platform and infrastructure configurations. CAS also has a communications layer that supports fault tolerance. When CAS is running in an MPP configuration, it can continue processing requests even if it loses connectivity to some nodes. This communication layer also enables you to remove or add nodes while the server is running.

Distributed Server: Massively Parallel Processing (MPP)

A distributed server uses multiple machines to perform massively parallel processing. The figure below depicts the server topology for a distributed server. Of the multiple machines used, one machine acts as the controller and other machines act as workers to process data.

Figure: Distributed Server: Massively Parallel Processing (MPP)

Client applications communicate with the controller, and the controller coordinates the processing that is performed by the worker nodes. One or more machines are designated as worker nodes. Each worker node performs data analysis on the rows of data that are in-memory on the node. The server scales horizontally. If processing times are unacceptably long due to large data volumes, more machines can be added as workers to distribute the workload. Distributed servers are fault tolerant. If communication with a worker node is lost, a surviving worker node uses a redundant copy of the data to complete the data analysis. Whenever possible, distributed servers load data into memory in parallel. This provides the fastest load times.
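The controller and worker roles described above can be mimicked in plain Python. This is a toy sketch, not CAS code: threads stand in for worker machines, round-robin slicing stands in for data distribution, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_workers):
    """Deal rows out round-robin, as a stand-in for distributing a table."""
    return [rows[i::n_workers] for i in range(n_workers)]

def worker_summary(rows):
    """Each worker summarizes only the rows it holds in memory."""
    return len(rows), sum(rows)

def controller_mean(rows, n_workers=4):
    """The controller hands out the work, then combines the partial results."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_summary, parts))
    total_n = sum(n for n, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_n

table = list(range(1, 101))
print(controller_mean(table, n_workers=4))  # 50.5
print(controller_mean(table, n_workers=8))  # 50.5
```

Adding workers changes how the rows are split but not the combined answer, which is the sense in which the server scales horizontally.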

Single-Machine Server: Symmetric Multiprocessing (SMP)

The figure below depicts the server topology for a single-machine server. The single machine is designated as the controller. Because there are no worker nodes, the controller node performs data analysis on the rows of data that are in-memory. The single machine uses multiple CPUs and threads to speed up data analysis.

Figure: Single-Machine Server: Symmetric Multiprocessing (SMP)

This architecture is often referred to as symmetric multi-processing (SMP). All the in-memory analytic features of a distributed server are available to the single-machine server. Single-machine servers cannot load data into memory in parallel from any data source.

Using Cloud Analytic Services (CAS)

Using the CAS server that is part of the SAS Viya release brings a whole host of tangible benefits. The main one can be summed up in a simple three-word phrase: tremendous performance gains. Because processes run so much faster, you can complete your work faster. This means that you can complete more work, and even entire projects, in a significantly reduced time frame.

* Increase depends on many factors including hardware allocation. Performance could be higher.

See Appendix A.1 for information about working with CAS, CAS-supported data types, and loading data into CAS.

The Mindset Shift

There are some differences that you need to be aware of when working with SAS Viya. In SAS Viya, you might get nondeterministic results, or results that are not reproducible, for two main reasons:

● distributed computing environment

● nondeterministic algorithms

In distributed computing, cases are divided across compute nodes, so results can vary. You might get slightly different results even on the same server, where the controller and workers are more manageable. Across different servers, variation is even more likely. A CAS server represents pooled memory and runs code multi-threaded. Multi-threading tends to distribute the same instructions to the available threads for execution, creating many different queues on many different cores, each using a separate allocation or subset of the data. Most of the time, multiple threads perform operations on isolated collections of data that are independent of one another but part of a larger table. For that reason, a counter (for example, n+1;) operating on one thread can produce a result that differs from a counter operating on another thread, because each thread is working on a different subset of the data.

Therefore, results can be different from thread to thread unless and until the individual results from multiple threads are summed together. It is not as complicated as it might sound. That is because SAS Viya automatically takes care of most collation and reassembly of processing results, with a few minor exceptions where you must further specify how to combine results from multiple threads.
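The counter behavior described above can be imitated in Python. In this sketch the threads are simulated one after another, and the slicing scheme is a hypothetical stand-in for how a table might be divided. The second part shows one plausible source of small numeric differences, which is an assumption on our part and not a statement from the SAS documentation: floating-point addition gives slightly different answers when the same values are grouped differently.

```python
def per_thread_counts(rows, n_threads):
    """Run one counter per simulated thread, each over its own slice of the table."""
    counts = []
    for t in range(n_threads):
        n = 0
        for _ in rows[t::n_threads]:
            n += 1  # the per-thread counter, like n+1; in the text
        counts.append(n)
    return counts

rows = list(range(10))
counts = per_thread_counts(rows, 3)
total = sum(counts)
print(counts)  # [4, 3, 3]  each thread sees a different subset
print(total)   # 10         combining the counters recovers the table total

# The same three values, added in two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)
print(left_to_right == regrouped)  # False
```

The difference between the two sums is tiny, but it shows why the order in which partial results are combined can change the last digits of a result.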

A nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm might behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition. A probabilistic algorithm’s behavior depends on a random number generator. Nondeterministic algorithms are often used to find an approximation to a solution when the exact solution would be too costly to obtain using a deterministic one (Wikipedia). Some SAS Visual Data Mining and Machine Learning models are created with a nondeterministic process. This means that you might see different displayed results when you run a model, save that model, close the model, and then re-open or print the report later.
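A probabilistic algorithm of the kind described above can be sketched with a Monte Carlo estimate. The example is generic Python and says nothing about how SAS procedures handle seeds; estimate_pi is a hypothetical name chosen for illustration.

```python
import random

def estimate_pi(n_draws, seed=None):
    """Approximate pi from the share of random points that fall inside a quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_draws):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_draws

# Without a seed, each run draws different random numbers,
# so repeated runs usually give slightly different estimates.
print(estimate_pi(20000))
print(estimate_pi(20000))

# Fixing the seed makes the run repeatable.
print(estimate_pi(20000, seed=1) == estimate_pi(20000, seed=1))  # True
```

Each estimate is an approximation that is cheap to compute, and the run-to-run variation shrinks as the number of draws grows.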

Figure: Deterministic and Nondeterministic Algorithms

Image source: By Eleschinski2000—With a paint program, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=43528132

A deterministic algorithm that performs f(n) steps always finishes in f(n) steps and always returns the same result. A nondeterministic algorithm that has f(n) levels might not return the same result on different runs. A nondeterministic algorithm might never finish due to the potentially infinite size of the fixed height tree.

It is an altogether different mindset!

You are converging on a model or estimating a model, not computing the parameters of the model exactly. Bayesian modelers understand this when they look for convergence of parameters: they try to converge to a distribution, not a point. It can be instructive to run the models 10 times across different samples and ensemble them to see the dominant signal. You cannot expect the results to be reproduced exactly, because some algorithms include randomness in the process. However, the results do converge. This is a distributed computing environment designed for big data, and this non-reproducibility is the price that we pay.
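The suggestion to run a model 10 times across different samples and ensemble the results can be sketched as follows. The data is synthetic, the model is a deliberately simple slope estimate, bootstrap resampling is our assumed way of drawing the different samples, and all names are hypothetical.

```python
import random
import statistics

def fit_slope(points):
    """Least-squares slope for a line through the origin."""
    return sum(x * y for x, y in points) / sum(x * x for x, _ in points)

def ensemble_slope(points, n_models=10, seed=0):
    """Fit one model per bootstrap sample and average the estimates."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # resample with replacement
        slopes.append(fit_slope(sample))
    return statistics.mean(slopes), slopes

# Synthetic data (hypothetical): y = 2x plus Gaussian noise.
data_rng = random.Random(42)
points = [(float(x), 2.0 * x + data_rng.gauss(0, 1)) for x in range(1, 51)]

average, slopes = ensemble_slope(points)
print(f"individual slopes range from {min(slopes):.4f} to {max(slopes):.4f}")
print(f"ensemble slope: {average:.4f}")
```

No two of the individual fits need agree exactly, yet they cluster around the same value, which is the dominant signal that the ensemble reports.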

Note: Data Science’s Reproducibility Crisis https://towardsdatascience.com/data-sciences-reproducibility-crisis-b87792d88513 is an interesting read.

SAS Visual Data Mining and Machine Learning:

A variety of products sit in SAS Viya. They enable users to perform their jobs as part of the analytics life cycle. In this book, you use SAS Visual Data Mining and Machine Learning.

The Model Studio interface is a superset of SAS Visual Data Mining and Machine Learning, SAS Visual Forecasting, and SAS Visual Text Analytics.

SAS Visual Data Mining and Machine Learning is a product offering in SAS Viya that contains:

1. underlying CAS actions and SAS procedures for data mining and machine learning applications

2. GUI-based applications for different levels and types of users.

These applications are as follows:

● Programming interface: a collection of SAS procedures for direct coding or access through tasks in SAS Studio.

● Interactive modeling interface: a collection of tasks in SAS Visual Analytics for creating models in an interactive manner with automated assessment visualizations.

● Automated modeling interface: a pipeline application called Model Studio that enables you to construct automated flows consisting of various nodes for preprocessing and modeling, with automated model assessment and comparison, and direct model publishing and registration.

Each of these executes the same underlying actions in the CAS execution environment. In addition, there are supplementary interfaces for preparing your data (Data Studio) and managing and deploying your models (SAS Model Manager and SAS Decision Manager) to support all phases of a machine learning application.

In this book, you primarily explore the Model Studio interface and its integration with other SAS Visual Data Mining and Machine Learning interfaces.

You use the SAS Visual Data Mining and Machine Learning web client to visually assemble, configure, build, and compare data mining models and pipelines for a wide range of analytic data mining tasks.

Chapter 1: Introduction to Machine Learning

Introduction

Supervised Learning

Unsupervised Learning

Semisupervised Learning and Reinforcement Learning

Supervised Learning Predictions

Decision Prediction

Ranking Prediction

Estimation Prediction

Model Building and Selection

Model Complexity

Introducing Model Studio

Demo 1.1: Creating a Project and Loading Data

Model Studio: Analysis Elements

Demo 1.2: Building a Pipeline from a Basic Template

Quiz

Introduction

There are two main types of machine learning methods: supervised learning and unsupervised learning.

Supervised Learning

Supervised learning (also known as predictive modeling) starts with a training data set. The observations in a training data set are known as training cases (also known as examples, instances, or records). The variables are called inputs (also known as predictors, features, explanatory variables, or independent variables) and targets (also known as responses, outcomes, or dependent variables). The learning algorithm receives a set of inputs along with the corresponding correct outputs or targets, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Through methods like classification, regression, prediction, and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. In other words, the purpose of the training data is to generate a predictive model. The predictive model is a concise representation of the association between the inputs and the target variables.
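
The learning loop described above (compare the model's output with the correct output, then modify the model) can be illustrated with a very small Python sketch. It fits a straight line by repeated error correction. The training data, the learning rate, and the number of passes are made-up values for the illustration, and this is not the algorithm used by any particular Model Studio node.

```python
# Training cases: (input, target) pairs. Targets follow y = 2x + 1 exactly (made-up data).
training = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

weight, bias = 0.0, 0.0    # the model starts out knowing nothing
learning_rate = 0.05       # assumed step size for the illustration

for epoch in range(2000):
    for x, target in training:
        prediction = weight * x + bias
        error = prediction - target           # compare actual output with correct output
        weight -= learning_rate * error * x   # modify the model to reduce the error
        bias -= learning_rate * error

def predict(x):
    """Score a new, unlabeled case with the learned model."""
    return weight * x + bias
```

After training, weight is very close to 2 and bias is very close to 1, so predict(10.0) returns approximately 21 for an input that never appeared in the training data.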

Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.

Unsupervised Learning

Unsupervised learning is used against data that has no historical labels. In other words, the system is not told the right answer – there is no target data – the algorithm must figure out what is being shown. The goal is to explore the data and find some structure or pattern. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering, and singular value decomposition. These algorithms are also used to segment text topics, recommend items, and identify data outliers.
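
As a hedged illustration of one technique named above, the following Python sketch runs k-means clustering on a single input. The customer spend values and the starting centers are invented for the example, and the function k_means_1d is a teaching sketch rather than a production implementation.

```python
def k_means_1d(values, centers, iterations=20):
    """Plain k-means on one numeric input, starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend for eight customers. There is no target column.
spend = [12.0, 15.0, 14.0, 11.0, 95.0, 102.0, 98.0, 105.0]
centers, clusters = k_means_1d(spend, centers=[0.0, 50.0])
```

With this data, the algorithm settles on two segments, one centered at 13.0 and one centered at 100.0, without ever being told which customer belongs where.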

Semisupervised Learning and Reinforcement Learning

Other common methods include semisupervised learning and reinforcement learning. Semisupervised learning is used for the same kinds of applications as supervised learning, but it uses both labeled and unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can be used with methods such as classification, regression, and prediction. Semisupervised learning is useful when the cost associated with labeling is too high to allow for a fully labeled training process. Early examples of this include identifying a person’s face on a web cam.

Reinforcement learning is often used for robotics, gaming, and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with), and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy.
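
The trial-and-error idea can be sketched with the simplest possible reinforcement learning setting: an agent that repeatedly chooses between two actions and learns which one pays off more often. This is an assumption-laden toy (an epsilon-greedy strategy on a two-action problem). The reward probabilities, the action names, and the exploration rate are all invented for the illustration.

```python
import random

rng = random.Random(0)

# Environment: two actions with hidden (hypothetical) reward probabilities.
reward_probability = {"left": 0.2, "right": 0.8}

def step(action):
    """Environment response: reward 1 with the action's hidden probability, else 0."""
    return 1.0 if rng.random() < reward_probability[action] else 0.0

# Agent: keeps a running average of the reward observed for each action.
value = {"left": 0.0, "right": 0.0}
count = {"left": 0, "right": 0}
epsilon = 0.1   # assumed fraction of steps spent exploring

for _ in range(2000):
    if rng.random() < epsilon:
        action = rng.choice(list(value))       # explore: try a random action
    else:
        action = max(value, key=value.get)     # exploit: use the best estimate so far
    reward = step(action)
    count[action] += 1
    value[action] += (reward - value[action]) / count[action]

best_action = max(value, key=value.get)
```

The rule "usually pick the action with the highest estimated value" is the learned policy here. Real applications involve many states and delayed rewards, which this sketch leaves out.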

In this book, we will be focusing on supervised learning or predictive modeling.

Supervised Learning Predictions

The outputs of the predictive model are referred to as predictions. Predictions represent your best guess for the target given a set of input measurements. The predictions are based on the associations learned from the training data by the predictive model.

The training data are used to construct a model (rule) that relates the inputs to the target. The predictions can be categorized into three distinct types:

● decisions

● rankings

● estimates

Decision Prediction

Decision predictions are the simplest type of prediction. Decisions usually are associated with some type of action (such as classifying a case as a churn or no-churn). For this reason, decisions are also known as classifications. Decision prediction examples include handwriting recognition, fraud detection, and direct mail solicitation.

Figure 1.1: Decision Predictions

Decision predictions usually relate to a categorical target variable. For this reason, they are identified as primary, secondary, and tertiary in correspondence with the levels of the target.

Note: Model assessment in Model Studio generally assumes decision predictions when the target variable has a categorical measurement level (binary, nominal, or ordinal).

Ranking Prediction

Ranking predictions order cases based on the input variables’ relationships with the target variable. Using the training data, the prediction model attempts to rank high value cases higher than low value cases. It is assumed that a similar pattern exists in the scoring data so that high value cases have high scores. The actual produced scores are inconsequential. Only the relative order is important. The most common example of a ranking prediction is a credit score.

Figure 1.2: Ranking Predictions

Ranking predictions can be transformed into decision predictions by taking the primary decision for cases above a certain threshold while making secondary and tertiary decisions for cases below the correspondingly lower thresholds. In credit scoring, cases with a credit score above 700 can be called good risks, those with a score between 600 and 700 can be intermediate risks, and those below 600 can be considered poor risks.
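
The credit scoring example can be written as a short Python function. The cutoffs come from the text above. How a score of exactly 600 or 700 is treated is an assumption made for the sketch, and the function name risk_decision is hypothetical.

```python
def risk_decision(credit_score):
    """Map a ranking score to one of three decisions using two cutoffs."""
    if credit_score > 700:
        return "good risk"
    if credit_score >= 600:          # assumption: 600 and 700 count as intermediate
        return "intermediate risk"
    return "poor risk"

scores = [720, 655, 580, 700]
decisions = [risk_decision(s) for s in scores]
```

Only the position of each score relative to the cutoffs matters. Moving the cutoffs changes the decisions without changing the underlying ranking.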

Estimation Prediction

Estimation prediction uses the inputs to estimate a value for the dependent variable, conditioned on the observed values of the independent variables. For cases with numeric targets, this can be thought of as the average value of the target for all cases having the observed input measurements. For cases with categorical targets, this number might equal the probability of a target outcome.
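
A minimal Python sketch of this idea follows: the estimate for a given input value is the average target among the training cases that share that input value. The data and the names conditional_mean, numeric_cases, and binary_cases are invented for the illustration. Real models smooth over many inputs rather than averaging exact matches.

```python
from collections import defaultdict

def conditional_mean(cases):
    """Average target for each distinct input value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for input_value, target in cases:
        totals[input_value] += target
        counts[input_value] += 1
    return {value: totals[value] / counts[value] for value in totals}

# Numeric target (hypothetical monthly spend by plan type).
numeric_cases = [("basic", 20.0), ("basic", 30.0),
                 ("premium", 80.0), ("premium", 100.0), ("premium", 90.0)]
spend_estimate = conditional_mean(numeric_cases)

# Binary target coded 1 = churn, 0 = no churn: the average is a proportion,
# which serves as an estimate of the churn probability.
binary_cases = [("basic", 1), ("basic", 0), ("basic", 1), ("basic", 1),
                ("premium", 0), ("premium", 1)]
churn_estimate = conditional_mean(binary_cases)
```

Here spend_estimate holds 25.0 for basic and 90.0 for premium, and churn_estimate holds 0.75 for basic and 0.5 for premium.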

Figure 1.3: Estimate Prediction

Prediction estimates are most commonly used when their values are integrated into a mathematical expression. An example is two-stage modeling, where the probability of an event is combined with an estimate of profit or loss to form an estimate of unconditional expected profit or loss. Prediction estimates are also useful when you are not sure of the ultimate application of the model.
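
The arithmetic behind two-stage modeling can be sketched as follows. The function and the numbers are hypothetical. In practice, the probability would come from one model and the profit or loss estimate from a second model.

```python
def expected_profit(p_event, profit_if_event, profit_if_no_event=0.0):
    """Unconditional expected profit from a probability and two conditional estimates."""
    return p_event * profit_if_event + (1.0 - p_event) * profit_if_no_event

# Hypothetical case: a 25% chance of responding, a profit of 40.0 on a response,
# and a mailing cost of 2.0 (a profit of -2.0) when there is no response.
case_value = expected_profit(0.25, 40.0, -2.0)
```

For this case, the result is 0.25 × 40.0 + 0.75 × (−2.0) = 8.5. A decision or ranking prediction alone could not produce this number, because the calculation needs the estimated probability itself.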

Estimate predictions can be transformed into both decision and ranking predictions. When in doubt, use this option. Most Model Studio modeling tools can be configured to produce estimate predictions.

Model Building and Selection

To find the best model for the business problem and the data, many models are built and compared, and a champion model is chosen that can then be deployed into production. We will discuss scoring and model selection in a later chapter. But before you start building models, it is important to hold back some of the data so that it can be used to help select the best model.
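
Holding back data amounts to partitioning the cases before any model is built. The following Python sketch shows one simple way to do that. The 30% validation share, the seed, and the function name partition are assumptions for the illustration, not Model Studio defaults.

```python
import random

def partition(cases, validation_fraction=0.3, seed=12345):
    """Shuffle the cases and split them into training and validation sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

cases = list(range(100))   # stand-ins for 100 rows of data
train, validation = partition(cases)
```

Every case lands in exactly one of the two sets. Models are fit on train only, and validation is kept aside for comparing the candidate models.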

Model Complexity

Selecting model complexity is a balance between bias and variance. An insufficiently complex model might not be flexible enough, which leads to underfitting. An underfit model produces biased inferences, meaning that its predictions systematically miss the true relationships in the population. For example, a decisioning model could predict no when the target should be yes.

An overly complex model might be too flexible, which leads to overfitting. An overfit model includes the random noise in the sample, which can lead to models that have higher variance when applied to the population. This model would perform almost perfectly with the training data but is likely to have poor performance with the validation data.

A model with just enough flexibility gives the best generalization.
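
The trade-off can be seen in a small experiment. The sketch below uses a nearest-neighbor average as a stand-in model (a technique chosen here for brevity, not one discussed in this chapter), where the number of neighbors k controls flexibility: k = 1 is the most flexible setting, and k equal to the number of training cases is the least flexible. The data is simulated, and all names are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training cases whose input is nearest to x."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    return sum(target for _, target in nearest) / k

def mean_squared_error(train, cases, k):
    """Average squared difference between predictions and actual targets."""
    return sum((knn_predict(train, x, k) - t) ** 2 for x, t in cases) / len(cases)

rng = random.Random(1)

def make_cases(n):
    """Simulated data: a straight-line signal plus random noise."""
    cases = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        cases.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return cases

train, validation = make_cases(60), make_cases(60)

# For each k, record (training error, validation error).
errors = {k: (mean_squared_error(train, train, k),
              mean_squared_error(train, validation, k))
          for k in (1, 5, 60)}
```

With k = 1, the model reproduces the training data perfectly (training error of zero) because it memorizes the noise, yet its validation error is not zero. With k = 60, every prediction is the overall average, so the model is too rigid and both errors are large. An intermediate setting such as k = 5 typically generalizes better than either extreme.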

Figure 1.4: Accuracy Versus Generalizability

Introducing Model Studio

Model Studio enables you to explore ideas and discover insights by preparing data and building models. It is part of the discovery piece of the analytics life cycle. Model Studio is a central, web-based application that includes a suite of integrated data mining tools. The data mining tools supported in Model Studio are designed to take advantage of the SAS Viya programming and cloud processing environments to deliver and distribute champion data mining models, score code, and results.

Demo 1.1: Creating a Project and Loading Data

In this demonstration, you will create a new project in Model Studio based on the commsdata data set. A project is a top-level container for your analytic work in Model Studio. The table is imported from a local drive. The type of project is defined. This project is used to predict churn for a fictitious telecommunications company. A target variable is selected for this table.

1. First, open SAS Drive on your machine and select SAS Viya > SAS Drive from the bookmarks bar or from the link on the page.

2. Next, log on using your user ID and password.

Note: Use caution when you enter the user ID and password because values can be case-sensitive.

3. Click Sign In.

4. Select Yes in the Assumable Groups window. The SAS Drive home page appears.

Note: The SAS Drive page on your computer might not have the same tiles as the image above.

5. Click the Applications menu in the upper left corner of the SAS Drive page. Select Build Models.

This launches Model Studio.

Note: Some of the top features in Model Studio in SAS Visual Data Mining and Machine Learning are presented in a paper titled Playing Favorites: Our Top 10 Model Studio Features in SAS® Visual Data Mining and Machine Learning at https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3236-2019.pdf.

Alternatively, click New in the upper left corner to reveal a menu to create a new item. Select Model Studio project from the menu.

Note: When this alternative process is used to go to Model Studio, it bypasses the Model Studio Projects page and immediately opens the window to create a new project as shown below in step 7 of this demonstration.

The Model Studio Projects page is now displayed.

Note: On your computer, the Projects page might differ from the image above. There might be pre-existing projects on your computer.

From the Model Studio Projects page, you can view existing projects, create new projects, access the Exchange, and access Global Metadata. Model Studio projects can be one of three types (depending on the SAS licensing for your site): Forecasting projects, Data Mining and Machine Learning projects, and Text Analytics projects.

Note: The Exchange organizes your favorite settings and enables you to collaborate with others in one place. Find a recommended node template or create your own template for a streamlined workflow for your team. The Exchange is accessed later in this chapter.

6. Select New Project in the upper right corner of the Projects page.

7. Enter Demo as the name in the New Project window. Leave the default type of Data Mining and Machine Learning. Click Browse in the Data field.

Note: You can specify a pipeline template at project creation. Continue with a blank template. Pipeline templates are discussed soon.

8. Import a SAS data set into CAS.

a. In the Choose Data window, click Import.

b. Under Import, select Local File.

c. Navigate to the data folder.

d. Select the commsdata.sas7bdat table. Click Open.

e. Select Import Item. Model Studio parses the data set and pre-populates the window with data set configurations.

Note: When the data is in memory, it is available for other projects through the Available tab.

f. Click OK after the table is imported.

Note: Tables are imported to the CAS server and are available to use with SAS Visual Analytics. When the import is complete, you are returned to Model Studio. For more information about data types supported in CAS and how to load
