Tree-Based Machine Learning Methods in SAS Viya
Ebook · 628 pages · 4 hours


About this ebook

Discover how to build decision trees using SAS Viya!

Tree-Based Machine Learning Methods in SAS Viya covers everything from using a single tree to more advanced bagging and boosting ensemble methods. The book includes discussions of tree-structured predictive models and the methodology for growing, pruning, and assessing decision trees, forests, and gradient boosted trees. Each chapter introduces a new data concern and then walks you through tweaking the modeling approach, modifying the properties, and changing the hyperparameters, thus building an effective tree-based machine learning model. Along the way, you will gain experience making decision trees, forests, and gradient boosted trees that work for you.

By the end of this book, you will know how to:

  • build tree-structured models, including classification trees and regression trees.
  • build tree-based ensemble models, including forest and gradient boosting.
  • run isolation forest and Poisson and Tweedie gradient boosted regression tree models.
  • implement open source in SAS and SAS in open source.
  • use decision trees for exploratory data analysis, dimension reduction, and missing value imputation.
Language: English
Publisher: SAS Institute
Release date: Feb 21, 2022
ISBN: 9781954846654
Author

Sharad Saxena

Dr. Sharad Saxena is a Principal Analytical Training Consultant based at the SAS R&D center in Pune, India. Working in the field of statistics and analytics since 2000, he provides education consulting in the area of advanced analytics and machine learning across the globe including the UK, USA, Singapore, Italy, Australia, Netherlands, Middle East, China, Philippines, Nigeria, Hong Kong, Malaysia, Indonesia, Mexico, and India for a variety of SAS customers in banking, insurance, retail, government, health, agriculture, and telecommunications. Dr. Saxena earned a bachelor's degree in mathematics with statistics and economics minors, a master's degree in statistics, and a Ph.D. in statistics from the School of Studies in Statistics at Vikram University, India. Dr. Saxena has more than 35 publications including research papers in journals such as the Journal of Statistical Planning and Inference, Communications in Statistics–Theory and Methods, Statistica, Statistical Papers, and Vikalpa. He is also a co-author of the book, Randomness and Optimal Estimation in Data Sampling. Overall, Dr. Saxena has more than two decades of rich experience in research, teaching, training, consulting, writing, and education product design, more than 14 years of which have been with SAS and the remaining in academia as a faculty member with some top-notch institutes in India like the Institute of Management Technology, Ghaziabad; Institute of Management, Nirma University, and more.


    Book preview

    Tree-Based Machine Learning Methods in SAS Viya - Sharad Saxena

    Chapter 1: Introduction to Tree-Structured Models

    Introduction

    Sometimes you make the right decision, sometimes you make the decision right.

    –Phil McGraw

    A decision tree has many analogies in real life. In decision analysis, a tree can be used to represent decisions and decision making visually and explicitly. As the name suggests, it uses a tree-like model of decisions.

    The adjective decision in decision trees is a curious one, and misleading. In the 1960s, the originators of the tree approach described the splitting rules as decision rules, and the terminology remains popular. This is unfortunate because it inhibits the use of ideas and terminology from decision theory. In decision theory, the term decision tree depicts a series of decisions for choosing among alternative activities: you create the tree, specify the probabilities and benefits of the activities' outcomes, and software, including SAS, finds the most beneficial path. The decision maker then follows a single path based on a set of criteria and never performs the unchosen activities.

    Decision theory is not about data analysis; the choice of a decision might be made without reference to data. The trees in this book are only about data analysis: a tree is fit to a data set to enable interpretation and prediction. A more apt name would be data-splitting trees, which are used for supervised learning, also called predictive modeling.

    In supervised learning, a set of input variables (predictors) is used to predict the value of one or more target variables (outcome). The mapping of the inputs to the target is a predictive model. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the input variables. The data used to estimate a predictive model is a set of cases (observations, examples) consisting of values of the inputs and target. The fitted model is typically applied to new cases where the target is unknown.

    Decision Tree – What Is It?

    There are several tree-structured models that are built from one or more decision trees. Decision trees are a fundamental machine learning technique that every data scientist should know. Luckily, decision trees are straightforward to construct and implement in SAS Viya.

    A decision tree represents a grouping of the data that is created by applying a series of simple rules. Each rule assigns an observation to a group based on the value of one input. One rule is applied after another, resulting in a hierarchy of groups within groups. The hierarchy is called a tree, and each group is called a node. The original group contains the entire data set and is called the root node of the tree. A node with all its successors forms a branch of the node that created it. The final nodes are called leaves. For each leaf, a decision is made and applied to all observations in the leaf. The type of decision depends on the context. In supervised learning, the decision is the predicted value.
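
    To make this concrete, here is a tiny sketch in Python rather than SAS (the inputs, values, and risk labels are invented for illustration, loosely echoing the smoking and weight inputs of Figure 1.1):

```python
# A hand-written decision tree: a hierarchy of simple one-input rules.
# (The inputs and risk labels here are hypothetical.)
def predict(obs):
    # Root node: the first rule splits on a single input.
    if obs["smoker"] == "yes":
        # Internal node: a second rule splits this subgroup further.
        if obs["weight_status"] == "overweight":
            return "high risk"    # leaf
        return "medium risk"      # leaf
    return "low risk"             # leaf

print(predict({"smoker": "yes", "weight_status": "overweight"}))  # high risk
print(predict({"smoker": "no", "weight_status": "normal"}))       # low risk
```

    Each observation follows exactly one path from the root to a leaf, and the leaf's decision applies to every observation that reaches it.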

    You use the decision tree to do one of the following tasks:

    classify observations based on the values of nominal, binary, or ordinal targets

    predict outcomes for interval targets

    predict the appropriate decision when you specify decision alternatives

    The tree depicts the first split into groups as branches emanating from a root and subsequent splits as branches emanating from nodes on older branches. Figure 1.1 is an example decision tree predicting a nominal target Cause of Death using two binary inputs Weight Status and Smoking Status. The decision nodes include a bar chart related to the node’s sample target values and other details. The leaves of the tree are the final groups, the unsplit nodes. For some perverse reason, trees are always drawn upside down, like an organizational chart. For a tree to be useful, the data in a leaf must be similar with respect to some target measure so that the tree represents the segregation of a mixture of data into purified groups.

    Types of Decision Trees

    Decision trees are a nonparametric supervised learning method used for both classification and regression tasks. A classification tree models a categorical response, and a regression tree models a continuous response. See Figure 1.2. Both types of trees are called decision trees because the model is expressed as a series of if-then statements. For each type of tree, you specify a response variable (also called a target variable), whose values you want to predict, and one or more input variables (called predictor variables), whose values are used to predict the values of the target variable.

    Figure 1.1: A Simple Decision Tree

    Figure 1.2: Classification and Regression Trees

    The predictor variables for tree models can be categorical or continuous. The set of all combinations of the predictor variables is called the predictor space. The model is based on partitioning the predictor space into nonoverlapping groups, which correspond to the leaves of the tree. Partitioning is done repeatedly, starting with the root node, which contains all the data, and continuing until a stopping criterion is met. At each step, the parent node is split into child nodes by selecting a predictor variable and a split value for that variable that minimize the variability according to a specified measure (or the default measure) in the response variable across the child nodes. Various measures, such as the Gini index, entropy, and residual sum of squares, can be used to assess candidate splits for each node. The selected predictor variable and its split value are called the primary splitting rule.
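
    As an illustration of how candidate splits are scored, the following Python sketch (not SAS code; the toy data and split points are invented) computes the Gini index of the child nodes for each candidate split on an interval input and picks the split with the lowest weighted impurity:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, labels, feature, value):
    """Weighted average Gini of the two child nodes produced by
    splitting on `feature <= value` (an interval-input split)."""
    left  = [y for x, y in zip(rows, labels) if x[feature] <= value]
    right = [y for x, y in zip(rows, labels) if x[feature] >  value]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# The split that most reduces impurity becomes the primary splitting rule.
rows   = [{"age": 25}, {"age": 30}, {"age": 45}, {"age": 52}]
labels = ["no", "no", "yes", "yes"]
best = min([27.5, 37.5, 48.5],
           key=lambda v: split_impurity(rows, labels, "age", v))
print(best)  # 37.5 separates the two classes perfectly (child impurity 0)
```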

    Tree-structured models are built from training data for which the response values are known, and these models are subsequently used to score (classify or predict) response values for new data. For classification trees, the most frequent response level of the training observations in a leaf is used to classify observations in that leaf. For regression trees, the average response of the training observations in a leaf is used to predict the response for observations in that leaf. The splitting rules that define the leaves provide the information that is needed to score new data; these rules consist of the primary splitting rules, surrogate rules, and default rules for each node.
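
    The two scoring rules can be sketched in a couple of lines of Python (the leaf contents below are invented):

```python
from collections import Counter
from statistics import mean

# Scoring uses the training observations that fell into each leaf:
# a classification tree predicts the leaf's most frequent class,
# a regression tree predicts the leaf's average response.
leaf_classes = ["yes", "yes", "no", "yes"]   # nominal targets in one leaf
leaf_values  = [4.0, 6.0, 5.0]               # interval targets in another

classification_prediction = Counter(leaf_classes).most_common(1)[0][0]
regression_prediction     = mean(leaf_values)

print(classification_prediction)  # yes
print(regression_prediction)      # 5.0
```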

    The process of building a decision tree begins with growing a large, full tree. The full tree can overfit the training data, resulting in a model that does not adequately generalize to new data. To prevent overfitting, the full tree is often pruned back to a smaller subtree that balances the goals of fitting training data and predicting new data. Two commonly applied approaches for finding the best subtree are cost-complexity pruning and C4.5 pruning.
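
    Cost-complexity pruning can be summarized by a penalized criterion: a candidate subtree T is scored as R(T) plus alpha times its number of leaves, where R(T) is the training error and alpha >= 0 penalizes size. The Python sketch below (standard formulation; the error rates and leaf counts are invented) shows how increasing alpha shifts the preference toward a smaller subtree:

```python
# Cost-complexity pruning scores a candidate subtree T as
#   R_alpha(T) = R(T) + alpha * |leaves(T)|
def cost_complexity(error, n_leaves, alpha):
    return error + alpha * n_leaves

full, pruned = (0.05, 20), (0.10, 5)   # (training error, leaf count)
for alpha in (0.0, 0.01):
    best = min((full, pruned), key=lambda t: cost_complexity(*t, alpha))
    print(alpha, best)
# alpha = 0.0 keeps the full tree; alpha = 0.01 prefers the pruned subtree,
# which trades a little training error for a model that should generalize better.
```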

    Compared with other regression and classification methods, tree-structured models have the advantage that they are easy to interpret and visualize, especially when the tree is small. Tree-based methods scale well to large data, and they offer various methods of handling missing values, including surrogate splits.

    However, tree-structured models have limitations. Regression tree models fit response surfaces that are constant over rectangular regions of the predictor space, so they often lack the flexibility needed to capture smooth relationships between the predictor variables and the response. Another limitation of tree models is that slight changes in the data can lead to quite different splits, and this undermines the interpretability of the model.

    Tree-Based Models in SAS Viya

    SAS Viya is a cloud-enabled, analytic run-time environment with several supporting services, including SAS Cloud Analytic Services (CAS). CAS is the in-memory engine on the SAS Viya Platform.

    SAS Viya builds tree-based statistical models for classification and regression. You can build three types of tree-based models in SAS Viya, ranging from a single decision tree to more complex ensembles of trees: forest and gradient boosting.

    A random forest is just what the name implies. It is a bunch of decision trees – each with a randomly selected subset of the data – all combined into one result. Using a random forest helps address the problem of overfitting inherent to an individual decision tree.
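
    The bagging idea behind a forest can be sketched in Python (not SAS; each "tree" below is reduced to a single threshold rule so the ensemble mechanics stay visible, and the data follow an invented rule):

```python
import random
from collections import Counter

random.seed(1)
data = [(x, "yes" if x > 5 else "no") for x in range(10)]  # true rule: x > 5

def train_stump(sample):
    # Fit one "tree": pick the threshold that classifies the sample best.
    def accuracy(t):
        return sum((x > t) == (y == "yes") for x, y in sample)
    threshold = max(range(10), key=accuracy)
    return lambda x: "yes" if x > threshold else "no"

# Each tree sees a bootstrap sample (drawn with replacement) of the data.
trees = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def forest_predict(x):
    # The forest's answer is the majority vote across all trees.
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

print(forest_predict(8), forest_predict(2))
```

    Individual trees can overfit their bootstrap samples, but the majority vote averages those errors out; this is the intuition that the FOREST procedure implements at scale.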

    Gradient boosting creates an ensemble model of weak decision trees in a stage-wise, iterative, sequential manner. Gradient boosting algorithms convert weak learners to strong learners. One advantage of gradient boosting is that it can reduce bias and variance in supervised learning.
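
    The stage-wise idea can be sketched in Python for squared-error regression (not SAS code; the data, learning rate, and depth-1 weak learner are invented for illustration): each stage fits a weak learner to the current residuals and adds a damped copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree: choose the split that minimizes squared
    error, predicting the residual mean on each side."""
    best = None
    for s in sorted(set(xs))[:-1]:                     # candidate split points
        left  = [r for x, r in zip(xs, residuals) if x <= s]
        right = [r for x, r in zip(xs, residuals) if x > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda x: lm if x <= s else rm

xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
rate, pred = 0.5, [0.0] * 4

history = []                                           # training SSE per stage
for stage in range(10):
    residuals = [y - p for y, p in zip(ys, pred)]
    stump = fit_stump(xs, residuals)                   # fit the residuals
    pred = [p + rate * stump(x) for x, p in zip(xs, pred)]
    history.append(sum((y - p) ** 2 for y, p in zip(ys, pred)))

print([round(h, 4) for h in history])  # the training error shrinks at every stage
```

    Real gradient boosting, as in the GRADBOOST procedure, uses decision trees as the weak learners and adds further controls, but this residual-fitting loop is the core idea.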

    Analytics Platform from SAS

    The SAS Analytics Platform is a software foundation that is engineered to address today’s business challenges and to generate insights from your data in any computing environment. SAS Viya is the latest extension of the SAS Analytics Platform, designed to orchestrate your entire analytic ecosystem by connecting and accelerating the entire analytics life cycle – from data, to discovery, to deployment. SAS Viya seamlessly scales to data of any size, type, speed, and complexity, and is interoperable with SAS 9. As an integrated part of the SAS Analytics Platform, SAS Viya is a cloud-enabled, in-memory analytics engine.

    The SAS Viya Platform architecture is illustrated in Figure 1.3. At the heart of SAS Viya is SAS Cloud Analytic Services (CAS), an in-memory, distributed analytics engine. It uses scalable, high-performance, multi-threaded algorithms to rapidly perform analytical processing on in-memory data of any size.

    SAS Viya contains microservices. A microservice is a small service that runs in its own process and communicates through a lightweight mechanism, typically hypertext transfer protocol (HTTP). Microservices are deployed as a series of containers that provide the different analytic life cycle functions, sometimes described as actions, which fit together in a modular way. The in-memory engine is independent of the microservices, which allows each to scale independently.

    Figure 1.3: SAS Viya Platform Architecture

    On the left of Figure 1.3 you see a series of source-based data engines.

    SAS Viya has a middle tier implemented on a microservices architecture, deployed and orchestrated through the industry-standard cloud Platform as a Service known as Cloud Foundry. Through Cloud Foundry, SAS Viya can be deployed, managed, monitored, scaled, and updated. Cloud Foundry enables SAS Viya to support multiple cloud infrastructures, allowing customers to deploy SAS in a hybrid cloud environment that spans multiple clouds, including combinations of on-premises and public cloud infrastructure.

    You can choose to use other platforms like Docker and the Open Container Initiative. You can operate on private infrastructure such as OpenStack or VMware, or public infrastructure such as Amazon Web Services, Azure, and so on.

    Existing SAS solutions and new ones are being built on SAS Viya. In addition, you can use REST API to include SAS Viya actions in your existing applications. A REST API is an application programming interface that conforms to the constraints of representational state transfer (REST) architectural style and allows for interaction with RESTful web services.
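
    As a sketch of what calling SAS Viya over REST could look like, the following Python snippet builds (but does not send) an HTTP request. The host, path, and token are placeholders, not documented endpoints; consult the SAS Viya REST API reference for real routes and authentication:

```python
from urllib.request import Request

# Hypothetical sketch only: viya.example.com, the path, and the token are
# placeholders, not real SAS Viya endpoints.
req = Request(
    "https://viya.example.com/hypothetical/service/endpoint",
    data=b'{"param": "value"}',
    headers={"Authorization": "Bearer <access-token>",
             "Content-Type": "application/json",
             "Accept": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)  # the request is built but never sent
```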

    SAS Visual Data Mining and Machine Learning

    SAS Visual Data Mining and Machine Learning is a product offering in SAS Viya that contains the underlying CAS actions and SAS procedures for data mining and machine learning applications, and graphical user interface (GUI)-based applications for various levels and types of users.

    These applications are as follows:

    Programming interface: a collection of CAS action sets and SAS procedures for direct coding or access through tasks in SAS Studio.

    Interactive modeling interface: a collection of objects in SAS Visual Analytics for creating models in an interactive manner with automated assessment visualizations.

    Automated modeling interface: a pipeline application called Model Studio that enables you to construct automated flows consisting of various nodes for preprocessing and modeling with automated model assessment and comparison and direct model publishing and registration.

    Each of these executes the same underlying actions in the CAS execution environment.

    You can use the SAS Visual Data Mining and Machine Learning web client to assemble, configure, build, and compare tree-based models visually and programmatically.

    SAS Viya provides two programming run-time servers for processing that is not performed by the CAS server. Which server is used is determined by your SAS environment: when your environment includes both the SAS Viya visual and programming environments, your SAS administrator determines the server. The SAS Workspace Server and the SAS Compute Server support the same SAS code and produce the same results.

    There are several interfaces and ways of executing analyses in SAS Viya. This includes the CAS actions, SAS procedures, and visual applications shown in Figure 1.4.

    The Decision Tree Action Set

    The Decision Tree action set (Table 1.1) provides actions for modeling and scoring with tree-based models, including decision trees, forests, and gradient boosting.

    Figure 1.4: Interfaces and Ways of Executing Analyses in SAS Viya

    SAS Viya also supports new analytic methods that can be accessed from SAS and other programming languages, including R, Python, Lua, and Java, as well as through public REST APIs.

    TREESPLIT, FOREST, and GRADBOOST Procedures

    The TREESPLIT procedure builds tree-based statistical models for classification and regression in SAS Viya. The procedure produces a classification tree, which models a categorical response, or a regression tree, which models a continuous response. For each type of tree, you specify a target variable whose values you want PROC TREESPLIT to predict and one or more input variables whose values the procedure uses to predict the values of the target variable.

    The following statements and options are available in the TREESPLIT procedure:

    PROC TREESPLIT <options>;
       AUTOTUNE <options>;
       CLASS variables;
       CODE <options>;
       FREQ variable;
       GROW criterion <(criterion-options)>;
       MODEL response <(response-options)> = variable <variable ...>;
       OUTPUT OUT=CAS-libref.data-table <output-options>;
       PARTITION <partition-options>;
       PRUNE prune-method <(prune-options)>;
       VIICODE <options>;
       WEIGHT variable;

    The PROC TREESPLIT statement and the MODEL statement are required.

    The FOREST procedure creates a predictive model called a forest (which consists of several decision trees) in SAS Viya. The FOREST procedure creates an ensemble of decision trees to predict a single target of either interval or nominal measurement level. An input variable can have an interval or nominal measurement level.

    The following statements are available in the FOREST procedure:

    PROC FOREST <options>;
       AUTOTUNE <options>;
       CODE <options>;
       CROSSVALIDATION <options>;
       GROW criterion;
       ID variables;
       INPUT variables </ LEVEL=level>;
       OUTPUT OUT=CAS-libref.data-table <output-options>;
       PARTITION partition-option;
       SAVESTATE RSTORE=CAS-libref.data-table;
       TARGET variable </ LEVEL=level>;
       VIICODE <options>;
       WEIGHT variable;

    The PROC FOREST, INPUT, and TARGET statements are required. The INPUT statement can appear multiple times.

    The GRADBOOST procedure creates a predictive model called a gradient boosting model in SAS Viya. Based on the boosting method in Hastie, Tibshirani, and Friedman (2001) and Friedman (2001), the GRADBOOST procedure creates a predictive model by fitting a set of additive trees.

    The following statements are available in the GRADBOOST procedure:

    PROC GRADBOOST <options>;
       AUTOTUNE <options>;
       CODE <options>;
       CROSSVALIDATION <options>;
       ID variables;
       INPUT variables </ LEVEL=level>;
       OUTPUT OUT=CAS-libref.data-table <output-options>;
       PARTITION partition-option;
       SAVESTATE RSTORE=CAS-libref.data-table;
       TARGET variable </ LEVEL=level>;
       TRANSFERLEARN variable;
       VIICODE <options>;
       WEIGHT variable;

    The PROC GRADBOOST, INPUT, and TARGET statements are required. The INPUT statement can appear multiple times.

    Decision Tree, Forest, and Gradient Boosting Tasks and Objects

    Shown in Figure 1.5 are SAS Studio tasks (left) and SAS Visual Analytics objects (right) relevant to tree-based models.

    Figure 1.5: SAS Studio Tasks and SAS Visual Analytics Objects

    SAS Studio is more than just an editor. It is familiar to SAS programmers who just want to write code – no point and click required to start writing in SAS. If you are not familiar with SAS code, SAS Studio includes visual point-and-click tasks that generate code so that you do not have to code. SAS Studio comes with code snippet libraries for frequently used operations, as well as interactive assistance for defining code that works.

    SAS Viya enables you to develop, deploy, and manage enterprise-class analytical assets throughout the analytics life cycle (data, discovery, and deployment) with a single platform with the underlying engine called CAS.

    SAS Viya delivers a single, consolidated, and centralized analytics environment. Customers no longer need to stitch together different analytic code bases.

    It natively supports programming in SAS and access to SAS from other languages such as R, Python, Java, and Lua. This means that data scientists and coders who are not familiar with SAS can use SAS Viya without needing to learn SAS code.

    It supports access to SAS from third-party applications with public REST APIs, so developers can easily include SAS Analytics in their
