Advanced Analytics with Transact-SQL: Exploring Hidden Patterns and Rules in Your Data
By Dejan Sarka
About this ebook
No analysis is good without data quality. Advanced Analytics with Transact-SQL introduces data quality issues and shows you how to check for completeness and accuracy, and measure improvements in data quality over time. The book also explains how to optimize queries involving temporal data, such as when you search for overlapping intervals. More advanced time-oriented information in the book includes hazard and survival analysis. Forecasting with exponential moving averages and autoregression is covered as well.
Every web/retail shop wants to know the products customers tend to buy together. Trying to predict a target discrete or continuous variable from a few input variables is important for practically every type of business. This book helps you understand data science, the advanced algorithms used to analyze data, and terms such as data mining, machine learning, and text mining.
Key to many of the solutions in this book are T-SQL window functions. Author Dejan Sarka demonstrates efficient statistical queries that are based on window functions and optimized through algorithms built using mathematical knowledge and creativity. The formulas and usage of those statistical procedures are explained so you can understand and modify the techniques presented.
T-SQL is supported in SQL Server, Azure SQL Database, and Azure Synapse Analytics. There are so many BI features in T-SQL that it might become your primary analytic database language. If you want to learn how to get information from your data with the T-SQL language that you are already familiar with, then this is the book for you.
What You Will Learn
- Describe distribution of variables with statistical measures
- Find associations between pairs of variables
- Evaluate the quality of the data you are analyzing
- Perform time-series analysis on your data
- Forecast values of a continuous variable
- Perform market-basket analysis to predict customer purchasing patterns
- Predict target variable outcomes from one or more input variables
- Categorize passages of text by extracting and analyzing keywords
Who This Book Is For
Database developers and database administrators who want to translate their T-SQL skills into the world of business intelligence (BI) and data science. For readers who want to analyze large amounts of data efficiently by using their existing knowledge of T-SQL and Microsoft’s various database platforms such as SQL Server and Azure SQL Database. Also for readers who want to improve their querying by learning new and original optimization techniques.
Book preview
Advanced Analytics with Transact-SQL - Dejan Sarka
Part I: Statistics
© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2021
D. Sarka, Advanced Analytics with Transact-SQL, https://doi.org/10.1007/978-1-4842-7173-5_1
1. Descriptive Statistics
Dejan Sarka, Ljubljana, Slovenia
Descriptive statistics summarize or quantitatively describe variables from a dataset. In SQL Server, a dataset is a set of rows, or a rowset, that comes from a table, view, or tabular expression. A variable is stored in a column of the rowset. In statistics, a variable is frequently called a feature.
When you analyze a variable, you first want to understand the distribution of its values. You can get a better understanding through graphical representation and descriptive statistics. Both are important. For most people, a graphical representation is easier to understand. However, with descriptive statistics, where you get information through numbers, it is simpler to analyze a lot of variables and compare their aggregated values; for example, their means and variability. You can always order numbers and quickly notice which variable has a higher mean, median, or other measure.
Transact-SQL is not very useful for graphing. Therefore, I focus on calculating descriptive statistics measures. I also include a few graphs, which I created with Power BI.
Variable Types
Before I calculate the summary values, I need to introduce the types of variables. Different types of variables require different calculations. The most basic division splits variables into two groups: discrete and continuous.
Discrete variables can only take a value from a limited pool. For example, there are only seven different or distinct values for the days of the week. Discrete variables can be further divided into two groups: nominal and ordinal.
If a value does not have a quantitative value (e.g., a label for a group), it is a nominal variable. For example, a variable that describes marital status could have three possible values: single, married, or divorced.
Discrete variables can also have an intrinsic order; such variables are called ordinal variables. If the values are represented as numbers, it is easy to notice the order. For example, evaluating a product purchased on a website could be expressed with numbers from 1 to 7, where a higher number means greater satisfaction with the product. If the values of a variable are represented with strings, it is sometimes harder to notice the order. For example, education could be represented with strings, like high school degree, graduate degree, and so forth. You probably don’t want to sort the values alphabetically because there is an order hidden in the values. With education, the order is defined through the years of schooling needed to get the degree.
If a discrete variable can take only two distinct values, it is a dichotomous variable called an indicator, a flag, or a binary variable. If the variable can only take a single value, it is a constant. Constants are not useful for analysis; there is no information in a constant. After all, variables are called variables because they introduce some variability.
Continuous variables can take a value from an unlimited, uncountable set of possible values. They are represented with integer or decimal numbers. They can be further divided into two classes: intervals and true numerics.
Intervals are limited on the lower side, the upper side, or both sides. For example, temperature is an interval, limited by absolute zero on the lower side. On the other hand, true numerics have no limits on either side. For example, cash flow can be positive, negative, or zero.
It is not always completely clear whether a variable is discrete or continuous. For example, the number of cars owned is an integer and can take any value between zero and infinity. You can use such variables in both ways, as discrete when needed, or as continuous. For example, the naïve Bayes algorithm, which is explained in Chapter 7, uses only discrete variables, so you can treat the number of cars owned variable as discrete. The linear regression algorithm, which is explained in the same chapter, uses only continuous variables, and you can treat the same variable as continuous.
Demo Data
I use a couple of demo datasets for the demos in this book. In this chapter, I use the mtcars demo dataset that comes with the R language; mtcars stands for Motor Trend Car Road Tests. The dataset includes 32 cases, or rows, originally with 11 variables. For demo purposes, I add a few calculated variables. The data comes from the 1974 Motor Trend magazine and includes design and performance aspects for 32 cars, all 1973 and 1974 models. You can learn more about this dataset at www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars.
I introduce variables when needed.
Starting with SQL Server 2016, it is easy to execute R code inside the SQL Server Database Engine. You can learn more about machine learning inside SQL Server with R or the Python language in the official Microsoft documentation. A good introduction is at https://docs.microsoft.com/en-us/sql/machine-learning/sql-server-machine-learning-services?view=sql-server-ver15. Since this book is about T-SQL and not R, I will not spend more time explaining the R part of the code. I introduce the code that I used to import the mtcars dataset, with some additional calculated columns, into a SQL Server table.
First, you need to enable external scripts execution in SQL Server.
-- Configure SQL Server to enable external scripts
USE master;
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'external scripts enabled', 1;
RECONFIGURE;
GO
I created a new table in the AdventureWorksDW2017 demo database, which is a Microsoft-provided demo database. I use the data from this database later in this book as well. You can find the AdventureWorks sample databases at https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15&tabs=ssms. For now, I won’t spend more time on the content of this database. I just needed a database to create a table in, and because I use this database later, it seems like the best place for my first table with demo data. Listing 1-1 shows the T-SQL code for creating the demo table.
-- Create a new table in the AWDW database
USE AdventureWorksDW2017;
DROP TABLE IF EXISTS dbo.mtcars;
CREATE TABLE dbo.mtcars
(
mpg numeric(8,2),
cyl int,
disp numeric(8,2),
hp int,
drat numeric(8,2),
wt numeric(8,3),
qsec numeric(8,2),
vs int,
am int,
gear int,
carb int,
l100km numeric(8,2),
dispcc numeric(8,2),
kw numeric(8,2),
weightkg numeric(8,2),
transmission nvarchar(10),
engine nvarchar(10),
hpdescription nvarchar(10),
carbrand nvarchar(20) PRIMARY KEY
);
GO
Listing 1-1
Creating the Demo Table
I want to discuss the naming conventions in this book. When I create tables in SQL Server, I start with the column(s) that form the primary key and use pascal case (e.g., FirstName) for the physical columns. For computed columns, typically aggregated columns from a query, I tend to use camel case (e.g., avgAmount). However, the book deals with data from many sources. Demo data provided from Microsoft demo databases is not enough for all of my examples. Two demo tables come from R. In R, the naming convention is not strict. I had a choice to make on how to proceed. I decided to go with the original names when data comes from R, so the names of the columns in the table in Listing 1-1 are all lowercase (e.g., carbrand).
Note
Microsoft demo data is far from perfect. Many dynamic management objects return all lowercase objects or reserved keywords as the names of the columns. For example, in Chapter 8, I use two tabular functions by Microsoft that return two columns named [KEY] and [RANK]. Both are uppercase reserved words in SQL, so they need to be enclosed in brackets.
Now let’s use the sys.sp_execute_external_script system stored procedure to execute the R code. Listing 1-2 shows how to execute the INSERT...EXECUTE T-SQL statement to get the R dataset in a SQL Server table.
-- Insert the mtcars dataset
INSERT INTO dbo.mtcars
EXECUTE sys.sp_execute_external_script
@language=N'R',
@script = N'
data(mtcars)
mtcars$l100km = round(235.214583 / mtcars$mpg, 2)
mtcars$dispcc = round(mtcars$disp * 16.38706, 2)
mtcars$kw = round(mtcars$hp * 0.7457, 2)
mtcars$weightkg = round(mtcars$wt * 1000 * 0.453592, 2)
mtcars$transmission = ifelse(mtcars$am == 0,
  "Automatic", "Manual")
mtcars$engine = ifelse(mtcars$vs == 0,
  "V-shape", "Straight")
mtcars$hpdescription =
  factor(ifelse(mtcars$hp > 175, "Strong",
         ifelse(mtcars$hp < 100, "Weak", "Medium")),
         order = TRUE,
         levels = c("Weak", "Medium", "Strong"))
mtcars$carbrand = row.names(mtcars)
',
@output_data_1_name = N'mtcars';
GO
Listing 1-2
Inserting R Data in the SQL Server Demo Table
You can check if the demo data successfully imported with a simple SELECT statement.
SELECT *
FROM dbo.mtcars;
Once the demo data is loaded, let’s start analyzing it.
Frequency Distribution of Discrete Variables
You usually represent the distribution of a discrete variable with frequency distribution or frequencies. In the simplest example, you can calculate only the values’ count. You can also express these value counts as percentages of the total number of rows or cases.
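Outside the database, the same counting logic is easy to sketch. The following Python snippet (with made-up labels, not the mtcars data) mirrors what the T-SQL queries in this section compute: absolute frequencies and percentages of the total.

```python
from collections import Counter

# Hypothetical transmission labels for a small sample
values = ["Automatic", "Manual", "Automatic", "Automatic", "Manual"]

counts = Counter(values)                # absolute frequencies
total = sum(counts.values())
freq = {v: (n, round(100 * n / total))  # (count, percent of total)
        for v, n in counts.items()}

print(freq)  # {'Automatic': (3, 60), 'Manual': (2, 40)}
```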
Frequencies of Nominals
The following is a simple example of calculating the counts and percentages for the transmission variable, which shows the transmission type.
-- Simple, nominals
SELECT c.transmission,
COUNT(c.transmission) AS AbsFreq,
CAST(ROUND(100. * (COUNT(c.transmission)) /
(SELECT COUNT(*) FROM dbo.mtcars), 0) AS int) AS AbsPerc
FROM dbo.mtcars AS c
GROUP BY c.transmission;
The following is the result.
transmission AbsFreq AbsPerc
------------ ----------- -----------
Automatic 19 59
Manual 13 41
I used a simple GROUP BY clause of the SELECT statement and the COUNT() aggregate function. Graphically, you can represent the distribution with vertical or horizontal bar charts. Figure 1-1 shows the bar charts for three variables from the mtcars dataset, created with Power BI.
Figure 1-1. Bar charts for discrete variables
You can see the distribution of the transmission, engine, and cyl variables. The cyl variable is represented with the numbers 4, 6, and 8, which represent the number of engine cylinders. Can you create a bar chart with T-SQL? You can use the percentage number as a parameter to the REPLICATE() function and mimic the horizontal bar chart, or a horizontal histogram, as the following code shows.
WITH freqCTE AS
(
SELECT c.transmission,
COUNT(c.transmission) AS AbsFreq,
CAST(ROUND(100. * (COUNT(c.transmission)) /
(SELECT COUNT(*) FROM dbo.mtcars), 0) AS int) AS AbsPerc
FROM dbo.mtcars AS c
GROUP BY c.transmission
)
SELECT transmission,
AbsFreq,
AbsPerc,
CAST(REPLICATE('*', AbsPerc) AS varchar(50)) AS Histogram
FROM freqCTE;
I used a common table expression to enclose the first query, which calculated the counts and the percentages, and then added the horizontal bars in the outer query. Figure 1-2 shows the result.
Figure 1-2. Counts with a horizontal bar
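The asterisk-bar trick is not specific to T-SQL; a quick Python sketch of the same idea, using the percentages from the query above:

```python
# Text histogram: one asterisk per percentage point,
# mimicking REPLICATE('*', AbsPerc) in the T-SQL query
perc = {"Automatic": 59, "Manual": 41}
bars = {label: "*" * p for label, p in perc.items()}

for label, bar in bars.items():
    print(f"{label:<12} {bar}")
```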
For nominal variables, this is usually all that you calculate. For ordinals, you can also calculate running totals.
Frequencies of Ordinals
Ordinals have intrinsic order. When you sort the values in the correct order, it makes sense to also calculate the running totals. What is the total count of cases up to some specific value? What is the running total of percentages? You can use the T-SQL window aggregate functions to calculate the running totals. Listing 1-3 shows the calculation for the cyl variable.
-- Ordinals - simple with numerics
WITH frequency AS
(
SELECT v.cyl,
COUNT(v.cyl) AS AbsFreq,
CAST(ROUND(100. * (COUNT(v.cyl)) /
(SELECT COUNT(*) FROM dbo.mtcars), 0) AS int) AS AbsPerc
FROM dbo.mtcars AS v
GROUP BY v.cyl
)
SELECT cyl,
AbsFreq,
SUM(AbsFreq)
OVER(ORDER BY cyl
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS CumFreq,
AbsPerc,
SUM(AbsPerc)
OVER(ORDER BY cyl
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS CumPerc,
CAST(REPLICATE('*', AbsPerc) AS varchar(50)) AS Histogram
FROM frequency
ORDER BY cyl;
Listing 1-3
Frequencies of an Ordinal Variable
The query returns the result shown in Figure 1-3.
Figure 1-3. Frequencies of an ordinal variable
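The window aggregates in Listing 1-3 compute ordinary cumulative sums. As a cross-check, here is the same running-total logic in a few lines of Python, using the cyl frequencies from the dataset (11 four-cylinder, 7 six-cylinder, and 14 eight-cylinder cars):

```python
from itertools import accumulate

# cyl frequencies from mtcars: 11 fours, 7 sixes, 14 eights
abs_freq = [11, 7, 14]
abs_perc = [round(100 * f / sum(abs_freq)) for f in abs_freq]

cum_freq = list(accumulate(abs_freq))  # running total of counts
cum_perc = list(accumulate(abs_perc))  # running total of percentages

print(cum_freq, cum_perc)  # [11, 18, 32] [34, 56, 100]
```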
Note
If you are not familiar with the T-SQL window functions and the OVER() clause, please refer to the official SQL Server documentation at https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15.
Ordering by the cyl variable was simple because the values are represented with integral numbers, and the order is automatically correct. But if an ordinal is represented with strings, you need to be careful with the proper order. You probably do not want to use alphabetical order.
For the demo, I created (already in the R code) the hpdescription derived variable (computed from the hp continuous variable), which shows engine horsepower in three classes: weak, medium, and strong. The following query incorrectly returns the result in alphabetical order.
-- Ordinals - incorrect order with strings
WITH frequency AS
(
SELECT v.hpdescription,
COUNT(v.hpdescription) AS AbsFreq,
CAST(ROUND(100. * (COUNT(v.hpdescription)) /
(SELECT COUNT(*) FROM dbo.mtcars), 0) AS int) AS AbsPerc
FROM dbo.mtcars AS v
GROUP BY v.hpdescription
)
SELECT hpdescription,
AbsFreq,
SUM(AbsFreq)
OVER(ORDER BY hpdescription
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS CumFreq,
AbsPerc,
SUM(AbsPerc)
OVER(ORDER BY hpdescription
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS CumPerc,
CAST(REPLICATE('*', AbsPerc) AS varchar(50)) AS Histogram
FROM frequency
ORDER BY hpdescription;
The results of this query are shown in Figure 1-4.
Figure 1-4. Frequencies of the hpdescription variable with incorrect order
You can use the CASE T-SQL expression to change the strings and include proper ordering with numbers at the beginning of the string. Listing 1-4 shows the calculation of the frequencies of a string ordinal with proper ordering.
-- Ordinals - correct order
WITH frequency AS
(
SELECT
CASE v.hpdescription
WHEN N'Weak' THEN N'1 - Weak'
WHEN N'Medium' THEN N'2 - Medium'
WHEN N'Strong' THEN N'3 - Strong'
END AS hpdescriptionord,
COUNT(v.hpdescription) AS AbsFreq,
CAST(ROUND(100. * (COUNT(v.hpdescription)) /
(SELECT COUNT(*) FROM dbo.mtcars), 0) AS int) AS AbsPerc
FROM dbo.mtcars AS v
GROUP BY v.hpdescription
)
SELECT hpdescriptionord,
AbsFreq,
SUM(AbsFreq)
OVER(ORDER BY hpdescriptionord
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS CumFreq,
AbsPerc,
SUM(AbsPerc)
OVER(ORDER BY hpdescriptionord
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS CumPerc,
CAST(REPLICATE('*', AbsPerc) AS varchar(50)) AS Histogram
FROM frequency
ORDER BY hpdescriptionord;
Listing 1-4
Frequencies of an Ordinal with Proper Ordering
Figure 1-5 shows the result of the query from Listing 1-4.
Figure 1-5. Frequencies of the hpdescription ordinal variable
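The CASE expression in Listing 1-4 amounts to attaching an explicit rank to each label. The same idea can be sketched in Python with a rank dictionary:

```python
# Map each string ordinal to its rank, mirroring the CASE expression
rank = {"Weak": 1, "Medium": 2, "Strong": 3}

labels = ["Strong", "Weak", "Medium"]
ordered = sorted(labels, key=rank.get)  # intrinsic order, not alphabetical
print(ordered)  # ['Weak', 'Medium', 'Strong']
```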
With frequencies, I covered discrete variables. Now let’s calculate some descriptive statistics for continuous variables.
Descriptive Statistics for Continuous Variables
You can calculate many statistical values for the distribution of a continuous variable. Next, I show you the calculation for the centers of distribution, spread, skewness, and tailedness.
I also explain the mathematical formulas for calculation and the meaning of the measures. These measures help describe the distribution of a continuous variable without graphs.
Centers of a Distribution
The most known and the most abused statistical measure is the mean or the average of a variable. How many times have you heard or read about “the average …”? Many times, this expression makes no sense, although it looks smart to use it. Let’s discuss an example.
Take a group of random people in a bar. For the sake of the example, let’s say they are all local people from the country where the bar is located. You want to estimate the wealth of these people.
The mean value is also called the expected value. It is used as the estimator for the target variable, in this case, wealth. It all depends on how you calculate the mean. You can ask every person in the group her or his wealth and then calculate the group’s mean. This is the sample mean.
Your group is a sample of the broader population. You could also calculate the mean for the whole country. This would be the population mean. The population mean is a good estimator for the group. However, the sample mean could be very far from the actual wealth of the majority of people in the group. Imagine that there are 20 people in the group, including one extremely rich person worth more than $20 billion. The sample mean would be more than a billion dollars, making it seem as if a group of billionaires is sitting in the bar. This could be far from the truth.
Extreme values, especially if they are rare, are called outliers. Outliers can have a big impact on the mean value. This is clear from the formula for the mean.
$$ \mu = \frac{1}{n} \sum_{i=1}^{n} v_i $$
Each value, vi, is part of the calculation of the mean, μ. A value of 100 adds a hundred times more to the mean than a value of 1. The mean of the sample is rarely useful if it is the only value you measure. The calculation of the mean involves every value to the first power. That is why the mean is also called the first population moment.
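To see how strongly a single outlier pulls on the mean, consider a quick numeric sketch of the bar example above (the wealth figures are made up):

```python
# Sample mean with and without a single extreme outlier (hypothetical wealth, $)
group = [40_000] * 19            # nineteen people of modest wealth
mean_without = sum(group) / len(group)

group.append(20_000_000_000)     # one billionaire joins the bar
mean_with = sum(group) / len(group)

print(mean_without, mean_with)  # 40000.0 1000038000.0
```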
Apparently, we