Advanced Analytics with Power BI and Excel: Learn powerful visualization and data analysis techniques using Microsoft BI tools along with Python and R

About this ebook

Empowering You to Master Business Intelligence and Solve Real-world Analytical Problems.



Language: English
Release date: Oct 13, 2023
ISBN: 9789391246785


    Book preview

    Advanced Analytics with Power BI and Excel - Dejan Sarka

    CHAPTER 1

    Introducing the Theoretical Background for Democratizing Analytics

    Introduction

    The two pillars of online analytical processing (OLAP) are logical design and physical implementation. With OLAP, we want to enable end users to create different reports and analyze data. The tool must be able to write queries based on the report design, and the queries must be executed at lightning speed. The star schema logical model enables tools, like Power BI, to write queries automatically. Columnar storage is the physical storage implementation by Microsoft that provides the appropriate performance. This chapter introduces the two pillars of OLAP. It also describes the building blocks and the history of Power BI.

    Structure

    In this chapter, we will discuss the following topics:

    Logical design – star schema

    Dimensions and fact tables

    Columnar storage

    Introducing Power BI and its history

    Logical design – star schema

    The relational model is the logical model commonly used for modeling the schema of a database for an online transactional processing (OLTP) system. In an OLTP, or in short, a transactional system, the data is written to the database in real time, or online. The data must comply with integrity rules; it must be in accordance with business rules. Therefore, the relational model is ideal for line of business (LOB) applications, where data integrity is crucial. LOB applications are applications that are important for an enterprise to conduct its day-to-day operations. An example would be an application for an enterprise resource planning (ERP) system.

    The relational model, conceived in 1970 by Edgar F. Codd, enforces data integrity through data types, database schema, declarative constraints, and even programming code through triggers and procedures. When you design a relational database, you go through some predefined processes, like normalization and specialization. In both of these processes, you are creating more and more tables. Let us not get into the details of relational modeling. The point here is to show that after the model is finished, it is not very suitable for reports and analysis. The big problem is that you might end up with hundreds or even thousands of tables for even a quite small LOB application.

    Before continuing, let’s do a recap. The relational model is definitely a great model for LOB applications, and for any other applications that need to enforce data integrity. However, querying the model is complex. You need to join multiple tables for even quite simple reports. You must first know where the data you need is stored, which tables and which columns you need to use in the report. Figure 1.1 shows a small example of a relational model. The figure has been created in SQL Server Management Studio (SSMS) for the tables in the Sales schema in the AdventureWorks2019 database. AdventureWorks2019 is a demo database by Microsoft, showing how to model a database for LOB applications:

    Figure 1.1: Sales Part of the AdventureWorks2019 Database Schema

    If you count the tables, you can see that there are already seventeen (17). There is no simple query from this schema. Remember, in a real-world system that supports sales, you could have a far larger number of tables. Unfortunately, the number of tables is not the only problem. The relational model does not enforce a naming convention. From the theoretical perspective, it is perfectly OK to use names like Table001, Table002, Column001, Column002, and so on. Some databases really do use similar names, meaningless from the business perspective. The applications add labels and captions to the user interface, so it turns out that learning a querying language, like Transact-SQL (or T-SQL) for SQL Server databases, is not the biggest problem. The biggest problem is learning where, in which columns of which tables, you can find the data you need for your report.
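
    To illustrate the point, here is a sketch of what even a basic sales-by-product report against this schema could look like. The query assumes the standard AdventureWorks2019 table and column names; filtering by customer name would already require two more joins (Sales.Customer and Person.Person):

    -- A sketch of a simple sales-by-product report against the OLTP schema.
    -- Even this basic question already needs two joins among three tables.
    SELECT p.Name AS ProductName,
           YEAR(soh.OrderDate) AS OrderYear,
           SUM(sod.LineTotal) AS SalesAmount
    FROM Sales.SalesOrderHeader AS soh
         INNER JOIN Sales.SalesOrderDetail AS sod
            ON sod.SalesOrderID = soh.SalesOrderID
         INNER JOIN Production.Product AS p
            ON p.ProductID = sod.ProductID
    GROUP BY p.Name, YEAR(soh.OrderDate)
    ORDER BY OrderYear, SalesAmount DESC;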

    Business always wants to analyze the data over time. Business people want to compare this year's sales with the previous year's sales. They want to follow the behavior of their customers over time. Unfortunately, the databases that support LOB applications frequently store current data only, with no history. This is another problem for creating reports.

    Another issue worth mentioning is data quality. A LOB system can live with only a small amount of data that must be correct, and only at specific moments. For example, you need to have existing and correct data about a customer only when you are in contact with this customer, for example, when the customer places an order, when you ship the goods, or when you send an invoice. The data might have been missing a moment before the contact and might be updated incorrectly a moment after the contact. In addition, a LOB application needs only a few pieces of data. For example, if the customer is a company: the name of the company, the address, the VAT ID, and probably nothing else. However, for analysis, data like the size of the company, measured by the number of employees or by gross income, would be very welcome.

    There are never enough reports. In a modern company, end users must be able to create their own reports. Analytics must not be limited to IT people only; analytics must be democratized. Of course, you cannot expect end users to learn a query language, let alone to know where to find the data for their analysis in the database. The tool that end users use to create the reports should be able to create the queries automatically. The users should be able to create reports through the graphical user interface (GUI), and the tool should create the queries based on this report design. Ideally, the queries should be processed extremely quickly and return the results instantly. We want analysis in real time. We want an online analytical processing, or OLAP, system.

    We are at a dead end here. IT professionals are not able to create queries from a complex relational model, and end users are even less able to do so. So how could a tool be smart enough to create the queries needed? Apparently, we need a new data model. Of course, we cannot change the design of the databases that support LOB systems. This means that we need to create new databases with a new model, extract the data from the sources, transform it for the destination schema, and load it into the destination database. The standard acronym for this extract-transform-load process is ETL. But what kind of model do we need?

    Without any further hesitation, let’s see what a star schema is. The star schema was formally introduced by Ralph Kimball in 1996, although some analytical applications had already used the same or a similar design earlier. Figure 1.2 shows the star schema for the sales through the reseller channel, created from the AdventureWorksDW2019 demo database:

    Figure 1.2: Star Schema for the Reseller Sales

    You can immediately see how the star schema got its name. We have a single central table and multiple surrounding tables. Note that the central table is on the many side of every single relationship, meaning that many rows from the fact table are associated with a single row from a dimension table. Now a tool can create queries automatically. The tool must find the central table from the metadata and then join the surrounding tables. Let us see what exactly is included in a star schema.
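
    As an illustration only, the following query sketches the kind of statement such a tool could generate from this model; it assumes the standard AdventureWorksDW2019 names and is not the exact query Power BI would produce:

    -- A sketch of an automatically generated star join: aggregate a measure
    -- from the fact table over attributes of two dimensions.
    SELECT d.CalendarYear,
           p.Color,
           SUM(f.SalesAmount) AS SalesAmount
    FROM dbo.FactResellerSales AS f
         INNER JOIN dbo.DimDate AS d
            ON d.DateKey = f.OrderDateKey
         INNER JOIN dbo.DimProduct AS p
            ON p.ProductKey = f.ProductKey
    GROUP BY d.CalendarYear, p.Color
    ORDER BY d.CalendarYear, p.Color;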

    Dimensions and fact tables

    The star schema is an informal standard for designing analytical databases. A star schema consists of two types of tables only: dimensions and fact tables. In addition, the informal standard also defines the column types in these two tables. Let us start the description of these two table types with an example of a fact table. Figure 1.3 shows the dbo.FactResellerSales table from the AdventureWorksDW2019 demo database:

    Figure 1.3: The dbo.FactResellerSales Table

    Let us look at the columns. The first nine columns are foreign keys. The next two columns form the primary key of the table. A primary key in a table uniquely identifies each row. A foreign key is a primary key from another table, called the parent table, imported into a child table to maintain the associations between rows in both tables. For example, CustomerID might be the primary key in the Customers table and is imported as a foreign key into the Orders table to maintain the relationship between customers and their orders. All of these eleven columns can be grouped into a single group: keys. Keys are the first type of column in a fact table. The next type of column measures something; such columns are called measures. Starting with OrderQuantity and up to Freight, we have ten measures in the table. Measures are the columns you are analyzing and aggregating. Keys and measures are the two main types of columns in a fact table. In the example, you can see some other columns as well. The RevisionNumber, CarrierTrackingNumber, and CustomerPONumber columns connect the rows to the source system. We can immediately trace why the rows came into our system; for example, some rows came based on a specific customer’s purchase order. Such columns are called lineage columns. In lineage columns, you could also track who transferred the rows from the source system to the analytical database, and when. Finally, the last three columns are just the three dates converted to the datetime data type from the original date key columns, which are integers. These columns are not really needed in the table; they do not bring any additional information, but they can simplify the reports.

    There are three types of measures. The most useful measures for analyses are additive measures. Additivity means that you can use the SUM aggregate function for aggregating across any dimension. The SalesAmount column from the dbo.FactResellerSales table is an example of an additive measure.

    Non-additive measures are measures for which the SUM function makes no sense across any dimension. The UnitPrice column is an example. Summing prices does not make sense across products, across resellers, or over time.

    The third kind of measure is the semi-additive measure. Nearly-additive would probably be a better name because these measures are additive across every single dimension except time. All kinds of levels, like the stock level of a product, are semi-additive measures. In the dbo.FactResellerSales table, there is no example of a semi-additive measure. Aggregating semi-additive measures is a bit more complicated because you have to use SUM across all dimensions but time. Over time, you can use the last known state as the aggregate. For example, the aggregate of the stock level of a product for a month would be the level on the last date of that month when the level is known. Let’s say that today is March 23rd, 2023. The aggregate for February 2023 would be the state on February 28th, and the aggregate for March would be the state on March 22nd or March 23rd if the last data from March 23rd was already transferred.
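
    To make this concrete, here is a sketch in T-SQL of aggregating a semi-additive measure, assuming the dbo.FactProductInventory table from the AdventureWorksDW2019 database with integer DateKey values in the yyyymmdd format and the UnitsBalance column for the stock level; adjust the names if your copy differs:

    -- A sketch: for each product and month, take the stock level on the last
    -- known date of that month (SUM over products would still be allowed).
    WITH LastDayPerMonth AS
    (
        SELECT ProductKey, DateKey, UnitsBalance,
               ROW_NUMBER() OVER(PARTITION BY ProductKey, DateKey / 100
                                 ORDER BY DateKey DESC) AS rn
        FROM dbo.FactProductInventory
    )
    SELECT ProductKey,
           DateKey / 100 AS YearMonth,   -- integer keys: yyyymmdd -> yyyymm
           UnitsBalance AS MonthEndStockLevel
    FROM LastDayPerMonth
    WHERE rn = 1
    ORDER BY ProductKey, YearMonth;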

    Please note that a star schema is not a proper relational schema anymore; it is denormalized. The dbo.FactResellerSales table does not comply with the second normal form, which states that all non-key columns must depend on the full key. A key can be a composite key, consisting of more than one column. The columns forming the composite key of the table are the SalesOrderNumber and SalesOrderLineNumber columns. For many foreign key columns, you need to know only the SalesOrderNumber value to get the value of the column. For example, you do not need SalesOrderLineNumber to know the values of the OrderDateKey, DueDateKey, and ShipDateKey columns.

    Now let us focus on the dimensions. Let us start with the date (or time) dimension. This dimension is present in practically every star schema. It is a somewhat privileged dimension. Because it is always there, Power BI includes many functions in the Data Analysis eXpressions (DAX) language that are specialized to work with dates. For example, the PARALLELPERIOD() DAX function returns the dates from a period parallel to the dates in the current context. PARALLELPERIOD(DimDate[DateKey], -1, year) returns, for the dates in the current context, all the dates of the previous year. Figure 1.4 shows the dbo.DimDate table:

    Figure 1.4: The dbo.DimDate Table

    Because all dates are known, you can populate the date dimension in advance, as the sketch after the following hierarchy list shows. Like in dbo.DimDate, you can add many calculated columns that create natural hierarchies for aggregates. In dbo.DimDate, you can create two hierarchies from the last six columns:

    CalendarYear – CalendarSemester – CalendarQuarter – month – date, and

    FiscalYear – FiscalSemester – FiscalQuarter – month – date.
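
    Here is a minimal sketch of populating such a dimension in advance; the dbo.DimDateDemo table and its column names are hypothetical, chosen only for this illustration:

    -- A minimal sketch: generate one year of dates with a few hierarchy columns.
    CREATE TABLE dbo.DimDateDemo
    (DateKey INT NOT NULL PRIMARY KEY,
     FullDate DATE NOT NULL,
     CalendarYear INT NOT NULL,
     CalendarQuarter INT NOT NULL,
     MonthNumber INT NOT NULL);
    GO
    WITH Numbers AS
    (
        SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects
    ),
    Dates AS
    (
        SELECT DATEADD(day, n - 1, CAST('20230101' AS DATE)) AS FullDate
        FROM Numbers
        WHERE n <= 365
    )
    INSERT INTO dbo.DimDateDemo
    (DateKey, FullDate, CalendarYear, CalendarQuarter, MonthNumber)
    SELECT CAST(CONVERT(CHAR(8), FullDate, 112) AS INT),
           FullDate,
           YEAR(FullDate),
           DATEPART(quarter, FullDate),
           MONTH(FullDate)
    FROM Dates;
    GO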

    Natural hierarchies are very useful because they form a natural drill-down path. Users often analyze data with the divide-and-conquer method. Imagine there is a problem with the total sales amount. You need to find the root of the problem. First, you check sales over the years; then you find a year with suspicious numbers. You drill down to semesters, quarters, and months to find the critical months.

    Natural hierarchies in the dbo.DimDate table break the third normal form. Non-key columns should be mutually independent; they should depend on the key only. However, if you know the month, you know the quarter and the semester. But don’t worry, everything is fine with star schema denormalization. You are not building a new relational model used in the source systems; you are building a model that is suitable for analyses.

    In Figure 1.5, you can see the dbo.DimProduct table. However, you can also notice two associated tables, namely the dbo.DimProductSubcategory and dbo.DimProductCategory tables:

    Figure 1.5: The dbo.DimProduct Table

    For the products, the natural hierarchy category – subcategory – product is kept in lookup tables. This dimension complies with the third normal form; it is not denormalized. We also say that we have a snowflake schema. For Power BI, you should always try to target a star schema. A snowflake schema brings more tables. More tables mean a more complex user interface for creating reports, slower queries, and more work in the ETL process. Therefore, let us repeat: use the star schema always, and join the data from the associated tables in source queries or in the ETL process.
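
    For example, a source query along the following lines could flatten the snowflaked product tables into a single denormalized dimension during the ETL process (assuming the standard AdventureWorksDW2019 names):

    -- A sketch of flattening the product snowflake into one denormalized dimension.
    -- LEFT JOINs are used because some products have no subcategory.
    SELECT p.ProductKey,
           p.EnglishProductName AS ProductName,
           p.Color,
           s.EnglishProductSubcategoryName AS Subcategory,
           c.EnglishProductCategoryName AS Category
    FROM dbo.DimProduct AS p
         LEFT JOIN dbo.DimProductSubcategory AS s
            ON s.ProductSubcategoryKey = p.ProductSubcategoryKey
         LEFT JOIN dbo.DimProductCategory AS c
            ON c.ProductCategoryKey = s.ProductCategoryKey;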

    In a dimension, we also have standard column types. First, we have the key, which in this example is the ProductKey column. Then we have column(s) that give the name to a member of the dimension. In the example, we have EnglishProductName, but also the SpanishProductName and FrenchProductName translations for the name columns. Then we have columns that are used for analyses, to give context to aggregated measures. For example, the Color column can be used to aggregate sales amounts over colors. These columns are called attributes in star schema terminology. Some attributes can form natural hierarchies. Finally, we can have columns called member properties. These columns are not used for analyses; they are included only if we need them on the reports. For example, the SafetyStockLevel column would probably never be used for aggregations but could be useful as an additional piece of information in a report. Another, maybe even better, example would be a customer’s e-mail address.

    There is another possible approach – hybrid star and snowflake schema. You can see how the geography data model part is solved in Figure 1.6:

    Figure 1.6: Hybrid Schema for Geography

    There are two fact tables, dbo.FactInternetSales and dbo.FactResellerSales. They are related to appropriate dimensions, the dbo.DimReseller and dbo.DimCustomer tables. For both customers and resellers, we have the same geography data. There is only one level of the snowflake lookup table, the dbo.DimGeography table, which is connected with foreign keys to both aforementioned dimensions. The denormalization starts in this table, with the hierarchy of country – region – city. This model is quite reasonable; however, for the sake of simplicity, we still prefer a complete star schema.

    One star schema covers one business area. In the example we have used so far, the schema covers reseller sales. What about internet sales? Or product inventory? In the AdventureWorksDW2019 database, there are more fact tables, and a single star schema has a single fact table. For example, we could create a product inventory star schema around the dbo.FactProductInventory table.

    Multiple star schemas are connected through shared dimensions. In our example, reseller sales and product inventory schemas share the product dimension. However, they do not share the reseller dimension. All the star schemas in a single database usually share the date dimension.
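
    As a sketch of how a shared dimension connects two star schemas, the following drill-across query aggregates each fact table separately and then joins the results through the shared product dimension; it assumes the standard AdventureWorksDW2019 names, including the UnitsBalance column in dbo.FactProductInventory:

    -- A sketch of a drill-across query: aggregate each fact table separately,
    -- then join the results through the shared product dimension.
    WITH Sales AS
    (
        SELECT ProductKey, SUM(SalesAmount) AS SalesAmount
        FROM dbo.FactResellerSales
        GROUP BY ProductKey
    ),
    Inventory AS
    (
        -- AVG over time is a simplification here; a real report would use
        -- the last known level, as discussed for semi-additive measures.
        SELECT ProductKey, AVG(UnitsBalance) AS AvgStockLevel
        FROM dbo.FactProductInventory
        GROUP BY ProductKey
    )
    SELECT p.EnglishProductName AS ProductName,
           s.SalesAmount,
           i.AvgStockLevel
    FROM dbo.DimProduct AS p
         INNER JOIN Sales AS s
            ON s.ProductKey = p.ProductKey
         INNER JOIN Inventory AS i
            ON i.ProductKey = p.ProductKey
    ORDER BY s.SalesAmount DESC;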

    A database with multiple star schemas, an explicit date dimension, a good naming convention, and cleansed data in the tables is called a data warehouse. A data warehouse is stored in a relational database management system (RDBMS), like Microsoft SQL Server. In an enterprise, a data warehouse is typically the source for all kinds of reports and analyses, including Power BI reports. The development of Power BI reports is much faster if a data warehouse is the source. Note also that we mentioned a good naming convention. The names of the tables and the columns are frequently shown on the reports directly. Many times, there is no client application in between that would add descriptive labels. Power BI allows renaming of the columns for clearer reports; however, a good naming convention still helps you develop the reports quickly. In addition, you might use another analytical tool on the same data warehouse data that does not allow renaming. In such a case, a good naming convention in a data warehouse also helps keep the same name for a single column in different tools.

    However, developing and maintaining a data warehouse is quite exhausting and expensive. Many times, you need ad-hoc reports on the transactional data. Developing a data warehouse, although highly recommended in the long term, might not be feasible for quick reporting. This is no problem for Power BI; you can query the transactional data directly and create the star schema model in Power BI, as you will see in the next chapters of this book. However, the process of building the Power BI dataset might be much longer and more tiring compared to building it from a data warehouse.

    Even if you start with a data warehouse, most of the time you will have to do some additional data modeling in Power BI. An RDBMS like SQL Server does not have all the metadata needed for analyses. For example, we see the columns that create natural hierarchies in the dbo.DimDate table. However, SQL Server knows nothing about hierarchies. Look at the tabular report in Figure 1.7:

    Figure 1.7: An Example of a Problematic Report

    You can immediately notice multiple problems in this report. For the measure, the reseller key is used, summarized over product categories and then subcategories. What does the sum of reseller keys mean? Nothing, of course. Also note that the aggregation order is incorrect; it does not follow the natural hierarchy category -> subcategory. The order is turned around to subcategory -> category.

    The report was created in Power BI Desktop; the data was imported from the AdventureWorksDW2019 database. Here, we imported the following tables: dbo.FactResellerSales, dbo.DimDate, dbo.DimProduct, dbo.DimProductSubcategory, and dbo.DimProductCategory. The star (or rather snowflake) schema was inherited, as you can see in Figure 1.8:

    Figure 1.8: The Data Model in Power BI Desktop

    Of course, it would be possible to create a meaningful report as well, as Figure 1.9 shows:

    Figure 1.9: A Meaningful Report

    If you are creating a dataset for personal usage only, for a quick report, you might be satisfied by inheriting the star schema from your data warehouse. However, if you share the dataset you create in Power BI with multiple users, you might want to prevent the creation of such meaningless reports. Therefore, additional modeling in Power BI is needed.

    Don’t worry for now about how the data was transferred to Power BI Desktop and how the two reports were created. You will learn this in the following chapters.

    Columnar storage

    The logical model is just a part of the story. If we want to achieve real online analytics, then the queries must execute at lightning speed. Modern RDBMSs are pretty fast. Still, they are not fully optimized for querying the data; the optimization balances the speed of queries with the speed of modifications. In addition, the systems must take care of data integrity; they must check the constraints and execute multiple commands together as a single transaction. A lot of locking is involved. For example, SQL Server cannot allow lost updates; therefore, any row that is currently being updated is exclusively locked until the end of the transaction. These facts and many more things influence the performance of queries.

    Data compression helps to lower disk input and output (disk IO). Disk IO consists of read and write operations, or transfers of data between random-access memory (RAM) and a disk. Compression speeds up reading the data. Nevertheless, there is no free lunch. Compression slows down updates and raises the load on the processors because data must be decompressed before the update and recompressed after the update. SQL Server supports two levels of compression: row compression and page compression. We will test both of them and compare them with the columnar storage (columnstore) compression used in Power BI, the technology Microsoft labels as the tabular model, to get a feeling of how good the latter is.

    Row compression means that every column is stored in a variable-length format, using only the minimal number of bytes possible. For example, the integer data type has a fixed length of four bytes. Not all integers need four bytes. Smaller numbers can occupy one, two, or three bytes only. For Unicode character data, we calculate that one character occupies two bytes on average; yet, many characters can take a single byte only. Row compression uses variable-length storage for all columns, including those defined with fixed-length data types, and performs Unicode compression of strings as well. You can immediately imagine that the actual compression ratio depends on the data. If you are already using variable-length data types and non-Unicode strings only, then row compression cannot help much.

    Page compression is built on top of row compression. It adds dictionary and prefix compression. For dictionary compression, SQL Server creates a mini dictionary in every single 8 KB page, where it stores common substrings from multiple columns in multiple rows on the same page, and stores just pointers to these substrings in the original data values. Prefix compression is somewhat similar. For character data, prefixes are just substrings at the beginning of the strings. However, prefix compression works on numbers and dates as well. For example, you might have the numbers 1000, 1001, 1002, 1010, and 1020 in your data. You can store the prefix 1000 in the dictionary, and the values 0, 1, 2, 10, and 20 in the original cells.
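
    For completeness, if you wanted to actually apply these compression levels rather than just estimate them, as we do below, a rebuild with the desired DATA_COMPRESSION option is enough. Here is a minimal sketch, using a hypothetical dbo.SomeTable:

    -- A minimal sketch with a hypothetical dbo.SomeTable:
    -- rebuild the table (heap or clustered index) with row compression,
    -- or rebuild all of its indexes with page compression.
    ALTER TABLE dbo.SomeTable
     REBUILD WITH (DATA_COMPRESSION = ROW);
    ALTER INDEX ALL ON dbo.SomeTable
     REBUILD WITH (DATA_COMPRESSION = PAGE);
    GO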

    While row compression might be viable for systems with many updates, page compression is generally not recommended for such systems. Let us now test both compressions. We start by creating four tables in the AdventureWorksDW2019 demo database:

    USE AdventureWorksDW2019;

    GO

    SET NOCOUNT ON;

    GO

    -- Tables for the compression test

    CREATE TABLE dbo.UniqueIntegers

    (col1 INT NOT NULL);

    CREATE TABLE dbo.UniqueGUIDs

    (col1 UNIQUEIDENTIFIER NOT NULL);

    CREATE TABLE dbo.BigCardinalityIntegers

    (col1 INT NOT NULL);

    CREATE TABLE dbo.SmallCardinalityStrings

    (col1 NVARCHAR(10) NOT NULL);

    GO

    You can see that there is one column only in every single table. Take a closer look at the data types. We will store unique integers in the first table, globally unique identifiers (GUIDs) in the second table, integers with high cardinality (many distinct values, but not unique) in the third table, and strings with low cardinality in the fourth table. The following code inserts the data:

    -- Insert the data

    -- Unique integers

    INSERT INTO dbo.UniqueIntegers(col1)

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rn

    FROM

    (SELECT TOP 1000 CustomerKey

    FROM dbo.DimCustomer) AS c1(col1)

    CROSS JOIN

    (SELECT TOP 1000 CustomerKey

    FROM dbo.DimCustomer) AS c2(col2);

    -- GUIDs

    INSERT INTO dbo.UniqueGUIDs(col1)

    SELECT NEWID()

    FROM dbo.UniqueIntegers;

    -- High cardinality integers

    INSERT INTO dbo.BigCardinalityIntegers(col1)

    SELECT (col1 - 1) / 50 +1 AS col1

    FROM dbo.UniqueIntegers;

    -- Low cardinality strings

    INSERT INTO dbo.SmallCardinalityStrings(col1)

    SELECT CAST(col1 % 5 AS NCHAR(1)) + N'AAAA' AS col1

    FROM dbo.UniqueIntegers;

    GO

    The previous code inserts one million rows in every table. Before checking the data, let us also create the clustered indices for the four tables. Clustered indices reorganize the data in a table in a logically sorted structure called a balanced tree, or a B-tree:

    -- Creating clustered indices

    CREATE CLUSTERED INDEX cix_UniqueIntegers

    ON dbo.UniqueIntegers(col1);

    CREATE CLUSTERED INDEX cix_UniqueGUIDs

    ON dbo.UniqueGUIDs(col1);

    CREATE CLUSTERED INDEX cix_BigCardinalityIntegers

    ON dbo.BigCardinalityIntegers(col1);

    CREATE CLUSTERED INDEX cix_SmallCardinalityStrings

    ON dbo.SmallCardinalityStrings(col1);

    GO

    To get an impression of the cardinality, let us count the number of distinct values in the only column of every single table:

    -- Check the cardinality

    SELECT '1. unInt' AS testData, COUNT(DISTINCT col1) AS cntDist

    FROM dbo.UniqueIntegers

    UNION ALL

    SELECT '2. GUIDs' AS testData, COUNT(DISTINCT col1) AS cntDist

    FROM dbo.UniqueGUIDs

    UNION ALL

    SELECT '3. bcInt' AS testData, COUNT(DISTINCT col1) AS cntDist

    FROM dbo.BigCardinalityIntegers

    UNION ALL

    SELECT '4. lcStr' AS testData, COUNT(DISTINCT col1) AS cntDist

    FROM dbo.SmallCardinalityStrings

    ORDER BY testData;

    GO

    Here are the results:

    testData cntDist

    -------- -----------

    1. unInt 1000000

    2. GUIDs 1000000

    3. bcInt 20000

    4. lcStr 5

    You can see that there are only five distinct strings in the fourth table. Now it is time to test row and page compressions. Instead of doing the actual compression, we are just estimating the compression savings with the sys.sp_estimate_data_compression_savings system stored procedure:

    -- Estimating row and page compression savings

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'UniqueIntegers', NULL, NULL, N'ROW';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'UniqueIntegers', NULL, NULL, N'PAGE';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'UniqueGUIDs', NULL, NULL, N'ROW';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'UniqueGUIDs', NULL, NULL, N'PAGE';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'BigCardinalityIntegers', NULL, NULL, N'ROW';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'BigCardinalityIntegers', NULL, NULL, N'PAGE';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'SmallCardinalityStrings', NULL, NULL, N'ROW';

    EXEC sys.sp_estimate_data_compression_savings

    N'dbo', N'SmallCardinalityStrings', NULL, NULL, N'PAGE';

    GO

    The results are combined in Table 1.1:

    Table 1.1: Estimating Row and Page Compression

    Note that these are estimated numbers only and that the data is made up. However, you can notice a few things. The estimated size for GUIDs after the compression is even bigger than the size of the data before the compression. Therefore, GUIDs are very unsuitable for compression. Unique integers did not gain much from the compression either. However, integers with big cardinality have already gained a lot. The space needed after the compression is reduced by nearly half. But there is only a small difference between row and page compression. This is due to the fact that dictionaries are created at the page level, and we still have mostly distinct values inside a single page. If the cardinality were lower, or if the dictionaries were created on bigger data chunks than 8 KB pages, then page compression would be more useful. For the strings with low cardinality, you can immediately notice that the compression gain is the biggest. Also, there is an important additional space saving with page compression compared to row compression. Still, page compression could be much better if the dictionaries were created on bigger data chunks than 8 KB pages.

    Note that SQL Server also supports nonclustered indices, which are efficient for selective queries, when we need to seek only a small subset of rows. Nonclustered indices are not important in the context of this chapter, where we compare row and columnar storage compression. However, we need to introduce another term: covering indices. A clustered index is the table itself; the table is organized like a book, with a table of contents and logically ordered pages. Nonclustered indices are comparable to other indexes in a book, like a word index, a figure index, a formula index, and similar. A nonclustered index can cover a query if all the data needed for the query can be found inside the index. Imagine that you have a nonclustered index with two columns only, namely last name and first name, on a table with employees’ data. A query that wants to retrieve first and last names only could find all of the data in this nonclustered index. SQL Server would not need to read the data from the actual table. The query would become a covered query, and the nonclustered index would be a covering index.
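
    Here is a minimal sketch of this idea, with a hypothetical dbo.Employees table that is not part of the demo databases:

    -- A hypothetical employees table with a nonclustered index on the name columns.
    CREATE TABLE dbo.Employees
    (EmployeeId INT NOT NULL PRIMARY KEY,
     FirstName NVARCHAR(50) NOT NULL,
     LastName NVARCHAR(50) NOT NULL,
     HireDate DATE NOT NULL);
    CREATE NONCLUSTERED INDEX ix_Employees_Names
     ON dbo.Employees(LastName, FirstName);
    GO
    -- This query is covered by ix_Employees_Names: all the columns it needs
    -- are in the index, so the table (clustered index) is never touched.
    SELECT LastName, FirstName
    FROM dbo.Employees
    ORDER BY LastName, FirstName;
    GO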

    So far, we have used regular row storage inside SQL Server. SQL Server supports columnar storage as well. This is the same storage that Power BI and SQL Server Analysis Services Tabular (SSAS TAB) use. Therefore, we will be able to check the compression savings with columnstore inside SQL Server.
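
    As a sketch, on SQL Server 2019 or later the same estimation procedure we used above also accepts the COLUMNSTORE option, so you could estimate the columnstore savings for one of our test tables as shown below; on older versions, you would instead create a clustered columnstore index and compare the actual sizes:

    -- A sketch: estimate columnstore compression savings for one test table.
    -- The COLUMNSTORE option of this procedure requires SQL Server 2019 or later.
    EXEC sys.sp_estimate_data_compression_savings
     N'dbo', N'SmallCardinalityStrings', NULL, NULL, N'COLUMNSTORE';
    GO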

    Columnar storage means exactly what the name says: storing data column by column, not row by row. Rows represent entities from real life; for example, one row in the customer dimension represents a single customer. We operate with rows, not with columns. Therefore, the information on how to recreate rows from columns must be maintained internally in the database engine. Any update might mean that the engine has to fully recalculate this information. This is why columnar storage is read-only. Don’t be confused by SQL Server’s ability to update columnar storage. It is not updated directly; SQL Server stores new data in additional row storage and marks old data with a flag, so it is skipped for reading. The new row data is occasionally converted to columnar storage by a background process.

    What do we then gain with columnar storage? A lot less disk IO for queries. Since data is stored column by column, the engine fetches only the columns that the query needs. In a way, columnstore indices in SQL Server are always covering indices, because the engine does not fetch the full rows, unless requested by a query, of course. In addition, we get much better compression than with row storage.

    Every column is at least partially sorted. This enables the use of run-length encoding compression. For example, if the value ABCDE repeats ten times, you can store it only once, together with the frequency. You can immediately conclude that a column with low cardinality could be compressed better than a column with a high cardinality of data values.

    Columns are only partially sorted because a full sort would be too expensive when creating columnar storage. Rows are split into row groups of approximately one million rows. Then Microsoft uses its own row-rearrangement algorithm, which, with a single pass through the data, rearranges the rows in such a way as to get an optimal sort over all columns. Again, optimal does not mean a full sort. After the sorting, the rows are split into columns.

    Now the compression can start. Besides run-length encoding, bit-packing is used. This is similar to row compression. Remember, row compression stores the data with the minimal number of bytes needed; bit-packing goes one level further and stores the data with the minimal number of bits needed.

    Columnar storage also uses compression similar to page compression, with dictionary and prefix compression. However, dictionaries are built for row groups of approximately one million rows, which means that this compression in columnstore is much more efficient than page compression. In addition, for the most frequent substrings, a global dictionary is created as well.

    Finally, the LZ77 compression is used. This compression was published by Abraham Lempel and Jacob Ziv in the year 1977. The name comes from the first letters of their last names and the last two digits of the year. This compression is also called sliding window compression.
