Knight's Microsoft Business Intelligence 24-Hour Trainer: Leveraging Microsoft SQL Server Integration, Analysis, and Reporting Services with Excel and SharePoint
By Brian Knight, Devin Knight, Adam Jorgensen and
About this ebook
If you are just starting to get a handle on Microsoft Business Intelligence (BI) tools, this book and accompanying video provide just the right amount of information to perform basic business analysis and reporting. You'll explore the components and related tools that make up the Microsoft BI toolset as well as the new BI features of Office 2010. After a basic primer on BI and data modeling, the expert team of authors provides you with step-by-step lessons in the book and videos on the accompanying DVD on how to use SQL Server Integration Services, SQL Server Analysis Services, SQL Server Reporting Services, Excel BI (including PowerPivot), and SharePoint.
- Integrates instructional videos with each of the lessons found in the book to enhance your learning experience
- Explores the Microsoft Business Intelligence (BI) toolset as well as the new BI features of Office 2010
- Encourages you to practice what you've learned in "Try It Out" sections
- Contains video demonstrations that walk you through how to tackle each lesson featured in the book
With Knight's Microsoft Business Intelligence 24-Hour Trainer, veteran authors present you with an ideal introductory book-and-video package so that you can get started working with the BI toolset immediately!
Note: As part of the print version of this title, video lessons are included on DVD. For e-book versions, video lessons can be accessed at wrox.com using a link provided in the interior of the e-book.
Chapter 1
Why Business Intelligence?
Congratulations on your choice to explore how Business Intelligence can improve your organization’s view into its operations and uncover hidden opportunities for profitability and analysis. The largest challenges most organizations face around their data are probably mirrored in yours:
Data is stored in a number of different systems on different platforms, such as inventory and logistics in SAP, financials in Oracle, web analytics in SQL Server, and manufacturing on the mainframe. This can make data difficult to get to, require multiple accounts for access, and keep teams and departments at arm’s length from each other.
Pockets of knowledge about the data are spread throughout teams that don’t regularly interact. This spread causes data to be analyzed in different ways and metrics to be calculated inconsistently, which, in turn, leads to unpredictable analysis and inappropriate actions being taken based on the data.
Documentation is limited or nonexistent. Many times documentation is not created for reports or the metadata underneath them, and this lack of documentation is a critical problem. If you don’t know where the data is coming from for the report, or how certain metrics are being calculated, then you can’t truly understand or communicate the value of the data and calculations.
Consolidated reporting is very time-consuming, when it is possible at all. With reports coming from so many different places, you run into the same problems mentioned in the previous point. These challenges require more people to know different reporting systems, lead to more administrative headaches, and so on.
Reporting teams spend significant time finding and aligning data instead of analyzing and mining it for actionable information. If reporting teams need to go out and gather data from across the company constantly, this doesn’t leave much time for analyzing and interpreting the data. These challenges cause many reporting teams to rework large portions of their reports several times as opposed to spending that time understanding what the users are asking for and delivering more actionable information.
How Intelligent Is Your Organization?
Business Intelligence (BI) is a term that encompasses the process of getting your data out of the disparate systems and into a unified model, so you can use the tools in the Microsoft BI stack to analyze, report, and mine the data. Once you organize your company’s data properly, you can begin to find information that will help you make actionable reports and decisions based on how the data from across your organization lines up. For instance, you can answer questions like, "How do delays in my manufacturing or distribution affect my sales and customer confidence?"
Answers like this come from aligning logistics data with sales and marketing data, which, without a Business Intelligence solution, would require you to spend time exporting data from several systems and combining it into some form that you could consume with Excel, or another reporting tool.
Business Intelligence systems take this repetitive activity out of your life. BI automates the extracting, transforming, and loading (ETL) process and puts the data in a dimensional model (you’ll create one in the next two lessons) that sets you up to be able to use cutting-edge techniques and everyday tools like Microsoft Excel to analyze, report on, and deliver results from your data.
Getting Intelligence from Data
How do you get information from data? First, you need to understand the difference. As you learned earlier, data can come from many different places, but information requires context and provides the basis for action and decision-making. Identifying your data, transforming it, and using the tools and techniques you learn from this book will enable you to provide actionable information out of the mass of data your organization stores. There are several ways to transform your data into actionable information and each has its pros and cons.
Typical solutions for reporting include a few different architectures:
Departmental reporting: Many organizations have their own departmental reporting environments. This situation leads to a significant increase in costs, since using different vendors for each department and reporting environment increases spending on hardware, software licensing, end-user training, and ramp-up time.
Individual data access: Some organizations find it easier to grant lots of individual users access to the data. This is not only dangerous from a security perspective, but likely to lead to performance problems, because users are not the most adept at creating their own queries in code. Also, with all the industry and federal compliance and regulation governing data access, widespread access can quickly lead to a security audit failure, especially in a publicly held company.
BI add-on from each vendor: When teams seek out and apply different strategies, it exacerbates the original problem of data being all over the organization. The data will still be segmented, and additionally the analysis on it will be inconsistent and applied based on each team’s individual understanding of how its data fits into the enterprise, instead of the correct view based on the organizational goals.
Automated reports from different systems: It may be nice to get the automated reports and they likely serve a purpose, but they usually cannot be counted on to run an enterprise. Strategic reporting, dashboard drill-through, and detailed analysis require a BI implementation to support them and provide the at-your-fingertips data and analysis that your end users, managers, and executives are craving.
You have likely seen some form of all of these problems in your organization. These are the opposite of what you want to accomplish with a great BI infrastructure.
BI to the Rescue
A well-thought-out BI strategy will mitigate the problems inherent to each of the previously listed approaches. A good BI approach should provide the targeted departmental reporting that is required by those end users while adjusting the data so it can be consumed by executives through a consolidated set of reports, ad hoc analysis using Excel, or a SharePoint dashboard. Business Intelligence provides a combination of automated reporting, dashboard capabilities, and ad hoc capabilities that will propel your organization forward.
BI provides a single source of truth that can make meetings and discussions immediately more productive. How many times have you gotten a spreadsheet via e-mail before a meeting and shown up to find that everyone had his or her own version of the spreadsheet with different numbers? Business Intelligence standardizes organizational calculations, while still giving you the flexibility to add your own and enhance the company standard. These capabilities allow everyone to speak the same language when it comes to company metrics and to the way the data should be measured across the enterprise or department.
Integrating Business Intelligence with your organization’s current reporting strategy will improve the quality of the data as well as the accuracy of the analysis and the speed at which you can perform it. Using a combination of a data warehouse and BI analytics from Analysis Services and Excel, you can also perform in-depth data mining against your data. This enables you to utilize forecasting, data-cluster analysis, fraud detection, and other great approaches to analyze and forecast actions. Data mining is incredibly useful for things like analyzing sales trends, detecting credit fraud, and filling in empty values based on historical analysis. This powerful capability is delivered right through Excel, using Analysis Services for the back-end modeling and mining engine.
BI = Business Investment
A focused Business Intelligence plan can streamline the costs of reporting and business analytics. The Microsoft BI stack does a great job of providing you with the entire tool set for success within SQL Server Enterprise Edition. We provide more details on that shortly, but the most important thing to take away right now is that the cost of managing multiple products and versions of reporting solutions to meet departmental needs is always higher than the cost of a cohesive strategy that employs one effective licensing policy from a single vendor. When departments cannot agree on a data strategy, you need to bring them together for the good of the organization. In the authors’ experience, this single, cohesive approach to reporting is often a gateway to a successful BI implementation. Recognizing the 360-degree value of that approach and seeing what it can do for your organization are the two most important first steps.
Microsoft’s Business Intelligence Stack
Microsoft’s Business Intelligence stack comes with SQL Server and is greatly enhanced with the addition of SharePoint Server.
SQL Server Enterprise includes industry-leading software components to build, implement, and maintain your BI infrastructure. The major components of the Microsoft BI stack that are included with SQL Server are the following:
SQL Server Database Engine
SQL Server Integration Services (SSIS)
SQL Server Analysis Services (SSAS)
SQL Server Reporting Services (SSRS)
These programs work together in a tightly integrated fashion to deliver solutions like those you’ll build in this book. See Figure 1-1 for more details.
Figure 1-1
In Figure 1-1 you see the layers of Microsoft’s Business Intelligence stack. SharePoint is at the top as the most end user–facing program for reporting, dashboard, and analytic capabilities. On the next level down you see the more common end user tools, and continuing down you can see the development tools, the core components, and some of the multitude of potential data sources you can consume with the products we discuss in this book.
BI Users and Technology
Different levels of users will bring different sorts of questions to the BI technologies we are discussing. To see this at a glance, review Table 1-1. The table breaks down which users will rely on which tool for their reporting and data analysis, and makes clear that Microsoft’s Business Intelligence stack addresses the needs of users at every level.
Table 1-1
Try It
Your Try It for this lesson is a bit different from most others in the book. Throughout the book you will be challenged with hands-on tasks to enhance your understanding. For this lesson, your Try It is to keep what you learned here in mind as you learn the technologies that apply it. For instance, ask yourself these questions as you go through the rest of the book:
What systems in my organization could I tie together using SSIS?
What types of analysis could be made easier with the tools in Analysis Services?
What users would benefit from the types of reporting I can do in Reporting Services and Excel?
What types of data in my organization would be useful for dashboard- and portal-based analytics in SharePoint and PerformancePoint?
If you can keep these things in mind as you’re learning and developing, you will succeed in achieving the goals of a Business Intelligence implementation as you work through the data in your organization.
As this chapter is just an introductory overview, it does not have an accompanying video.
Chapter 2
Dimensional Modeling
Dimensional modeling is the process you use to convert your existing OLTP data model to a model that is more business-centric and easier for Business Intelligence tools to work with. Tools like SSIS, Analysis Services, and the others you’ll learn about in this book are geared specifically toward variations of this type of model. In this lesson you will learn what makes a dimensional model different and then have the opportunity to convert a simple model yourself.
As seen in Figure 2-1, the OLTP model is highly normalized. This is to enhance the quick insertion and retrieval of data. The goal in designing a data warehouse or star schema is to denormalize the model in order to simplify it and to provide wider, more straightforward tables for joining and data-retrieval speed. This denormalization allows you to model the database in a business-focused way that users can understand, and dramatically increases performance of the types of analytical queries that we’re performing.
Why do you need to do this denormalization in order to report on your data, you may ask? The largest reason is that you need to consolidate some of the redundancy between tables. Consolidating redundancy will put the database into a star schema layout, which has a central fact table surrounded by a layer of dimension tables, as shown in Figure 2-2.
As you can see in Figure 2-2, we have abstracted out tables such as DimProduct, DimCustomer, DimPromotion, and DimDate and put the additive and aggregative data, like sales amounts, costs, and so on into a single fact table, FactInternetSales (more on fact tables in Lesson 3; for now focus on the dimensions). This abstraction allows you to implement a number of important elements that will provide great design patterns for dealing with the challenges discussed later in this chapter.
Figure 2-1
Moving from the OLTP model to a dimensional model is important for a number of reasons, not the least of which is performance, but within the dimensional model we can handle many situations with the data that are very difficult, if not impossible, to handle in a more typical OLTP third-normal-form model. Some of these situations are:
Slowly changing dimensions: How do you handle historical changes and reporting, for instance, if someone’s last name changes and you need to show the old last name on the historical reporting, and the current on the new reporting? How can you efficiently handle that situation in a highly normalized fashion? This would involve multiple tables and require some complicated updates. We will go over this later in the lesson.
Figure 2-2
Role-playing dimensions: How are you going to handle multiple calendars with multiple relationships to different tables if the calendars change? For instance, your fiscal calendar or accounting periods may change from year to year, but will need to maintain relationships to the previous methods for historical and trend analysis. More on this later.
Integrating primary key structures from multiple systems: Some systems will have alphanumeric keys, and others will have integer or composite keys. Integrating different key types is a real challenge that often leads to hybrid, system-crossing keys kept in sync with some complicated ETL. Hybrid keys are not necessary in the dimensional model because you abstract all of that.
Key Dimensional Modeling Elements
The key elements that make up the dimensional model system are as follows:
The dimensions in the model provide a single complete and historically accurate source for the data. For instance, in the example we discuss in this lesson the customer dimension has a record of all the customers and their historically accurate information based on the dates for which the record was accurate.
The solution you will see later in this lesson supports changes with the StartDate and EndDate columns in DimProduct implemented to track the effective dates of the rows in the table.
You’ll notice a new key on the dimension table called ProductKey and a column called ProductAlternateKey. These are added to support the new key structure put in place. There will be more on how to do this shortly. This structure provides portability for the warehouse and the ability to integrate numerous systems despite their key differences.
How Does Dimensional Modeling Work?
Before you try some dimensional modeling for yourself, we want to show you an example. For our example, we use the AdventureWorks2008R2 sample databases from Microsoft available at www.codeplex.com. We create a simple star schema from the reseller sales information in the OLTP version of the database. The tables we use from OLTP will be as follows:
Customer
Person
Address
The table we will create will be called DimCustomer.
First, take notice of the differences and key elements in Figures 2-3 and 2-4. Figure 2-3 shows the OLTP tables, and Figure 2-4 shows the new dimension table. We’ll walk you through the numbered items in Figure 2-4 to show you what key design elements we employed to make this a successful transition from normalized dimension data to a set of dimension tables.
1. New CustomerKey column to provide a surrogate key. We are using a new column we created to provide the primary key for this new dimension table. Best practice is to add a suffix of SK to the name of this column, so it would read CustomerSK or CustomerKeySK.
2. We have modified the primary key column that is coming over from the source OLTP system to act as the alternate key. All this means is that if we need to bring in data from several systems whose primary keys have overlapped or are in different formats, we can do it with a combination of our alternate (or business) key and our surrogate key CustomerKey.
3. Much of the demographic data and store sales data was also tapped to get columns like DateFirstPurchase and CommuteDistance so you can find out more about your customers. Some of these columns could be calculated in the ETL portion of your processing by comparing information like a work and home address, for example.
Figure 2-3
Figure 2-4
Once the dimension tables are in place, you can easily see why this is a better model for working with large analytical queries and analysis. For instance, now if you refer to multiple customers in a single order, you need only one customer dimension with a fact table row that has multiple key relationships to the customer table. This is much better than having a bill-to customer table and a ship-to customer table to handle subsidiaries or other issues.
Multiple dates are also very common in most fact tables; for instance, an inventory fact table may have a product’s arrival date, ship date, expiration date, and return date. This requires multiple links to a product table for multiple columns; instead we can link directly to DimDate for these values with our numeric surrogate keys. Remember, these keys keep all the tables in the warehouse linked as your new key system.
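To see how one date dimension can play several roles, consider a query that joins a fact table to DimDate twice through separate surrogate keys. The table and column names below follow the AdventureWorks-style examples in this lesson but are illustrative, not a prescribed schema:

```sql
-- One DimDate table playing two roles: order date and ship date.
SELECT
    od.CalendarYear    AS OrderYear,
    sd.CalendarYear    AS ShipYear,
    SUM(f.SalesAmount) AS TotalSales
FROM FactInternetSales AS f
JOIN DimDate AS od ON f.OrderDateKey = od.DateKey   -- "Order Date" role
JOIN DimDate AS sd ON f.ShipDateKey  = sd.DateKey   -- "Ship Date" role
GROUP BY od.CalendarYear, sd.CalendarYear;
```

Each join uses the same physical DimDate table; only the role (which fact key is joined) differs.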
You can see the StartDate and EndDate columns and how they control the historical loading. (The mechanics of historical loading are discussed in the SSIS lessons in Section II of this book.) These columns allow you to expire a row when a historical change is required. For instance, when a product line gets a new account manager, you would expire the current product line row and insert into the dimension table a new row with an EndDate of null that links to the new product manager. This way, all your historical reporting is accurate, and your current reporting is accurate as well. Otherwise, historical reporting could mistakenly tie sales to the wrong manager.
There are three main types of slowly changing dimensions:
Type I: Updates changing data only, no historical tracking
Type II: Tracks historical records, expires the row, and puts a new one into the table
Type III: Tracks limited history by storing the prior value in an additional column (or columns) on the same row, rather than inserting a new row
It is common to have columns from each type in the same table; for instance, if you need to track history on last names for employees, but not on their addresses, you may have a Type II LastName column and a Type I Address column. This is perfectly acceptable and common.
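A Type II change can be sketched in T-SQL as an expire-plus-insert pair. The DimEmployee table, its columns, and the key value here are hypothetical, chosen only to mirror the last-name example above:

```sql
-- Expire the current row for this business key...
UPDATE DimEmployee
SET EndDate = GETDATE()
WHERE EmployeeAlternateKey = 1234
  AND EndDate IS NULL;               -- only the row still in effect

-- ...then insert the new current row with an open-ended EndDate.
INSERT INTO DimEmployee
    (EmployeeAlternateKey, LastName, Address, StartDate, EndDate)
VALUES
    (1234, 'NewLastName', '100 Main St', GETDATE(), NULL);
```

In practice this logic usually lives in your ETL (you’ll build it in the SSIS lessons), but the row-expiry pattern is the same.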
This design has also been proven to improve performance significantly since the main goal of a data warehouse or BI system is to extract data as quickly as possible. The more denormalized type of this model lends itself to the quick retrieval of data from the tables to serve to populate a cube, run a report, or load data into Excel. You’ll do all of these things in later lessons!
Here are some general design tips for working with your dimension tables:
Try to avoid unnecessary normalizing. In a star schema, this practice is called snowflaking, and while it is sometimes warranted, it usually isn’t unless your dimension table is so large it needs to be physically separated on the storage array.
CamelCasing, the capitalization of each word with no spaces or underscores, will help you down the road. You’re not working with Analysis Services yet, but you will be very soon, and CamelCasing will help streamline your work in SSAS. For now, you have to trust us.
Begin your table name with Dim (DimProduct, DimCustomer, and so on).
Don’t over index. Until you know what queries you’ll be running against the tables, don’t assume you need lots of indexes on all your tables.
Try It
In this Try It you’re going to take what you’ve just read about and apply it to create your own product dimension table with columns from a typical source OLTP system.
Lesson Requirements
The columns you put in your table are up to you, but your dimension will need to track history. Also, the dimension table will be getting data from other sources, so it will need to be able to handle that. You will create your table in SQL Server Management Studio.
Hints
Make sure to remember to use the right types of key columns for your new dimension table.
Remember the concepts this lesson discussed that are required for handling data coming from multiple systems.
Step-by-Step
1. The first thing you should do is identify some columns you might want in your table. Table 2-1 has a number of standard product dimension columns that you can pick from.
Table 2-1
2. Now, in order to make these into a proper dimension table, you need to review your requirements. Your first requirement was to make sure you can track history, so you need to make sure you have a StartDate and EndDate column so you can expire rows as they become updated.
3. Your next requirement was to make sure the dimension table could handle data from multiple systems either now or in the future, which means you need to apply the best practice you learned about surrogate keys. This will add a ProductKeySK and a ProductAlternateKey column to the table as well.
The finished product should look something like Figure 2-5.
Figure 2-5
This table will work with multiple systems with its surrogate key structure and will perform well if the rest of the warehouse follows similar best practices for design.
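If you want to check your work in SQL Server Management Studio, the finished table could be created with DDL along these lines. The column list and data types are one reasonable choice for this exercise, not the only correct answer:

```sql
CREATE TABLE dbo.DimProduct (
    ProductKeySK        INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
    ProductAlternateKey NVARCHAR(25) NOT NULL, -- business key from the source system
    ProductName         NVARCHAR(50) NOT NULL,
    Color               NVARCHAR(15) NULL,
    ListPrice           MONEY        NULL,
    StartDate           DATETIME     NOT NULL, -- row effective from this date
    EndDate             DATETIME     NULL      -- NULL marks the current row
);
```

The surrogate/alternate key pair satisfies the multiple-systems requirement, and StartDate/EndDate satisfy the history-tracking requirement.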
Congratulations, you have just designed your first dimension table. Remember these concepts and refer to them as they become relevant in the lessons in the rest of the book. Great job!
Please select Lesson 2 on the DVD with the print book, or watch online at www.wrox.com/go/vid24bi to view the video that accompanies this lesson.
Chapter 3
Fact Table Modeling
A fact table is modeled to be the center of the star schema in a data warehouse. It consists of two primary types of data:
Key columns, which point to the associated dimension tables
Data columns, which normally contain additive or numeric data representing a transaction or snapshot of a situation at a point in time. Numeric data is more common in financial or inventory situations.
You need fact tables because they allow you to link the denormalized versions of the dimension tables and provide a largely, if not completely, numeric table for Analysis Services to consume and aggregate. In other words, the fact table is the part of the model that holds the dollars or count type of data that you would want to see rolled up by year, grouped by category, or so forth. The fact table holds "just the facts" and the keys to relate the needed dimension tables. Since many OLAP tools, like Analysis Services, look for a star schema model and are optimized to work with it, the fact table is a critical piece of the puzzle.
The process of designing your fact table will take several steps:
1. Decide on the data you want to analyze.
Will this be sales data, inventory data, or financial data? Each type comes with its own design specifics. For instance, you may have to load different amounts of data based on the type of analysis you’re doing.
2. Once you’ve identified your data type, pick the level of granularity that you’re seeking.
When you consider the question of granularity, or the grain of each row in the fact table, you need to decide on the lowest level of analysis you want to perform. For example, each row may represent a line item on a receipt, a total amount for a receipt, or the status of a particular product in inventory for a particular day.
3. Decide how you will load the fact table (more on this in Lesson 9).
Transactions are loaded at intervals to show what is in the OLTP system. An inventory or snapshot fact table will load all the rows of inventory or snapshot-style data for the day, always allowing the user to see the current status and information based on the date the information was loaded.
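Putting the three decisions together, a sales fact table at the line-item grain might be declared like this. The names and types are illustrative, and foreign-key constraints to the dimension tables are omitted for brevity:

```sql
CREATE TABLE dbo.FactInternetSales (
    -- Key columns pointing to the dimension tables (surrogate keys)
    OrderDateKey  INT      NOT NULL,
    ShipDateKey   INT      NOT NULL,
    ProductKey    INT      NOT NULL,
    CustomerKey   INT      NOT NULL,
    -- Additive measures for this line item
    OrderQuantity SMALLINT NOT NULL,
    SalesAmount   MONEY    NOT NULL,
    TaxAmt        MONEY    NOT NULL
);
```

Notice that every non-key column is numeric and additive, which is exactly what Analysis Services wants to aggregate.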
Fact tables are often designed to be index light, meaning that indexes should be placed only to support reporting and cube processing that is happening directly on that table. It is a good idea to remember that your fact tables will often be much larger in row count and data volume than your dimensions. This means you can apply several strategies to manage the tables and improve your performance and scalability.
Implementing table partitioning: Table partitioning can significantly help your management of this type of larger fact table. Implementing a sliding-window partitioning scheme, where you roll off old partitions and roll on new ones periodically, can drive IO and query times down and access speeds up since you will be accessing only the specific areas of data on the disk that are needed for your query. Processing speeds for data loading and cube processing will also be faster since the new versions of SQL Server 2008 and 2008R2