Fundamentals of Data Warehouses
Ebook · 436 pages · 4 hours


About this ebook

This book presents the first comparative review of the state of the art and the best current practices of data warehouses. It covers source and data integration, multidimensional aggregation, query optimization, metadata management, quality assessment, and design optimization. A conceptual framework is presented by which the architecture and quality of a data warehouse can be assessed and improved using enriched metadata management combined with advanced techniques from databases, business modeling, and artificial intelligence.
Language: English
Publisher: Springer
Release date: Mar 9, 2013
ISBN: 9783662051535


    Book preview

    Fundamentals of Data Warehouses - Matthias Jarke

    1

    Data Warehouse Practice: An Overview

    Matthias Jarke¹, Maurizio Lenzerini², Yannis Vassiliou³, and Panos Vassiliadis³

    (1)

    Dept. of Computer Science V, RWTH Aachen, Ahornstraße 55, 52056, Aachen, Germany

    (2)

    Dipartimento di Informatica e Sistemistica, Università di Roma La Sapienza, Via Salaria 113, 00198, Rome, Italy

    (3)

    Dept. of Electrical and Computer Engineering, Computer Science Division, National Technical University of Athens, 15773, Zographou, Athens, Greece

    Matthias Jarke

    Email: jarke@informatik.rwth-aachen.de

    Maurizio Lenzerini

    Email: lenzerini@dis.uniroma1.it

    Yannis Vassiliou

    Email: yv@cs.ntua.gr

    Since the beginning of data warehousing in the early 1990s, an informal consensus has been reached concerning the major terms and components involved in data warehousing. In this chapter, we first explain the main terms and components. Data warehouse vendors are pursuing different strategies in supporting this basic framework. We review a few of the major product families and the basic problem areas data warehouse practice and research are faced with today.

    A data warehouse (DW) is a collection of technologies aimed at enabling the knowledge worker (executive, manager, and analyst) to make better and faster decisions. It is expected to have the right information in the right place at the right time with the right cost in order to support the right decision. Traditional online transaction processing (OLTP) systems are inappropriate for decision support and high-speed networks cannot, by themselves, solve the information accessibility problem. Data warehousing has become an important strategy to integrate heterogeneous data sources and to enable online analytic processing (OLAP).

    A report from the META Group in 1996 predicted that data warehousing would be a US $13,000 million industry within two years ($8,000 million on hardware, $5,000 million on services and systems integration), while 1995 represented $2,000 million in expenditures. In 1998, reality had exceeded these figures, reaching sales of $14,600 million. By 2000, the OLAP subsector alone exceeded $2,500 million. Table 1.1 differentiates the trends by product sector.

    Table 1.1. Estimated sales in millions of dollars [ShTy98] (* estimates are from [PeCr00])

    The number and complexity of projects, with project sizes ranging from a few hundred thousand to many millions of dollars, are indicative of the difficulty of designing good data warehouses. Their expected duration highlights the need for documented quality goals and change management. The emergence of data warehousing was initially a consequence of the observation by W. Inmon and E. F. Codd in the early 1990s that operational-level online transaction processing (OLTP) and decision support applications (OLAP) cannot coexist efficiently in the same database environment, mostly due to their very different transaction characteristics. Meanwhile, data warehousing has taken on a much broader role, especially in the context of reengineering legacy systems or at least saving legacy data. Here, DWs are seen as a strategy to bring heterogeneous data together under a common conceptual and technical umbrella and to make them available for new operational or decision support applications.

    A data warehouse caches selected data of interest to a customer group, so that access becomes faster, cheaper, and more effective (Fig. 1.1). As the long-term buffer between OLTP and OLAP, data warehouses face two essential questions: how to reconcile the stream of incoming data from multiple heterogeneous legacy sources, and how to customize the derived data storage to specific OLAP applications. The trade-off driving the design decisions concerning these two issues changes continuously with business needs. Therefore, design support and change management are of greatest importance if we do not want to run DW projects into dead ends.

    Fig. 1.1. Data warehouses: a buffer between transaction processing and analytic processing

    Vendors agree that data warehouses cannot be off-the-shelf products but must be designed and optimized with great attention to the customer situation. Traditional database design techniques do not apply since they cannot deal with DW-specific issues such as data source selection, temporal and aggregated data, and controlled redundancy management. Since the wide variety of product and vendor strategies prevents a low-level solution to these design problems at acceptable costs, serious research and development efforts continue to be necessary.

    1.1 Data Warehouse Components

    Figure 1.2 gives a rough overview of the usual data warehouse components and their relationships. Many researchers and practitioners share the understanding that a data warehouse architecture can be understood as layers of materialized views on top of each other. Since the research problems are largely formulated from this perspective, we begin with a brief summary description.

    Fig. 1.2. A generic data warehouse architecture

    A data warehouse architecture exhibits various layers of data in which data from one layer are derived from data of the lower layer. Data sources, also called operational databases, form the lowest layer. They may consist of structured data stored in open database systems and legacy systems, or of unstructured or semistructured data stored in files. The data sources can be either part of the operational environment of an organization or external, produced by a third party. They are usually heterogeneous, which means that the same data can be represented differently, for instance through different database schemata, in the sources.

    The central layer of the architecture is the global data warehouse, sometimes called the primary or corporate data warehouse. According to Inmon [Inmo96], it is a collection of integrated, nonvolatile, subject-oriented databases designed to support the decision support system (DSS) function, where each unit of data is relevant to some moment in time; it contains atomic data and lightly summarized data. The global data warehouse keeps a historical record of data: each time it is changed, a new integrated snapshot of the underlying data sources from which it is derived is placed in line with the previous snapshots. The data warehouse may contain data that is many years old (a frequently cited average age is two years). Researchers often assume (realistically) that the global warehouse consists of a set of materialized relational views, defined in terms of other relations that are themselves constructed from the data stored in the sources.
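    As a minimal sketch of this view-based picture (all table names and schemas below are hypothetical), the global warehouse can be modeled as a materialized view that reconciles two heterogeneous sources and appends each refresh as a new snapshot:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Two source systems representing the same facts differently
        CREATE TABLE src_eu_orders (cust TEXT, amount_eur REAL, day TEXT);
        CREATE TABLE src_us_orders (customer TEXT, amount_usd REAL, order_date TEXT);

        -- Global warehouse relation: an integrated, timestamped snapshot history
        CREATE TABLE dw_orders (customer TEXT, amount_usd REAL, order_date TEXT,
                                snapshot_date TEXT);
    """)

    def refresh_snapshot(snapshot_date: str, eur_to_usd: float = 1.1) -> None:
        """Materialize the view: reconcile both sources into one schema and
        append the result alongside the previous snapshots (nonvolatile history)."""
        con.execute("""
            INSERT INTO dw_orders
            SELECT cust, amount_eur * ?, day, ? FROM src_eu_orders
            UNION ALL
            SELECT customer, amount_usd, order_date, ? FROM src_us_orders
        """, (eur_to_usd, snapshot_date, snapshot_date))

    refresh_snapshot("2001-01-31")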

    The next layer of views consists of the local warehouses, which contain highly aggregated data derived from the global warehouse and directly intended to support activities such as informational processing, management decisions, long-term decisions, historical analysis, trend analysis, or integrated analysis. There are various kinds of local warehouses, such as data marts or OLAP databases. Data marts are small data warehouses that contain only a subset of the enterprise-wide data warehouse. A data mart may be used only in a specific department and contain only the data relevant to this department. For example, a data mart for the marketing department should include only customer, sales, and product information, whereas the enterprise-wide data warehouse could also contain information on employees, departments, etc. A data mart enables faster responses to queries because the volume of the managed data is much smaller than in the data warehouse and the queries can be distributed across different machines. Data marts may use relational database systems or specific multidimensional data structures.

    There are two major differences between the global warehouse and local data marts. First, the global warehouse results from a complex extraction-integration-transformation process, whereas the local data marts result from an extraction/aggregation process starting from the global warehouse. Second, data in the global warehouse are detailed, voluminous (since the data warehouse keeps data from previous periods of time), and only lightly aggregated, whereas data in the local data marts are highly aggregated and less voluminous. This distinction has a number of consequences both in research and in practice, as we shall see throughout the book.
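    Continuing the sketch above (again with hypothetical names), a marketing data mart can be derived from the global warehouse by extraction and aggregation rather than from the sources directly:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Global warehouse: detailed and voluminous
        CREATE TABLE dw_orders (customer TEXT, product TEXT, amount_usd REAL,
                                order_date TEXT);

        -- Marketing data mart: a highly aggregated subset of the warehouse
        CREATE VIEW mart_sales_by_product AS
            SELECT product, SUM(amount_usd) AS revenue
            FROM dw_orders
            GROUP BY product;
    """)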

    In some cases, an intermediate layer, called an operational data store (ODS), is introduced between the operational data sources and the global data warehouse. An ODS contains subject-oriented, collectively integrated, volatile, current-valued, and detailed data. The ODS usually contains records that result from the transformation, integration, and aggregation of detailed data found in the data sources, just as for a global data warehouse; therefore, we can also consider that the ODS consists of a set of materialized relational views. The main differences from a data warehouse are the following. First, the ODS is subject to change much more frequently than a data warehouse. Second, the ODS holds only fresh and current data. Finally, aggregation in the ODS is of fine granularity: for example, the data may be only weakly summarized. The use of an ODS, according to Inmon [Inmo96], is justified for corporations that need collective, integrated operational data; it is a good support for activities such as collective operational decisions or immediate corporate information. This usually depends on the size of the corporation, the need for immediate corporate information, and the status of integration of the various legacy systems. Figure 1.2 summarizes the different layers of data.

    All the data warehouse components, processes, and data are — or at least should be — tracked and administered from a metadata repository. The metadata repository serves as an aid to both the administrator and the designer of a data warehouse. Since the data warehouse is a very complex system, its architecture (physical components, schemata) can be complicated; the volume of data is vast; and the processes employed for the extraction, transformation, cleaning, storage, and aggregation of data are numerous, sensitive to changes, and vary over time.

    1.2 Designing the Data Warehouse

    The design of a data warehouse is a difficult task, and there are several problems designers have to tackle. First of all, they must semantically reconcile the information residing in the sources and produce an enterprise model for the data warehouse. Then, a logical structure of relations in the core of the data warehouse must be obtained, serving either as buffers for the refreshment process or as persistent data stores for querying or further propagation to data marts. This is not a simple task by itself, and it becomes even more complicated when the physical design problem arises: the designer has to choose the physical tables, processes, indexes, and data partitions that represent the logical data warehouse schema and facilitate its functionality. Finally, hardware selection and software development is another process that has to be planned by the data warehouse designer [AdVe98, ISIA97, Simo98].
    Fig. 1.3. Distribution of data warehouse project costs [Inmo97]

    It is evident that the schemata of all the data stores involved in a data warehouse environment change rapidly: changes in the business rules of a corporation affect both the source schemata (of the operational databases) and the user requirements (and thus the schemata of the data marts). Consequently, the design of a data warehouse is an ongoing process, which is performed iteratively throughout the lifecycle of the system [KRRT98].

    There is quite a lot of discussion about the methodology for the design of a data warehouse. The two major methodologies are the top-down and the bottom-up approaches [Kimb96, KRRT98, Syba97]. In the top-down approach, a global enterprise model is constructed, which reconciles the semantic models of the sources (and later, their data). This approach is usually costly and time-consuming; nevertheless it provides a basis over which the schema of the data warehouse can evolve. The bottom-up approach focuses on the more rapid and less costly development of smaller, specialized data marts and their synthesis as the data warehouse evolves.

    No matter which approach is followed, there seems to be agreement on the general idea concerning the final schema of a data warehouse. In a first layer, the ODS serves as an intermediate buffer for the most recent and detailed information from the sources; data cleaning and transformation are performed at this level. Next, a database under a denormalized star schema usually serves as the central repository of data. A star schema is a special-purpose schema for data warehouses that is oriented towards query efficiency at the cost of schema normalization (cf. Chap. 5 for a detailed description). Finally, more aggregated views on top of this star schema can also be precalculated. The OLAP tools can communicate either with the upper levels of the data warehouse or with the customized data marts; we shall detail this issue in the following sections.
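    For concreteness, here is a minimal star schema sketch (a hypothetical sales example; cf. Chap. 5): one central fact table whose foreign keys point at small, query-friendly dimension tables:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                                  name TEXT, category TEXT);
        CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY,
                                  city TEXT, country TEXT);
        CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY,
                                  day TEXT, month TEXT, year INTEGER);

        -- Fact table: one row per measured event, keyed into each dimension
        CREATE TABLE fact_sales (product_id INTEGER REFERENCES dim_product,
                                 region_id  INTEGER REFERENCES dim_region,
                                 time_id    INTEGER REFERENCES dim_time,
                                 amount REAL);
    """)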

    1.3 Getting Heterogeneous Data into the Warehouse

    Data warehousing requires access to a broad range of information sources:

    Database systems (relational, object-oriented, network, hierarchical, etc.)

    External information sources (information gathered from other companies, results of surveys)

    Files of standard applications (e.g., Microsoft Excel, COBOL applications)

    Other documents (e.g., Microsoft Word, World Wide Web)

    Wrappers, loaders, and mediators are programs that load data from the information sources into the data warehouse. Wrappers and loaders are responsible for loading, transforming, cleaning, and updating the data from the sources to the data warehouse. Mediators integrate the data into the warehouse by resolving inconsistencies and conflicts between different information sources. Furthermore, an extraction program can examine the source data for conspicuous items, which may indicate incorrect information [BaBM97].

    These tools — in the commercial sector classified as extract-transform-load (ETL) tools — try to automate or support tasks such as the following [Gree97] (a toy sketch follows the list):

    Extraction (accessing different source databases)

    Cleaning (finding and resolving inconsistencies in the source data)

    Transformation (between different data formats, languages, etc.)

    Loading (loading the data into the data warehouse)

    Replication (replicating source databases into the data warehouse)

    Analyzing (e.g., detecting invalid/unexpected values)

    High-speed data transfer (important for very large data warehouses)

    Checking for data quality (e.g., for correctness and completeness)

    Analyzing metadata (to support the design of a data warehouse)
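    As an illustration (a toy sketch; all file and table names are hypothetical), a single ETL pass might extract rows from a source, clean obvious inconsistencies, transform formats, and load the result into the warehouse:

    import csv
    import io
    import sqlite3

    raw = "cust,amount,day\nSmith, 10.5 ,2001-01-05\nJones,,2001-01-06\n"

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dw_orders (customer TEXT, amount REAL, day TEXT)")

    for row in csv.DictReader(io.StringIO(raw)):           # extraction
        if not row["amount"].strip():                      # cleaning: drop incomplete rows
            continue
        record = (row["cust"].strip().upper(),             # transformation: normalize names
                  float(row["amount"]),                    # transformation: cast types
                  row["day"])
        con.execute("INSERT INTO dw_orders VALUES (?, ?, ?)", record)  # loading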

    1.4 Getting Multidimensional Data out of the Warehouse

    Relational database management systems (RDBMS) are most flexible when they are used with a normalized data structure. Because normalized data structures are nonredundant, normalized relations are well suited to daily operational work. The database systems used for this role, so-called OLTP systems, are optimized to support small transactions and queries using primary keys and specialized indexes.

    While OLTP systems store only current information, data warehouses contain historical and summarized data. These data are used by managers to find trends and directions in markets, and support them in decision making. OLAP is the technology that enables this exploitation of the information stored in the data warehouse.

    Due to the complexity of the relationships between the involved entities, OLAP queries require multiple join and aggregation operations over normalized relations, thus overloading the normalized relational database.

    Typical operations performed by OLAP clients include the following [ChDa97] (illustrated in the sketch after this list):

    Roll up (increasing the level of aggregation)

    Drill down (decreasing the level of aggregation)

    Slice and dice (selection and projection)

    Pivot (reorienting the multidimensional view)
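    Expressed against a small, hypothetical sales table stored relationally, the four operations look as follows (a sketch: roll up and drill down change the GROUP BY granularity, slice and dice select, and pivot reorients rows and columns):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (product TEXT, region TEXT, year INTEGER, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                    [("pen", "north", 2000, 5.0), ("pen", "south", 2001, 7.0)])

    roll_up    = "SELECT year, SUM(amount) FROM sales GROUP BY year"
    drill_down = ("SELECT year, region, product, SUM(amount) "
                  "FROM sales GROUP BY year, region, product")
    slice_     = "SELECT * FROM sales WHERE region = 'north'"   # fix one dimension
    pivot      = ("SELECT product, "
                  "SUM(CASE WHEN year = 2000 THEN amount END) AS y2000, "
                  "SUM(CASE WHEN year = 2001 THEN amount END) AS y2001 "
                  "FROM sales GROUP BY product")                # years become columns

    for query in (roll_up, drill_down, slice_, pivot):
        print(con.execute(query).fetchall())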

    Beyond these basic OLAP operations, other possible client applications on data warehouses include:

    Report and query tools

    Geographic information systems (GIS)

    Data mining (finding patterns and trends in the data warehouse)

    Decision support systems (DSS)

    Executive information systems (EIS)

    Statistics

    The OLAP applications provide users with a multidimensional view of the data, which is somewhat different from the typical relational approach; thus their operations need special, customized support. This support is given by multidimensional database systems and relational OLAP servers.

    The database management system (DBMS) used for the data warehouse itself and/or for data marts must be a high-performance system, which fulfills the requirements for complex querying demanded by the clients. The following kinds of DBMS are used for data warehousing [Weld97]:

    Super-relational database systems

    Multidimensional database systems

    Super-relational database systems. To make RDBMS more useful for OLAP applications, vendors have added new features to the traditional RDBMS. These so-called super-relational features include support for extensions to storage formats, relational operations, and specialized indexing schemes. To provide fast response time to OLAP applications, the data are organized in a star or snowflake schema (see also Chap. 5).

    The resulting data model might be very complex and hard for end users to understand. Vendors of relational database systems try to hide this complexity behind special engines for OLAP. The resulting architecture is called Relational OLAP (ROLAP). In contrast to predictions in the mid-1990s, ROLAP architectures have not been able to capture a large share of the OLAP market. Within this segment, one of the leaders is MicroStrategy [MStr97], whose architecture is shown in Fig. 1.4. The RDBMS is accessed through VLDB (very large database) drivers, which are optimized for large data warehouses.

    Fig. 1.4. MicroStrategy solution [MStr97]

    The DSS Architect translates relational database schemas to an intuitive multidimensional model, so that users are shielded from the complexity of the relational data model. The mapping between the relational and the multidimensional data models is done by consulting the metadata. The system is controlled by the DSS Administrator. With this tool, system administrators can fine-tune the database schema, monitor the system performance, and schedule batch routines.

    The DSS Server is a ROLAP server based on a relational database system. It provides a multidimensional view of the underlying relational database. Other features include the caching of query results, the monitoring and scheduling of queries, and the generation and maintenance of dynamic relational data marts. DSS Agent, DSS Objects, and DSS Web are interfaces to end users, programming languages, and the World Wide Web.

    Other ROLAP servers are offered by Red Brick [RBSI97] (subsequently acquired by Informix, then passed on to IBM) and Sybase [Syba97]. The Red Brick system is characterized by industry-leading indexing and join technology for star schemas (Starjoin); it also includes a data mining option to find patterns, trends, and relationships in very large databases. These vendors argue that data warehouses need to be constructed in an incremental, bottom-up fashion and therefore focus on support for distributed data warehouses and data marts.

    Multidimensional database systems (MDDB) directly support the way in which OLAP users visualize and work with data. OLAP requires the analysis of large volumes of complex and interrelated data and the viewing of those data from various perspectives [Kena95]. MDDB store data in n-dimensional cubes, where each dimension represents a user perspective. For example, the sales data of a company may have the dimensions product, region, and time. Because of the way the data are stored, no join operations are necessary to answer queries that retrieve sales data along one of these dimensions. Therefore, for OLAP applications, MDDB are often more efficient than traditional RDBMS [Coll96]. A problem with MDDB is that restructuring is much more expensive than in a relational database. Moreover, there is currently no standard data definition language and query language for the multidimensional data model.
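    A minimal sketch of the idea (hypothetical data): if the cube is addressed directly by dimension coordinates, retrieval along a dimension is a plain lookup rather than a join:

    from collections import defaultdict

    # Cube addressed by (product, region, time) coordinates
    cube: dict[tuple[str, str, str], float] = defaultdict(float)
    cube[("pen", "north", "2001-Q1")] += 120.0
    cube[("pen", "south", "2001-Q1")] += 80.0

    # Sales by region for one product and quarter: plain lookups, no join
    by_region = {r: cube[("pen", r, "2001-Q1")] for r in ("north", "south")}
    print(by_region)   # {'north': 120.0, 'south': 80.0}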

    In practical multidimensional OLAP products, two market segments can be observed [PeCr00]. At the low end, desktop OLAP systems such as Cognos PowerPlay, Business Objects, or Brio focus on the efficient and user-friendly handling of relatively small data cubes on client systems. Here, the MDDB is implemented as a data retailer [Sahi96]: it gets its data from a (relational) data warehouse and offers analysis functionality to end users. As shown in Fig. 1.5, ad-hoc queries are sent directly to the data warehouse, whereas OLAP applications work on the more appropriate, multidimensional data model of the MDDB. Market leaders in this segment support hundreds of thousands of workplaces.

    Fig. 1.5. MDDB in a data warehouse environment

    At the high end, hybrid OLAP (HOLAP) solutions aim to provide full integration of relational data warehouse solutions (aiming at scalability) and multidimensional solutions (aiming at OLAP efficiency) in complex architectures. Market leaders include Hyperion Essbase, Oracle Express, and Microsoft OLAP.

    Application-oriented OLAP. As pointed out by Pendse and Creeth [PeCr00], only a few vendors can survive on generic server tools as mentioned above. Many more market niches can be found for specific application domains. Systems in this sector often provide a great deal of application-specific functionality in addition to (or on top of) multidimensional OLAP (MOLAP) engines. Generally speaking, application domains can be subdivided into four business functions:

    Reporting and querying for standard controlling tasks

    Problem and opportunity analysis (often called Business Intelligence)

    Planning applications

    One-of-a-kind data mining campaigns or analysis projects

    Two very important application domains are sales analysis and customer relationship management on the one hand, and budgeting, financial reporting, and consolidation on the other. Interestingly, only a few of the tools on the market are able to integrate the reporting and analysis of available data with planning tasks for the future.

    As an example, Fig. 1.6 shows the b2brain architecture by Thinking Networks AG [Thin01], a MOLAP-based environment for financial reporting and planning data warehouses. It shows some typical features of advanced application-oriented OLAP environments: efficient custom-tailoring to new applications within a domain using metadata; linkage to heterogeneous sources and clients, including via the Internet; and seamless integration of application-relevant features such as heterogeneous data collection, semantics-based consolidation, data mining, and planning. The architecture thus demonstrates the variety of physical structures encountered in high-end data warehousing as well as the importance of metadata, both discussed in the following subsections.

    Fig. 1.6. Example of a DW environment for integrated financial reporting and planning

    1.5 Physical Structure of Data Warehouses

    There are three basic architectures for a data warehouse [Weld97, Muck96]:

    Centralized

    Federated

    Tiered

    In a centralized architecture, there exists only one data warehouse, which stores all data necessary for business analysis. As already shown in the previous section, the disadvantage is a loss of performance compared to distributed approaches, since all queries and update operations must be processed by one database system.

    On the other hand, access to data is uncomplicated because only one data model is relevant. Furthermore, building and maintaining a central data warehouse is easier than in a distributed environment. A central data warehouse is useful for companies where the existing operational framework is also centralized (Fig. 1.7).

    Fig. 1.7. Centralized architecture

    A decentralized architecture is only advantageous if the operational environment is also distributed. In a federated architecture, the data is logically consolidated but stored in separate physical databases at the same or at different physical sites (Fig. 1.8). The local data marts store only the relevant information for a department. Because the amount of data is reduced in contrast to a central data warehouse, the local data mart may contain all levels of detail so that detailed information can also be delivered by the local system.

    Fig. 1.8. Federated architecture

    An important feature of the federated architecture is that the logical warehouse is only virtual. In contrast, in a tiered architecture (Fig. 1.9), the central data warehouse is also physical. In addition to this warehouse, there exist local data marts on different tiers, which store copies or summaries of the previous tier but not detailed data as in a federated architecture.

    Fig. 1.9. Tiered architecture

    There can also be different tiers on the source side. Imagine, for example, a supermarket company collecting data from its branches. This process cannot be done in one step because many sources have to be integrated into the warehouse. At the first level, the data of all branches in one region are collected, and at the second level, the data from the regions are integrated into one data warehouse.

    The advantages of the distributed architecture are (a) faster response times, because the data are located closer to the client applications, and (b) a reduced volume of data to be searched. Although several machines must be used in a distributed architecture, this may result in lower hardware and software costs because not all data must be stored in one place and queries are executed on different machines. A scalable architecture is very important for data warehousing: data warehouses are not static systems but evolve and grow over time. Because of this, the architecture chosen to build a data warehouse must be easy to extend and to restructure.

    1.6 Metadata Management

    Metadata play an important role in data warehousing. Before a data warehouse can be accessed efficiently, it is necessary to understand what data are available in the warehouse and where they are located. In addition to locating the data that end users require, metadata repositories may contain [AdCo97, MStr95, Micr96] (a sketch of such entries follows the list):

    Data dictionary: contains definitions of the databases being maintained and the relationships between data elements

    Data flow: direction and frequency of data feed

    Data transformation: transformations required when data is moved

    Version control: changes to metadata are stored

    Data usage statistics: a profile of data in the warehouse

    Alias information: alias names for a field

    Security: who is allowed to access the data
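    A minimal sketch of what such repository entries might look like (structure and field names are purely illustrative):

    from dataclasses import dataclass, field

    @dataclass
    class MetadataEntry:
        element: str                        # data element described
        definition: str                     # data dictionary entry
        source: str                         # where the data comes from (data flow)
        transformation: str                 # applied when the data are moved
        aliases: list[str] = field(default_factory=list)           # alias information
        allowed_roles: list[str] = field(default_factory=list)     # security

    repo = [MetadataEntry(
        element="dw_orders.amount_usd",
        definition="Order value converted to US dollars",
        source="src_eu_orders.amount_eur",
        transformation="amount_eur * daily_rate",
        aliases=["revenue"],
        allowed_roles=["analyst", "controller"],
    )]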
