Banking on Cloud Data Platforms: A Guide
Ebook, 314 pages, 2 hours

About this ebook

This book explores the evolution of data platforms over the last five decades, spanning from data warehousing to big data and cloud technologies. It discusses architecture, guiding principles, technology, and various use cases in the banking industry. The role of fintech and meeting digital payment demands with modern platforms is addressed. Techniques for handling PII/SPDI data in the cloud, ingestion frameworks, real-time and streaming data, and data availability are discussed practically. Additionally, it covers the increasing roles of CDOs, governance, data security, and DPDP. These chapters serve as valuable references for banks and financial institutions, drawing from real-world data sources and global events.

Language: English
Release date: Oct 20, 2023
ISBN: 9798890086259

    Book preview

    Banking on Cloud Data Platforms - Dillip Kumar

    PART 1: FIRST GENERATION

    Chapter 1: Data warehouse

    The fundamental idea behind a Data Warehouse is to provide a unified and reliable source of truth for a company, aiding in decision-making and predictive analysis. A Data Warehouse is an organized system that stores historical and aggregated data from one or multiple sources. It streamlines the process of generating reports and conducting analyses for organizations.

    1.1 Three-Tier Architecture:

    Bottom tier: The bottom tier consists of the data warehouse database server, which collects data from various sources, including the transactional databases used by front-end applications.

    Middle tier: The middle tier houses an OLAP server, which transforms the data into a structure better suited for analysis and complex querying. The OLAP server can work in two ways: either as an extended relational database management system that maps the operations on multidimensional data to standard relational operations (Relational OLAP), or using a multidimensional OLAP model that directly implements the multidimensional data and operations.
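
    The Relational OLAP idea described above, mapping multidimensional operations onto standard relational ones, can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database; the table txn_fact, its columns, and the sample values are hypothetical and do not come from the book.

```python
import sqlite3

# Hypothetical fact rows: (branch, product, quarter, amount). Illustrative values only.
ROWS = [
    ("Mumbai", "Loans", "Q1", 120.0),
    ("Mumbai", "Cards", "Q1", 80.0),
    ("Delhi", "Loans", "Q1", 100.0),
    ("Delhi", "Loans", "Q2", 50.0),
]
DIMENSIONS = {"branch", "product", "quarter"}


def build_source():
    """Create an in-memory relational table that plays the role of the cube's source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn_fact (branch TEXT, product TEXT, quarter TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO txn_fact VALUES (?, ?, ?, ?)", ROWS)
    return conn


def roll_up(conn, dimension):
    """Roll-up: aggregate the measure along one dimension, expressed as GROUP BY."""
    if dimension not in DIMENSIONS:  # whitelist, since the name is interpolated into SQL
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (
        f"SELECT {dimension}, SUM(amount) FROM txn_fact "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


def slice_quarter(conn, quarter):
    """Slice: fix one dimension value with WHERE and keep the remaining dimensions."""
    sql = (
        "SELECT branch, product, SUM(amount) FROM txn_fact "
        "WHERE quarter = ? GROUP BY branch, product ORDER BY branch, product"
    )
    return conn.execute(sql, (quarter,)).fetchall()


conn = build_source()
by_branch = roll_up(conn, "branch")
q1_slice = slice_quarter(conn, "Q1")
print(by_branch)
print(q1_slice)
```

    A multidimensional OLAP server would instead keep the cube in a dedicated array-like structure; the sketch covers only the relational mapping.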

    Top tier: The top tier is the client layer. This tier holds the tools used for high-level data analysis, querying, reporting, and data mining.

    Fig 1.2: Three-tier Architecture of Data Warehouse

    Two Models That Inspire Most DWH Designs: Kimball vs. Inmon

    Ralph Kimball's methodology places a strong emphasis on the significance of data marts, which serve as dedicated repositories for data pertaining to specific lines of business within an organization. In this framework, the data warehouse essentially functions as an amalgamation of various data marts, thereby facilitating comprehensive reporting and in-depth analysis. The Kimball approach is characterized by its bottom-up orientation, wherein data marts are initially developed and subsequently integrated into the overarching data warehouse structure.

    Bill Inmon's perspective views the data warehouse as the central repository for all enterprise data. In this approach, an organization initiates the process by constructing a normalized data warehouse model, which serves as the foundation. Dimensional data marts are then derived from and aligned with this warehouse model. This methodology is commonly referred to as the top-down approach to data warehousing.

    Bottom-Up Design Approach

    In the context of data warehousing, the Bottom-Up approach entails conceptualizing a data warehouse as a specialized architecture designed for query and analytical purposes, predicated on the establishment of a star schema. This approach prioritizes the initial creation of data marts, tailored to fulfill the specific reporting and analytical requirements pertinent to distinct business processes or subject domains. Consequently, it underscores a business-centric perspective, distinguishing it from Inmon's data-centric methodology.

    Data marts encompass granular transactional data, with the possibility of incorporating aggregated data as needed. Instead of adhering to the principles of normalization as seen in traditional databases, this approach espouses the utilization of denormalized dimensional databases, tailored to meet the data provisioning requisites inherent to data warehousing. Underpinning this strategy is the imperative of designing data marts with the foresight of incorporating conformed dimensions, ensuring that common entities are consistently represented across disparate data marts. These conformed dimensions serve as the linchpin that unifies the data marts into a cohesive data warehouse, frequently referred to as a virtual data warehouse.
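
    To make the role of conformed dimensions concrete, the following sketch builds two small data marts (loans and deposits) that share one customer dimension and then queries across both. It is an illustration under assumptions: the table names, columns, and figures are hypothetical, and SQLite stands in for whatever database actually hosts the marts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension: one definition of "customer" shared by both marts.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name TEXT,
    segment TEXT
);
-- Two independent data marts, each with its own fact table.
CREATE TABLE fact_loans (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    loan_amount REAL
);
CREATE TABLE fact_deposits (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    deposit_amount REAL
);
INSERT INTO dim_customer VALUES
    (1, 'Asha', 'Retail'), (2, 'Ravi', 'Retail'), (3, 'Acme Ltd', 'Corporate');
INSERT INTO fact_loans VALUES (1, 500.0), (3, 9000.0);
INSERT INTO fact_deposits VALUES (1, 200.0), (2, 300.0), (3, 4000.0);
""")


def drill_across(conn):
    """Combine both marts per customer segment via the shared dimension."""
    sql = """
    SELECT d.segment,
           COALESCE(l.total_loans, 0.0),
           COALESCE(p.total_deposits, 0.0)
    FROM (SELECT DISTINCT segment FROM dim_customer) AS d
    LEFT JOIN (
        SELECT c.segment, SUM(f.loan_amount) AS total_loans
        FROM fact_loans f JOIN dim_customer c USING (customer_key)
        GROUP BY c.segment
    ) AS l ON l.segment = d.segment
    LEFT JOIN (
        SELECT c.segment, SUM(f.deposit_amount) AS total_deposits
        FROM fact_deposits f JOIN dim_customer c USING (customer_key)
        GROUP BY c.segment
    ) AS p ON p.segment = d.segment
    ORDER BY d.segment
    """
    return conn.execute(sql).fetchall()


segment_totals = drill_across(conn)
print(segment_totals)
```

    Because both fact tables use the same customer_key and the same segment definitions, the two marts can be compared side by side without reconciling customer identities first.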

    One of the prominent advantages of the Bottom-Up design approach resides in its expeditious return on investment (ROI). The development of a data mart, constituting a discrete data warehouse catering to a singular subject, demands significantly less time and resources compared to the endeavor of constructing an all-encompassing enterprise-wide data warehouse. Furthermore, this methodology mitigates the risk of project failure. Inherently incremental in nature, it facilitates an adaptive learning process, enabling the project team to accumulate knowledge and expertise iteratively.

    Fig 1.3: Bottom-Up Approach

    Advantages of the bottom-up design approach include rapid generation of reports, flexibility in extending the data warehouse to accommodate new business units, and a straightforward process for developing new data marts and integrating them with existing ones. However, a notable disadvantage is that the positions of the data warehouse and the data marts are reversed relative to the traditional (top-down) design: the marts are built first, and the warehouse is assembled from them.

    Top-down Design Approach

    In the Top-Down design approach, a data warehouse is defined as a subject-oriented, time-variant, non-volatile, and integrated data repository holding enterprise-wide data from diverse sources. This data is validated, reformatted, and stored in a normalized database that adheres to third normal form (3NF) and serves as the foundation of the data warehouse. The repository contains atomic information, that is, data at its most granular level, from which dimensional data marts can be built by selectively extracting the data relevant to specific business subjects or departments. The approach is data-driven: data is consolidated and integrated first, and the business-specific requirements for the data marts are formulated afterwards. Its main advantage is a single, integrated data source, which keeps the data marts derived from it consistent, particularly where they overlap.
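
    The top-down flow, a normalized warehouse first and dimensional marts derived from it afterwards, can be sketched as follows. This is a toy example under assumptions: the branch, customer, and account tables, the mart name mart_branch_balances, and all values are hypothetical, and SQLite is used only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF) warehouse tables holding atomic data.
CREATE TABLE branch (
    branch_id INTEGER PRIMARY KEY, branch_name TEXT, city TEXT
);
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY, customer_name TEXT,
    branch_id INTEGER REFERENCES branch(branch_id)
);
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    balance REAL
);
INSERT INTO branch VALUES (1, 'Fort', 'Mumbai'), (2, 'CP', 'Delhi');
INSERT INTO customer VALUES (10, 'Asha', 1), (11, 'Ravi', 1), (12, 'Meera', 2);
INSERT INTO account VALUES
    (100, 10, 1000.0), (101, 10, 250.0), (102, 11, 500.0), (103, 12, 750.0);

-- Data mart derived from the warehouse: denormalized and subject-specific.
CREATE TABLE mart_branch_balances AS
SELECT b.branch_name,
       b.city,
       COUNT(a.account_id) AS accounts,
       SUM(a.balance)      AS total_balance
FROM account a
JOIN customer c ON c.customer_id = a.customer_id
JOIN branch b   ON b.branch_id = c.branch_id
GROUP BY b.branch_id, b.branch_name, b.city;
""")

mart_rows = conn.execute(
    "SELECT branch_name, city, accounts, total_balance "
    "FROM mart_branch_balances ORDER BY branch_name"
).fetchall()
print(mart_rows)
```

    Any further mart would be loaded from the same three warehouse tables, which is what keeps overlapping marts consistent with one another.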

    Advantages of the top-down design approach encompass the streamlined loading of Data Marts directly from the central data warehouse, simplifying the process of creating new data marts. However, this method exhibits inflexibility when it comes to accommodating evolving departmental requirements and often incurs high implementation costs.

    Fig 1.4: Top-down Approach

    1.2 Data Warehouse Models

    A virtual data warehouse is a collection of distinct databases that can be queried together, so users can access all the data as if it were consolidated in a single data warehouse. A data mart model, by contrast, is used for business-focused reporting and analysis: data is aggregated from the source systems that relate to one particular business domain, such as sales or finance. The enterprise data warehouse model goes further and prescribes that the warehouse hold aggregated data spanning the entire organization. This model treats the data warehouse as the central hub of the enterprise's information system, housing integrated data from all business units.
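
    As a rough illustration of the virtual data warehouse idea, the sketch below attaches two separate SQLite databases to one connection and exposes them through a single view. The database aliases retail_db and cards_db, the balances tables, and the values are assumptions made for the example; a production setup would more likely rely on a federation or data virtualization layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two distinct databases; each ATTACH of ':memory:' creates a separate database.
conn.execute("ATTACH DATABASE ':memory:' AS retail_db")
conn.execute("ATTACH DATABASE ':memory:' AS cards_db")

conn.execute("CREATE TABLE retail_db.balances (customer TEXT, amount REAL)")
conn.execute("CREATE TABLE cards_db.balances (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO retail_db.balances VALUES (?, ?)",
    [("Asha", 100.0), ("Ravi", 40.0)],
)
conn.executemany(
    "INSERT INTO cards_db.balances VALUES (?, ?)",
    [("Asha", -25.0)],
)

# A TEMP view may reference tables in any attached database, so it can act
# as the single logical access point over the physically separate stores.
conn.execute("""
CREATE TEMP VIEW all_balances AS
SELECT 'retail' AS source, customer, amount FROM retail_db.balances
UNION ALL
SELECT 'cards' AS source, customer, amount FROM cards_db.balances
""")

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM all_balances "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

    No data is copied here: the view is resolved against the underlying databases at query time, which is the defining trait of the virtual model.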

    Data Modelling Life Cycle:

    The data modeling life cycle is the process of transforming business requirements into models that meet the goals for storing, maintaining, and accessing data within IT systems. The result is a logical and physical data model for an enterprise data warehouse.

    The primary objective of the data modeling life cycle is to create a storage area for business information. That area is defined during the logical and physical data modeling stages.

    Fig 1.5: Data Modeling Life Cycle

    Conceptual Data Model:

    A conceptual data model identifies the highest-level relationships between the different entities. The characteristics of the conceptual data model are as follows:

    • It contains the essential entities and the relationships among them.

    • No attribute is specified.

    • No primary key is specified.

    The only information shown in the conceptual data model is the entities that define the data and the relationships between those entities. Nothing else is shown at this level.

    Fig 1.6: Example of Conceptual Data Model

    Logical Data Model:

    A logical data model outlines information structures comprehensively, without consideration for their physical database implementation. Its main goal is to document business data structures, processes, rules, and relationships within a unified logical data model view.

    Features of a logical data model include:

    • It includes all entities and the relationships among them.

    • All attributes for each entity are specified.

    • The primary key for each entity is stated.

    • Referential integrity is specified (foreign-key relationships).

    • No data types are listed.

    The steps for logical data model design are as follows:

    • Specify primary keys for all entities.

    • List the relationships between different entities.

    • List all attributes for each entity.

    • Normalize the model.

    Fig 1.7: Example of Logical Data Model

    Physical Data Model:

    A physical data model describes how the data is actually represented in the database, including table structures, column names, data types, constraints, primary keys, foreign keys, and table relationships. Its purpose is to map the logical data model onto the physical structures of the RDBMS that hosts the data warehouse. This includes defining physical structures such as tables and data types for data storage, and it may introduce new data structures to improve query performance.

    Characteristics of a physical data model:

    • All tables and columns are specified.

    • Foreign keys are used to identify relationships between tables.

    The steps for physical data model design are as follows:

    • Convert entities to tables.

    • Convert relationships to foreign keys.

    • Convert attributes to columns.
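
    The three conversion steps above can be sketched as a small generator that turns a logical model into CREATE TABLE statements. Everything here is hypothetical and for illustration only: the Entity, Attribute, and Relationship classes, the customer and account entities, and the chosen SQL types are assumptions, not the book's notation.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    sql_type: str              # data types are introduced only at the physical stage
    primary_key: bool = False


@dataclass
class Relationship:
    column: str                # foreign-key column in the child table
    ref_table: str
    ref_column: str


@dataclass
class Entity:
    name: str
    attributes: list
    relationships: list = field(default_factory=list)


def to_ddl(entity):
    """Entity -> table, attributes -> columns, relationships -> foreign keys."""
    parts = []
    for attr in entity.attributes:
        column = f"{attr.name} {attr.sql_type}"
        if attr.primary_key:
            column += " PRIMARY KEY"
        parts.append(column)
    for rel in entity.relationships:
        parts.append(
            f"FOREIGN KEY ({rel.column}) REFERENCES {rel.ref_table}({rel.ref_column})"
        )
    return f"CREATE TABLE {entity.name} (" + ", ".join(parts) + ")"


customer = Entity(
    "customer",
    [Attribute("customer_id", "INTEGER", primary_key=True), Attribute("name", "TEXT")],
)
account = Entity(
    "account",
    [
        Attribute("account_id", "INTEGER", primary_key=True),
        Attribute("customer_id", "INTEGER"),
        Attribute("balance", "REAL"),
    ],
    [Relationship("customer_id", "customer", "customer_id")],
)

ddl = [to_ddl(entity) for entity in (customer, account)]

# Check that the generated statements are valid by running them in SQLite.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
foreign_keys = conn.execute("PRAGMA foreign_key_list(account)").fetchall()
print(ddl[1])
```

    A real physical design would add indexes, partitioning, and similar performance structures on top of this direct mapping.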

    Fig 1.8: Example of Physical Model

    Star Schema vs. Snowflake Schema

    The star schema and snowflake schema are two ways to structure the data in the data warehouse.

    The star schema is organized around a central fact table, which is linked to a series of denormalized dimension tables. The fact table contains the aggregated data used for reporting, while the dimension tables describe the stored data.
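
    A minimal star schema might look like the sketch below: one fact table joined to two denormalized dimension tables. The table names, columns, and figures are hypothetical and chosen only to illustrate the layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: denormalized, descriptive attributes kept in one table each.
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT
);
CREATE TABLE dim_branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, city TEXT, region TEXT
);
-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_transactions (
    date_key   INTEGER REFERENCES dim_date(date_key),
    branch_key INTEGER REFERENCES dim_branch(branch_key),
    amount     REAL
);
INSERT INTO dim_date VALUES (20230101, 2023, 'Q1'), (20230401, 2023, 'Q2');
INSERT INTO dim_branch VALUES
    (1, 'Fort', 'Mumbai', 'West'), (2, 'CP', 'Delhi', 'North');
INSERT INTO fact_transactions VALUES
    (20230101, 1, 100.0), (20230101, 2, 60.0), (20230401, 1, 40.0);
""")

# A typical report: exactly one join from the fact table to each dimension.
report = conn.execute("""
SELECT b.region, d.quarter, SUM(f.amount)
FROM fact_transactions f
JOIN dim_date d   ON d.date_key = f.date_key
JOIN dim_branch b ON b.branch_key = f.branch_key
GROUP BY b.region, d.quarter
ORDER BY b.region, d.quarter
""").fetchall()
print(report)
```

    In a snowflake variant, city and region would typically be moved out of dim_branch into their own normalized tables, adding further joins to the same report.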

    Denormalized designs are less complex because the data is grouped. The fact table uses only one link to join to each dimension table. The star schema’s simpler design makes it much easier to write complex
