The Digital Journey of Banking and Insurance, Volume III: Data Storage, Data Processing and Data Analysis
Ebook, 482 pages, 4 hours

About this ebook

This book, the third of three volumes, focuses on data and the actions around data, such as storage and processing. Across the volumes, the angle shifts from a business-driven approach in “Disruption and DNA” to a strong technical focus in “Data Storage, Processing and Analysis”, with “Digitalization and Machine Learning Applications” covering the business and technical aspects in between. This last volume of the series addresses the shifts in the way we deal with data.

Language: English
Release date: Oct 27, 2021
ISBN: 9783030788216

    Book preview

    The Digital Journey of Banking and Insurance, Volume III - Volker Liermann

    Part I Big Data and Special Databases

    Data availability and data technology stimulate each other continuously. The internet has made mass data available for almost every important (and unimportant) subject. The sheer volume forced Google to develop concepts for dealing with such amounts of data: BigTable¹ and MapReduce², used in conjunction with the Google File System. The availability of this technology (especially as a cost-efficient open-source implementation³) then opens up further Big Data processing use cases, such as customer clustering analysis or (when the time dimension is included) the prediction of a customer journey.
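
    The MapReduce idea mentioned above can be illustrated with a minimal single-machine sketch. It is not Google's or Hadoop's implementation: the record layout (customer, amount) and all function names are assumptions chosen for illustration, and a real cluster would distribute the map and reduce steps across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit (key, value) pairs for one input record."""
    customer, amount = record
    yield customer, amount


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce sequentially on one machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical transaction records: (customer, amount)
records = [("alice", 120), ("bob", 75), ("alice", 30), ("carol", 10), ("bob", 25)]
totals = map_reduce(records)
print(totals)
```

    Because each map call looks at one record only and each reduce call looks at one key only, both steps can in principle run in parallel, which is the property that makes the concept attractive for mass data.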

    Driven by business requirements, topic-specific database variants such as graph databases and other NoSQL databases (document store, key-value store, object database, …) have become established in the market. There is no perfect NoSQL database: every type has advantages and disadvantages depending on the subject it is applied to. The evolution of these specific types shows the demand for application-specific databases (in-memory DB, cluster DB, graph DB, document DB, …). Once the technology is implemented and available (ideally as open source⁴), new use cases are mapped, and sometimes surprising applications arise from a tool in the right hands. For example, graph databases are used in the context of anti-money laundering (AML) to analyze connected persons and accounts. This is certainly not the most obvious application for graphs (nodes and edges), but formulating the AML challenge as a graph delivers stable and reliable results.
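
    The AML example can be sketched in a few lines of plain Python, without a graph database. Persons and accounts are nodes, ownership and transfers are edges, and a breadth-first search collects everything connected to a starting node. The node labels, the sample edges and the function names are hypothetical; a production setup would run a comparable traversal inside the graph database itself.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_component(graph, start):
    """Return all nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical relationships: account ownership and one transfer
edges = [
    ("person:A", "account:1"),
    ("person:B", "account:1"),   # shared account links A and B
    ("person:B", "account:2"),
    ("account:2", "account:3"),  # transfer between accounts
    ("person:C", "account:4"),   # unrelated customer
]
graph = build_graph(edges)
cluster = connected_component(graph, "person:A")
print(sorted(cluster))
```

    In this sample, person A turns out to be indirectly connected to account 3 through a shared account and a transfer, while person C stays outside the cluster. Relationships of this kind are hard to see in flat tables and become easy to query once the data is modeled as nodes and edges.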

    The first chapter in this part (Freche, den Heijer and Wormuth 2021) tackles the subject of data lineage. In 2013, the Basel Committee on Banking Supervision published its Principles for effective risk data aggregation and risk reporting (BCBS 239, 2013). These principles include requirements that demand a data lineage.⁵ The chapter discusses the regulatory requirements and also explains why data lineage is needed to meet internal and external business requirements. It closes with an overview of the most common tools for data lineage.
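
    Data lineage is commonly understood as the documented path of a data item from its sources to the reports that use it. The following minimal sketch shows the two typical questions asked of such documentation: where does a report figure come from, and what is affected if a source changes (impact analysis). All system and field names are invented for illustration and are not taken from the chapter or from any lineage tool.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source field, target field)
flows = [
    ("core_banking.loan.balance", "dwh.exposure.amount"),
    ("core_banking.loan.currency", "dwh.exposure.amount"),
    ("market_data.fx.rate", "dwh.exposure.amount"),
    ("dwh.exposure.amount", "risk_report.total_exposure"),
    ("dwh.exposure.amount", "finance_report.balance_sheet_total"),
]


def index_flows(flows):
    """Index the edges in both directions for upstream and downstream lookups."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for source, target in flows:
        upstream[target].add(source)
        downstream[source].add(target)
    return upstream, downstream


def reachable(adjacency, start):
    """Collect every node reachable from start by following the adjacency map."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


upstream, downstream = index_flows(flows)

# Backward lineage: which fields feed the reported figure?
origins = reachable(upstream, "risk_report.total_exposure")

# Impact analysis: which fields are affected if the FX rate feed changes?
impacted = reachable(downstream, "market_data.fx.rate")

print(sorted(origins))
print(sorted(impacted))
```

    The same edge list answers both questions, which is one reason why lineage documentation is useful beyond regulatory reporting.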

    The second chapter (Bialek 2021) explores and analyzes the need for organizational flexibility and how to achieve it in a cloud-based environment. Most institutions operate in a silo-oriented environment. The chapter describes MongoDB as a solution that offers organizational flexibility to an institution. It illustrates different paths leading to the cloud and to a scalable environment for data modeling and data management.

    The part’s third chapter (Bajer et al. 2021) looks at a special database class: the graph database. The chapter first explains why graphs and their ability to document and analyze connections are an important tool in our connected world. The chapter provides the technical background, such as data model, storage and visualization of graphs as well as providers of tools for this special database. The discussion of use cases in the graph context closes the chapter.

    The final chapter of this part (Morawski and Schmidt 2021) provides a summary of data tiering options in SAP HANA for different environments. Motivated by the cost pressure originating from in-memory databases (IMDB), the chapter explains the need for an application-driven provision of storage capacities. It presents blueprints for implementing data tiering of SAP HANA databases with different Hadoop environments (e.g. Spark and SAP Vora). The tools presented include the SAP Data Lifecycle Manager.

    Literature

    Akhgarnush, Eljar, Lars Bröckers, and Thorsten Jakoby. 2019. Hadoop—a standard framework for computer clusters. In The Impact of Digital Transformation and Fintech on the Finance Professional, edited by Volker Liermann and Claus Stegmann. New York: Palgrave Macmillan.

    Bajer, Krystyna, Sascha Steltgens, Anne Seidlitz, and Bastian Wormuth. 2021. Graph Databases. In The Digital Journey of Banking and Insurance, Volume III—Data Storage, Processing, and Analysis, edited by Volker Liermann and Claus Stegmann. New York: Palgrave Macmillan.

    BCBS 239. 2013. Basel Committee on Banking Supervision (BCBS) 239. Accessed December 15, 2020. https://www.bis.org/publ/bcbs239.pdf.

    Bialek, Boris. 2021. Digitization and MongoDB. In The Digital Journey of Banking and Insurance, Volume III—Data Storage, Processing, and Analysis, edited by Volker Liermann and Claus Stegmann. New York: Palgrave Macmillan.

    Chang, Fay, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2006. Bigtable: A Distributed Storage System for Structured Data. Mountain View, CA: Google Inc.

    Dean, Jeffrey, and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. Mountain View, CA: Google Inc.

    Freche, Jens, Milan den Heijer, and Bastian Wormuth. 2021. Data Lineage. In The Digital Journey of Banking and Insurance, Volume III—Data Storage, Processing, and Analysis, edited by Claus Stegmann and Volker Liermann. New York: Palgrave Macmillan.

    Morawski, Michael, and Georg Schmidt. 2021. Data Tiering Options with SAP HANA and Usage in a Hadoop Scenario. In The Digital Journey of Banking and Insurance, Volume III—Data Storage, Processing, and Analysis, edited by Volker Liermann and Claus Stegmann. New York: Palgrave Macmillan.

    Footnotes

    1 See (Chang et al. 2006).

    2 MapReduce (see Dean and Ghemawat 2004) is strongly, though incorrectly, associated with Hadoop. Since the Hadoop ecosystem has developed further with Spark and Databricks, the MapReduce concept is no longer of importance.

    3 See details on Hadoop in (Akhgarnush, Bröckers and Jakoby 2019).

    4 Which happens quite often if the technology is relevant.

    5 Even if the term data lineage is not mentioned explicitly.

    © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

    V. Liermann, C. Stegmann (eds.), The Digital Journey of Banking and Insurance, Volume III, https://doi.org/10.1007/978-3-030-78821-6_1

    Data Lineage

    Jens Freche¹, Milan den Heijer¹ and Bastian Wormuth¹

    (1) ifb SE, Grünwald, Germany

    Jens Freche (Corresponding author)

    Email: Jens.Freche@ifb-group.com

    Milan den Heijer

    Email: Milan.denHeijer@ifb-group.com

    Bastian Wormuth

    Email: Bastian.Wormuth@ifb-group.com

    Keywords

    Data lineage · Regulatory requirements · PowerDesigner · Documentation · Impact analysis · End-to-end documentation

    1 Introduction and Motivation

    In the financial sector, the main driver for a firm to maintain a high standard of data lineage documentation is compliance. On the one hand, external compliance is required by law and by the regulator; on the other, large organizations often have internal best practices to facilitate compliance, which are necessary due to the scale of their IT landscapes. Although data lineage is often considered a burden, investing in detailed and transparent documentation of a firm’s data and data flows in its IT landscape on different levels can offer significant
