Data Analysis for Corporate Finance: Building financial models using SQL, Python, and MS PowerBI
Ebook · 631 pages · 3 hours


About this ebook

Have you ever tried to learn to code or to use advanced visualization tools? If so, I am sure you know how daunting it is to learn by yourself.

Generally, tools and books follow an encyclopedism approach, i.e., they attempt to teach every feature of a coding language or tool. This implies hundreds, if not thousands, of pages simply to tackle a single topic, whether SQL, Python, MS Excel, MS PowerBI, you name it. The journey from zero to hero, becoming proficient with numerical and visualization tools that can take your career to the next level, turns into an ordeal that requires years and thousands of pages just to begin putting the pieces of the puzzle together. However, the reality is that you do not need to learn absolutely every available feature of those tools to deliver a superior project.

Rather than teaching you about the forest, I will discuss specific trees. Why? Because once you become familiar and confident nurturing a few trees, growing a forest becomes a simple process of planting new ones. This book provides the fundamental building blocks of financial data science so that you can take these tools and start using them tomorrow. The scope of the selected tools will empower you to see a considerable improvement in your financial modeling skills.

The book is designed to give corporate finance professionals the ability to start using advanced tools immediately for concrete, real-world tasks.

Therefore, this book is all about functionalism. It is about providing you with tools that will put you to work and dramatically change the way you analyze data. Once you see the benefits, it will become natural to keep expanding your domain knowledge, leveraging today's endless available educational resources.

Language: English
Release date: Dec 7, 2021
ISBN: 9781649527226


    Book preview

    Data Analysis for Corporate Finance - Mariano F. Scandizzo CFA CQF


    Data Analysis for Corporate Finance

    Building financial models using SQL, Python, and MS PowerBI

    Mariano F. Scandizzo, CFA, CQF

    Copyright © 2021 Mariano F. Scandizzo, CFA, CQF

    All rights reserved

    First Edition

    Fulton Books, Inc.

    Meadville, PA

    Published by Fulton Books 2021

    ISBN 978-1-64952-721-9 (paperback)

    ISBN 978-1-64952-722-6 (digital)

    Printed in the United States of America

    Table of Contents

    Becoming a Financial Data Scientist

    Relational Databases

    Financial Data Science

    Numpy

    Pandas

    Corporate Finance Tools

    Data Visualization

    Matplotlib

    Python Primer

    To Maria, Marco, and Ava, who bless me every day with endless moments of perfection. To my parents, Marta and Delfor, and to my grandparents Marco, Susana and Carmencita for their love, guidance, and unconditional support.

    About the Author

    Mariano has more than 20 years of experience in investment management, corporate strategy, private equity, and business consulting.

    Mariano’s professional path has taken him to a broad range of multicultural industries and sectors across Latin America, the Middle East, and the US.

    He has worked on hundreds of transactions, building financial models from the ground up. The tools he shares in this book are the ones he has researched, crafted, and implemented to deliver state-of-the-art financial modeling solutions: real-time, actionable tools, backed by artificial intelligence and data science, that empower decision makers to pursue data-driven strategies, ensure a company’s growth, and understand and mitigate risk. His approach is geared explicitly toward practitioners seeking actionable tools to address real-life situations and transactions.

    Mr. Scandizzo is a Chartered Financial Analyst with the CFA Institute. He also holds a Certification in Quantitative Finance from the CQF Institute, a Financial Modeling and Valuation Analyst certification from the Corporate Finance Institute, and a certification in artificial intelligence from Lambda School. He holds a postgraduate degree in finance from Universidad de San Andres and a Bachelor of Arts in Economics from Universidad Argentina de la Empresa.

    Companion Github page

    Dear reader, the repositories associated with the code shown in this book can be found at:

    https://github.com/mscandizzo/CorpFinBook

    Chapter 1

    Becoming a Financial Data Scientist

    Simplicity is complex. It’s never simple to keep things simple. Simple solutions require the most advanced thinking.

    —Richie Norton

    From encyclopedism to functionalism

    Have you ever tried to learn coding or use advanced visualization tools? If so, I am sure you know how daunting it is to learn by yourself.

    Generally, tools and books follow an encyclopedism approach, i.e., books attempt to teach you each and every feature of a coding language or tool. This implies hundreds, if not thousands, of pages simply to tackle a single topic, whether SQL, Python, MS Excel, MS PowerBI, you name it. The journey from zero to hero, becoming proficient with numerical and visualization tools that can take your career to the next level, turns into an ordeal that requires years and thousands of pages just to begin putting the pieces of the puzzle together. And if that's not enough, by the time you start scratching the surface of the topic you are trying to master, updates are released, and once again, you are behind the curve. However, the reality is that you do not need to learn absolutely every available feature of those tools to deliver a superior project.

    Have you ever attempted to learn a second language? If so, I assume you did not set out to memorize an entire encyclopedia written in that language. Although every language has thousands of words, you can achieve proficiency by learning a couple of hundred words while grasping a good understanding of the grammar. That would be more than enough to communicate and to live and work in a different country.

    The purpose of this book is to help you become functional in using data science tools. The objective is to set the foundations so that you can quickly learn and start applying different tools which will dramatically enhance your modeling techniques.

    Rather than teaching you about the forest, I will discuss specific trees. Why? Because once you become familiar and confident nurturing a few specific trees, growing a forest becomes a simple process of planting new ones. In our financial jargon, this is a bottom-up approach. This book provides the fundamental building blocks of financial data science so that you can take these tools and start using them tomorrow. The scope of the selected tools will empower you to see a considerable improvement in your financial modeling. This book is a collection of first steps, an important part of your journey of a thousand miles toward further expanding this initial knowledge.

    The main objective is for you to quickly start using advanced tools for concrete real-world tasks. I do not want you to spend months reading and reading and still be unable to implement the concepts when needed.

    Therefore, this book is all about functionalism: it is about providing you with tools that will put you to work and dramatically change the way you analyze data. Once you see the benefits, it will become natural to keep expanding your domain knowledge, leveraging the endless educational resources available today.

    The first part of this book describes the basic challenges that every financial professional faces almost on a daily basis.

    Part two introduces the reader to relational databases. It provides examples and applications about how to design financial models using a relational database architecture.

    Part three dives into Python and the NumPy and Pandas libraries, as well as how to integrate a relational database into Python analysis.

    Part four describes how to build scalable financial charts and dashboards in Microsoft PowerBI.

    Part five puts together all the previous building blocks to create valuation methods where we integrate MS Excel, database techniques, Python, and MS PowerBI.

    Finally, two sizeable appendices are provided for those who seek a deeper understanding of the Matplotlib visualization library and pure Python.

    Finding a better way

    Always desire to learn something useful.

    —Sophocles

    After dedicating the last two decades of my life to finance, I have reached an undeniable conclusion: no matter how good you are at building financial models, stakeholders will always request information that was not initially contemplated by your model. Yes, I know it is frustrating: you spend weeks building the most sophisticated and reliable financial model, and when you finally get to present the conclusions, decision makers find aspects they would like to assess which your model did not include. So you go back to your workbench to improve the model.

    The average financial model covers one or more of the following aspects:

    It is built using the three basic financial statements (balance sheet, income statement, and cash flow statement).

    It uses financial inputs to:

    conduct scenario and sensitivity analyses;

    model the expected outcomes of strategies under analysis;

    track key performance indicators;

    model the expected future behavior of specific variables (sales, cost, etc.); and

    analyze historical performance.

    These models typically follow a grid format. Output information includes financial ratios, scenario and sensitivity analysis, and Monte Carlo simulations in some cases.
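One of the outputs mentioned above, a Monte Carlo simulation, can be sketched in a few lines of NumPy. The base sales figure, growth rate, and volatility below are purely illustrative assumptions, not numbers from the book:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

base_sales = 100.0                    # current-year sales (illustrative)
mean_growth, volatility = 0.05, 0.10  # assumed annual growth and std dev
years, n_sims = 5, 10_000

# Simulate compounded annual growth paths: one row per scenario
growth = rng.normal(mean_growth, volatility, size=(n_sims, years))
paths = base_sales * np.cumprod(1 + growth, axis=1)

final = paths[:, -1]
print(f"Mean year-5 sales: {final.mean():.1f}")
print(f"5th-95th percentile: {np.percentile(final, 5):.1f}"
      f" - {np.percentile(final, 95):.1f}")
```

The same simulated paths can then feed scenario tables or sensitivity charts instead of being recomputed cell by cell in a spreadsheet.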

    Up to a certain point in time and complexity, every model was probably manageable and easy to follow. However, as new requirements from stakeholders are included, the model starts to mutate.

    That is the moment when the buildup of a Frankenstein begins. Yes, Frankenstein, you heard me right. Every model begins with the elegant, well-structured logic it had when conceived. Sadly, as you start using the model and begin introducing patches, that logic fades away. The unequivocal outcome is new tabs linked to other tabs, the formula of the formula of the formula, cells scattered across the model with random formulas or hardcoded inputs, etc.

    The result is a model that works, but one that is very rigid and extremely cumbersome. Every change not only makes the model far more complicated and difficult to read but also requires long hours to make it work. Needless to say, even though the world has reached a total population of 7.8 billion people, the only person on earth capable of reading or using the model is you.

    Does it have to be that way?

    No, and that is why I want to share my journey with you. After building cumbersome models for years (which made me very proud and respected among my peers), I finally reached a point where it was impossible to create comprehensive bottom-up models without crashing my computer. Even then, I persisted, thinking that the problem was processing power. So I went out and bought the most powerful, NASA-style computers, ending up with machines with server-grade specifications as my personal computer, all to cope with the complexity of my models. Of course, it did not work. My models continued to crash. However, that was not the worst part. Being an economist, I knew that I could have more robust models if I compiled and processed more data from multiple sources. But in order to do so, I had to manage multiple tables and data sources. The complexity grew exponentially, since I had to preprocess data to get it into the right format, manipulate several tables, run complex calculations, and finally produce an outcome that was not only readable but also in a friendly format. Coming from an MS Excel and Visual Basic background, I simply was not able to put together a model with the degree of complexity needed to achieve the best possible representation of reality.

    That was the moment when I realized that I needed help. Luckily, I was always on good terms with the guys in the IT department. The software engineer in charge of the Oracle ERP quickly looked at me and said, "Mariano, you can’t build a model using that architecture; you have to build it using a relational database structure."

    That was the beginning of my journey. So let me share with you my findings.

    The blueprint

    Every well-built house started in the form of a definite purpose plus a definite plan in the nature of a set of blueprints.

    —Napoleon Hill

    We have agreed that models grow and mutate as time goes by. Given that condition, our approach has to change at its genesis. While working with Excel, we tend to make every new cell and formula an additional brick in our construction process. But the picture of the tower we are trying to build is not always clear; hence, when the job is done, as the saying goes, a broken glass still works as a glass, but it is broken.

    So the building we have created works, but it is oddly shaped, and the apartment layouts are not that efficient.

    To revert that logic, we have to think in terms of the ultimate product we want to deliver. We have to identify the features that are a must while leaving room for features that might not be necessary today but could be needed in the future, without fundamentally changing the general structure and foundations of the product.

    If you have kids, or you are young enough, you probably remember that when building Legos, the first step is to identify the different pieces so you have a clear idea of how to assemble the toy, taking into consideration shapes and characteristics. We are about to do the same here: assess and relate the different tools we will use so that their integration and implementation not only follow a logical sequence but also provide the flexibility to keep plugging in new features and tools in the future.

    Let me share the initial framework of the different tools, i.e., the pieces of the puzzle we are trying to put together:

    Fig. 1: Financial Data Science Pipeline

    My goal is to empower you to become a financial data scientist. I define a financial data scientist as an individual who is fully proficient in finance and has the ability to use and apply data science tools, such as coding in Python and SQL, to carry out financial analysis, employing a large array of structured and unstructured data to deliver interactive visualizations to decision makers.

    While MS Excel is a cornerstone piece of any financial modeling attempt we make, it does not make sense to try to do everything with a single tool, just like we would not try to build a house using exclusively either a hammer or a screwdriver. It does not make sense either to become a PhD at using the hammer or the screwdriver.

    The functional financial data scientist must have adequate knowledge to use all the aforementioned tools to the degree required to perform the job. Once that initial minimum threshold has been reached, the journey should be to continue building on top of the initial foundations in a modular way. What do I mean by modular? Let's say you want to explore the use of SQL. Trust me when I tell you that the most practical use of SQL, if you work in finance, is to query and join tables. My approach would be to learn only to query and join tables and start using SQL to optimize the way you work. Once that practice has become second nature, you might want to aggregate data in SQL rather than aggregate the data you have gathered using Python or MS Excel. That is great. You see what I mean: you get actionable content, you start using it, and only then you add more content.
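To make the query-and-join workflow concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The customers and invoices tables, and their contents, are hypothetical examples, not data from the book:

```python
import sqlite3

# In-memory database standing in for a corporate data store
con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices  (invoice_id  INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(customer_id),
                        amount      REAL);
INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
INSERT INTO invoices  VALUES (10, 1, 1200.0), (11, 1, 800.0), (12, 2, 450.0);
""")

# The bread-and-butter operation: join two tables and aggregate
rows = cur.execute("""
    SELECT c.name, SUM(i.amount) AS total
    FROM customers AS c
    JOIN invoices  AS i ON i.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

print(rows)  # [('Acme Corp', 2000.0), ('Globex', 450.0)]
```

The same SELECT/JOIN/GROUP BY pattern carries over almost unchanged to SQL Server once the real database is in place.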

    In terms of core data science skills, the following chart shows the progression I will follow when it comes to expanding the knowledge base.

    Fig. 2 : Data Science Domain Knowledge Progression

    The challenge

    The journey of a thousand miles begins with one step.

    —Lao Tzu

    Moving from grid report design to table design.

    Are you familiar with something like this?

    Fig. 1: Grid Design Layout

    Of course you are, that is the kind of model you have probably been building most of your career!

    Is there anything wrong with it? Absolutely not. Nevertheless, these kinds of models have a fundamental flaw: they are not scalable.

    The vast majority of data tools you will find in Excel (I would venture to say, all of them) only work if the data follows a tabular design.

    A tabular design just means a table where the information is organized in columns.

    The report below has a two-dimensional grid design where each value represents the intersection of two variables.

    Fig. 2: Two-Dimensional Row-Column Intersection

    The layout also has subtotals scattered across the grid table.

    It is undeniable that the layout is quite nice and elegant, a great end product to present to stakeholders. The problem with using the aforementioned layout/design is that the model is completely inflexible. Each and every new feature addition will have to be done manually.

    As time goes by, you will start adding ratios, cells which are sums of scattered cells, etc. You know exactly what I am talking about. Wouldn't it be nice to have a clean model where you have input tables and output reports?

    Imagine a model where every time you want to include a new calculation, you either add a new field in your input tables or create a custom formula, then just map where to insert the new field in the dashboard report, and voilà, it works.
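A minimal sketch of that input-table/output-report separation, using pandas; the line items and amounts below are illustrative assumptions:

```python
import pandas as pd

# Input table: one row per record, columns as attributes (tabular design)
data = pd.DataFrame({
    "year":   [2020, 2020, 2021, 2021],
    "line":   ["Sales", "Costs", "Sales", "Costs"],
    "amount": [500.0, -300.0, 550.0, -320.0],
})

# Output report: a pivot of the input table, regenerated on demand
report = data.pivot_table(index="line", columns="year",
                          values="amount", aggfunc="sum")

# A new calculation is added once, as a mapped formula, not cell by cell
report.loc["Margin"] = report.loc["Sales"] + report.loc["Costs"]

print(report)
```

Adding a year or a line item means appending rows to the input table; the report rebuilds itself, which is exactly the scalability the grid layout lacks.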

    One final unique selling point for your consideration. Traditional Excel-driven models also carry a huge pitfall: the creator is the only person able to use and manage the model, so it is quite difficult to collaborate. A model based on relational tables, on the other hand, uses a widespread and proven logic with fifty years of history. The architecture is cleaner and more linear, so other users can embrace and use the model, making it an open model rather than a cumbersome black box.

    Understanding relational databases

    You might be wondering: why do you need to know about relational databases? If you typically use several Excel files containing a variety of tables, you are already managing a database. Now, a well-structured database must follow certain well-defined rules and employ a specific architecture. Even more important, it must be built using tables, not grid layouts.

    Most of our work, as I mentioned before, will revolve around querying precise fields from a specific table or from joined tables. To understand a database, we need to apply a top-down approach where we first understand its full architecture and then determine the location of the information we seek. I cannot emphasize this point enough. To query is to search for data; even if you are comfortable writing queries in SQL, you will not go far if you don't know where to look. It is imperative to first develop adequate domain knowledge about the database(s) where your data is stored before moving on to searching for data. I know it sounds obvious, but you would be surprised how often this first step is overlooked.

    Warm-up question: look at the two layouts below. Which one do you think is a table?

    The answer is the one on the left-hand side. Why do you think it is important? Let us look at the definition of a database table.

    Fig. 3: Table Design versus Grid Design

    A table is a set of data elements (values) using a model of vertical columns (identifiable by name) and horizontal rows. The cell is the unit where a row and column intersect. A table has a specified number of columns but can have any number of rows. Each row is identified by one or more values appearing in a particular column subset.

    Each row represents a specific record, while each column constitutes a feature/attribute of that record. It is important to ensure the data consistency of the table in order to avoid duplicated records. A specific column, or choice of columns, whose values uniquely identify rows is called the primary key.
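A short sketch of how a primary key enforces record uniqueness, again using Python's built-in sqlite3 module as a stand-in for any relational engine; the accounts table is a hypothetical example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# account_id is the primary key: it must uniquely identify each row
cur.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO accounts VALUES (100, 'Cash')")

try:
    # A second row with the same key violates the primary key constraint
    cur.execute("INSERT INTO accounts VALUES (100, 'Cash (duplicate)')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the database enforces uniqueness for us

print(duplicate_rejected)  # True
```

This is the data-consistency guarantee the paragraph above describes: duplicated records are rejected by the engine itself rather than hunted down by hand.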

    Tables are the basic pillar of any analysis. Absolutely 100 percent of the applications used to manipulate, analyze, and model data are structured to handle tables. I have deliberately omitted non-relational databases since they are an exception to this concept. Having said that, a good understanding of the relational database model will help you navigate the non-relational database model.

    Ensuring a model is built as a collection of tables will save the user from the most basic, most common, and most damaging mistake an analyst can make: not structuring data in a tabular format. Failing to do so guarantees that the model is doomed to be unscalable, driven by a sequence of manually inserted spaghetti formulas. Not to mention, the analyst will have to spend a couple of nights in the office every time new features not yet captured by the model must be added. No matter how complex and comprehensive the software at the analyst's disposal is (Tableau, MS PowerBI, Plotly, etc.), if the data is not organized in tables, the analyst won't be able to apply more than 10 percent of its features, simply because the data is not in a processable format.

    A well-crafted model should follow the following structure:


    Fig. 4: Financial Model Project Pipeline

    Each component should be clearly defined and separated. Examples such as the following:

    Fig. 5: Grid Design Financial Model

    These are great in that they carry a nice format and keep all the information in one place. Nevertheless, they are limited by the fact that a single grid contains raw data, data processing, and output all in one.

    Now, it is time to understand how a relational database works so you can start building models in whichever tool (Excel, Access, PowerBI, Python) you decide to use and apply 100 percent of its capabilities.

    Chapter 2

    Relational Databases

    Know where to find the information and how to use it. That’s the secret of success.

    —Albert Einstein

    Introduction

    When learning SQL, the process often starts with writing and applying queries. Nonetheless, I do not think that's necessarily the most useful way to go about it. Our scope of work is not to become software developers or database administrators but rather to identify, gather, and filter data to be used for financial modeling. The most important learning objective is to truly understand the logic and rules of a relational database. I can go even further and say that if you work with relatively small data sets (tens of columns and a few thousand rows), you could just use Excel rather than SQL Server as a database, and you will be more than fine. Having said that, you won't be able to efficiently use and relate tables in SQL Server, MS Excel, or any other data storage tool if you do not understand and apply the rules needed to structure an efficient relational database.

    A relational database is a type of database that stores and provides access to data points which are related to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables (I will fully describe the model in the following pages). In a relational database, each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.

    Allow me to introduce you to a database diagram. The objective of this chapter is not only to discuss relational databases and how they work, but also to provide you with cases which closely resemble the database that your company most likely has in terms of size and complexity. Thus, I have chosen the AdventureWorks database.


    Fig. 6: AdventureWorks Schema

    AdventureWorks is a Microsoft sample database. It supports a fictitious multinational manufacturing company called Adventure Works Cycles.

    Let's begin by getting familiar with some jargon.

    A database schema is the organization of data: a blueprint of how the database is constructed which, in the case of a relational database, entails dividing the data into separate tables and defining how those tables relate to one another, the relational aspect of the database.

    The previous image describes the AdventureWorks schema. You might be asking yourself, why is it so important to understand the database schema? Because the relationships among tables will provide the connections to obtain and manipulate data coming from different tables.
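Exploring a schema programmatically can be sketched with sqlite3 as a stand-in (against SQL Server you would typically query the INFORMATION_SCHEMA views instead); the product and sales tables below are hypothetical examples, not part of AdventureWorks:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales   (sale_id    INTEGER PRIMARY KEY,
                      product_id INTEGER REFERENCES product(product_id),
                      qty        INTEGER);
""")

# List every table in the schema before writing any query against it
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['product', 'sales']

# Inspect the columns (the attributes) of one table
columns = [r[1] for r in con.execute("PRAGMA table_info(sales)")]
print(columns)  # ['sale_id', 'product_id', 'qty']
```

Walking the schema like this, tables first, then columns, then relationships, is the top-down orientation step that should precede any real query.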

    Installing MS SQL Server

    The first challenge will be to install Microsoft SQL Server¹ and SQL Server Management Studio² (SSMS), as well as the AdventureWorks³ database. Everything we need is freely available to download and install (links to the different websites are provided for reference).

    The process of installing SQL Server and SSMS is relatively straightforward, although I know it might sound daunting.

    Among the skills required in the data science field, learning to debug effectively is one of the most essential. The initial setup of a database, glitches due to software version issues, etc., are a reality you will deal with, and no book can provide the exact answer that matches the configuration of your computer and the specific software version you are installing. Your two greatest allies in successfully navigating debugging issues are the following:

    www.google.com

    www.stackoverflow.com

    There is almost no question you could have about bugs and how to fix them that remains unanswered online. It is good practice to get familiar with both Google and Stack Overflow to guide you when dealing with problems you cannot fix on your own.

    Hopefully by now you have SQL Server, SSMS, and AdventureWorks installed on your computer and are ready to start working with them.

    The Relational Model

    In the early years of databases, every application stored data in its unique structure. When developers wanted to build applications to use that data, they had to know a lot about the different data structures to find the data they needed. These data structures were inefficient, hard to maintain, and difficult to optimize for delivering good application performance. The relational database model was
