Practitioner’s Guide to Data Science: Streamlining Data Science Solutions using Python, Scikit-Learn, and Azure ML Service Platform
Ebook, 432 pages, 2 hours

About this ebook

The question "How is a Data Science project to be implemented?" has never had a more conceptually sound answer than the one presented in this book. The book provides an in-depth look at the current state of the world's data and at how Data Science plays a pivotal role in everything we do.

This book explains and implements the entire Data Science lifecycle using well-known data science processes like CRISP-DM and Microsoft TDSP. The book explains the significance of these processes in connection with the high failure rate of Data Science projects.

The book helps build a solid foundation in Data Science concepts and related frameworks. It teaches how to implement real-world use cases using data from the HMDA dataset. It explains the architecture, capabilities, and implementation of Azure ML Service to the Data Science team, who will then be prepared to implement MLOps. The book also explains how to use Azure DevOps to make the process repeatable.

By the end of this book, you will have developed strong Python coding skills, gained a firm grasp of concepts such as feature engineering, learned to create insightful visualizations, and become acquainted with techniques for building machine learning models.
Language: English
Release date: Jan 17, 2022
ISBN: 9789391392956

    Book preview

    Practitioner’s Guide to Data Science - Nasir Ali Mirza

    CHAPTER 1

    Data Science for Business

    Data Science is an emerging requirement for successful businesses. Data strategy is becoming a critical factor for business success and growth. The dependency of businesses on data has moved well beyond traditional departmental or enterprise reporting. Competitive use of enterprise data now means building and using Data Science solutions to drive operational efficiency, deliver context-based personalization of services and offerings, and provide Machine Learning (ML)/Artificial Intelligence (AI) model-assisted decision support systems and automation. This chapter provides an overview of Data Science and the scope of its application in business. It also touches upon its responsible use and implementation in terms of ethical and legal principles.

    Structure

    In this chapter, we will discuss the following topics:

    Application programmer to Data Science professional

    What is Data Science?

    Unprecedented scope of Data Science

    Data Science application

    Big Data, Data Mining (DM), Machine Learning (ML), Deep Learning (DL), Artificial Intelligence (AI), and Data Science

    Legal, ethical, and security aspects of Data Science

    Methodology used in organizing this book

    Objectives

    At the end of this chapter, you should be able to:

    Describe and differentiate ML, DL, AI, Big Data, and Data Science.

    Understand the scope and application of Data Science.

    Explain the responsible use of Data Science.

    Application programmer to Data Science professional

    As application programmers, we deal with data in tasks such as recording data, adding validations, performing aggregations, storing and retrieving data efficiently, developing transactional reports, performing data fixes, and making sure data types are correct and optimal.

    Data Science works with data at a different level than regular application development. It is about finding insights in the data that are not otherwise evident. We perform a statistical study of the data and explore the relations among the different data elements present in the dataset. For example, in regular application work we capture, store, and report on sales data, whereas in Data Science the focus is on how sales are influenced by related internal and external factors: how sales are affected by price, season, geography, customer demography, promotional schemes, and competitors' offers.

    While developing a Data Science solution, we start by studying the business goals, then define hypotheses and theories, determine the data collection needs, and decide how to obtain the required data (Data Engineering). We look at the range of values in the dataset and at their mean, mode, median, standard deviation, variance, distribution, and multicollinearity. We conduct visual and statistical exploration of the data to find correlations among the data elements. These correlations, examined with statistical procedures and measurements, help identify likely causes and effects. Once the influential data elements (features) in the dataset are identified, the solution uses these features with appropriate algorithms to train the model(s) that build insights and predict future outcomes. Data Science solution development also involves refining and enriching raw data into more valuable features that make model training more efficient in determining data relations and their weightage. As part of solution development, we also need to deal with data problems such as mixed scaling, skewness, and class imbalance.
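    The exploration step described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the book's own implementation; the price and sales figures are made up, and the helper names (describe, pearson, min_max_scale) are hypothetical.

```python
import math
import statistics

# Hypothetical monthly records: unit price and units sold (made-up numbers).
prices = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 8.0, 12.0]
units_sold = [200, 170, 220, 120, 185, 140, 240, 170]


def describe(values):
    """Return the basic descriptive statistics mentioned in the text."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def min_max_scale(values):
    """Rescale values to [0, 1] so features share a common scale."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


summary = describe(units_sold)
correlation = pearson(prices, units_sold)
scaled_prices = min_max_scale(prices)

print(summary)
print(f"price vs. units sold correlation: {correlation:.3f}")
```

    In this toy data the correlation is strongly negative: higher prices go with lower sales. A correlation on its own does not establish cause and effect, which is why the text pairs it with further statistical procedures.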

    Data Science is the next level of working with data, performing its statistical analysis and building predictive models.

    The purpose of Data Science programming is to determine patterns, correlations, and predictor weightages in the existing data so that relevant models can be built for predicting future outcomes. There are well-developed methods for evaluating the performance of these models, in terms of their correctness and error margins, when they are used with newly available data.
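    As a rough sketch of that idea, the example below fits a straight line to part of a dataset and then measures its error on observations it has not seen. The data is made up, the hold-out split and the choice of mean absolute error are assumptions made for illustration, and the function names are hypothetical.

```python
import statistics

# Hypothetical (price, units sold) observations; the numbers are made up.
observations = [
    (8.0, 240), (9.0, 220), (10.0, 200), (11.0, 185), (12.0, 170),
    (13.0, 150), (14.0, 140), (15.0, 120), (16.0, 105), (17.0, 90),
]

# Hold out the last observations to play the role of "newly available data".
train, test = observations[:7], observations[7:]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between predicted and actual values."""
    errors = [abs(y - (slope * x + intercept)) for x, y in points]
    return statistics.mean(errors)


slope, intercept = fit_line(train)
train_mae = mean_absolute_error(train, slope, intercept)
test_mae = mean_absolute_error(test, slope, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"train MAE={train_mae:.2f}, test MAE={test_mae:.2f}")
```

    The error on the held-out points, not the error on the training points, is the figure that indicates how the model may behave on new data.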

    Table 1.1 shows how the scope, purpose, and evaluation paths differ when working with data as an application programmer and as a Data Science professional:

    Table 1.1: Different aspects of application programming and Data Science solutioning

    What is Data Science?

    Data Science is the application of a scientific methodology to the study of data for the purpose of extracting insights and making predictions with trained models.

    Its full scope comprises defining theories and hypotheses, collecting data, analyzing it statistically, enriching raw data, and building, optimizing, and evaluating models. During the study of data, the scientific methodology is applied to accept or reject a hypothesis based on statistical calculations and measurements such as significance tests, confidence intervals, and measures of the correctness and error of predictions.
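    To make the idea of accepting or rejecting a hypothesis concrete, here is a minimal sketch of a significance test and a confidence interval in standard-library Python. The sales figures are made up, the permutation test is one of several possible significance tests (chosen here because it needs no extra libraries), and the normal-approximation interval is only a rough guide for a sample this small.

```python
import math
import random
import statistics

# Hypothetical daily sales without and with a promotion (made-up numbers).
control = [100, 102, 98, 101, 99, 103, 97, 100]
promo = [110, 112, 108, 111, 109, 113, 107, 110]


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(b) > mean(a).

    Null hypothesis: the group labels do not matter, so shuffling them
    should often produce a difference as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = list(a) + list(b)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    centre = statistics.mean(values)
    return centre - half_width, centre + half_width


p_value = permutation_p_value(control, promo)
low, high = mean_confidence_interval(promo)
reject_null = p_value < 0.05  # conventional 5% significance level

print(f"p-value: {p_value:.4f}, reject null hypothesis: {reject_null}")
print(f"95% confidence interval for promo mean: ({low:.2f}, {high:.2f})")
```

    A small p-value means that a difference this large would rarely arise if the promotion had no effect, so the "no effect" hypothesis is rejected at the chosen significance level.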

    Data scientists possess a strong passion for devising interesting questions and obtaining raw data to answer those questions. They are curious, inquisitive, and imaginative. Their skills include statistical analysis of data and its numerical interpretation, programming in languages like Python and R, effective verbal and written communication, the ability to comprehend the business domain to the extent the Data Science work requires, and a research-oriented, investigative mindset.

    As a Data Science professional, you will understand the business goals behind the Data Science work and review the business data landscape. You will devise a plan for collecting business data, prepare and clean this data, and explore and analyze it in detail. Once data validity and reliability are established, you will build the models and then test, optimize, and evaluate them. Finally, you will deploy the model for use with newly available data.

    The unprecedented scope of Data Science

    Data Science as an area of study and application has been around for a long time, albeit not under the same name. Since it is based on the concepts and principles of statistics and mathematics, it has long existed in academic and business fields. What has given rise to the unprecedented scope of Data Science application in the last couple of decades is the tremendous advancement in Computer Science and related technologies.

    Data Science applies to every field and situation where data either already exist or can be obtained: essentially every industry and subject.

    Earlier, there were limitations in capturing, storing, and processing data. All of these involved manual human labor with very limited automation. During the last two to three decades (1990 onwards), there has been exponential advancement in automatic data capturing, storage, and processing technologies. For example, the number of IoT-connected devices was 15.4 billion in 2015, increased to 30.7 billion by 2020, and is estimated to reach 150 billion by 2025. The number of internet users was 16 million in 1995 (0.4% of the world population), and by 2020 it was 4,833 million, that is, 62% of the world population. The volume of data created in the year 2010 alone was 2 zettabytes (ZB), approximately 2 trillion GB. World data in 2018 was 33 ZB, and it is estimated to grow to 175 ZB by 2025.

    On the one hand, there has been huge progress in storage technologies; on the other, their cost has fallen significantly, making storage affordable for individuals and small businesses as well. Storage cost was $100,000 per GB in 1985, fell to $7.5 per GB in 2000 and $0.038 per GB in 2015, and by 2020 it was $0.01 per GB.
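    The growth and cost figures quoted above are easier to compare as yearly rates. The short sketch below works out the compound annual rate implied by the endpoints given in the text; the function name is hypothetical, decimal (SI) units are assumed for the zettabyte conversion, and a constant yearly rate is a simplification.

```python
def compound_annual_rate(start_value, end_value, years):
    """Constant yearly rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted in the text.
datasphere_growth = compound_annual_rate(33, 175, 2025 - 2018)           # ZB
storage_cost_change = compound_annual_rate(100_000, 0.01, 2020 - 1985)   # $ per GB

# Assuming SI units: 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = 10**21 // 10**9
data_2010_in_gb = 2 * GB_PER_ZB

print(f"Datasphere growth: {datasphere_growth:.1%} per year")
print(f"Storage cost change: {storage_cost_change:.1%} per year")
print(f"Data created in 2010: {data_2010_in_gb:,} GB")
```

    Under these assumptions, the datasphere estimate corresponds to growth of roughly 27% a year, while the cost of a gigabyte of storage fell by more than a third each year on average.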

    In the field of automatic data capturing, a huge amount of data about our personal and public lives is recorded 24×7. In 2010, there were 298 data interactions per person per day, and by 2025 this is estimated to increase to 4,909. Many of the gadgets and appliances we use are IoT enabled, meaning they generate and transmit data: refrigerators, washing machines, cookers, security cameras, elevators, the watches we wear, and the cell phones we use. Our communication over social media, our web browsing, and our likes, dislikes, and preferences are all recorded automatically. Cell phones and the applications running on them collect a variety of data about us around the clock. The Google Maps platform alone is used regularly by more than a billion people and 5 million active apps and websites, which generates a very large volume of location data 24×7 that is of significant interest for multiple business use cases. Smartwatches and phones capture our health and fitness data. Our spending behavior, consumption, credit, and travel data are all recorded automatically. In public life, the vehicles we use, the routes we take, and public surveillance systems generate and capture a large volume of audio-visual data. The knowledge produced in the past is being digitized with the help of technologies like image-to-text conversion and searchable video content. Telemetry data from manufacturing, transportation, utility services, and so on continues to add a large volume of real-time data that is captured, transmitted, and saved automatically.

    The global datasphere will grow from 33 ZB in 2018 to 175 ZB by 2025. Nearly 30% of the world’s data will need real-time processing. (Data Age 2025)

    This level of automatic data capturing is matched by enormous growth in low-cost storage and compute power. These three factors (automatic data capturing, low-cost storage, and compute power), together with advancement in other areas of Computer Science, have facilitated the application of Data Science practically everywhere. The recent advent and rapid adoption of cloud computing has removed physical barriers further and taken automatic data capturing, storing, and processing to new heights.

    Data Science application

    Data Science applies to every field and situation where data either already exist or can be obtained: essentially every industry and subject. This is because Data Science deals with the study of data for the purpose of extracting meaningful insights, finding answers to interesting questions, and making predictions.

    The advancement in automatic data capturing, low-cost storage, and compute power has opened opportunities for applying Data Science in an unprecedented range of scenarios. Let’s take a brief look at some of them:

    Health services have data about patients' visits to clinics, medical diagnostics, medications prescribed by doctors, improvements in patients’ health, diseases, and patient demographics. Data Science is applied here in a large variety of scenarios, such as building insights about the effectiveness of various treatments, predicting which patients are at high risk for specific diseases, and identifying the preventive care that can be adopted.

    Educational services like schools, colleges, and universities possess data about educational programs, students' socio-economic backgrounds, teachers, enrollment, student and teacher performance, dropouts, and job placements. Data Science is applied to build insights about student engagement, teacher effectiveness, and improvements in educational programs, to determine dropout risk so that preventive measures can be planned, and to support performance evaluations and the effective design of educational programs.

    Sales and services companies, such as telecom providers, insurance providers, the hospitality industry, and manufacturers, have data about the services and products they offer, existing customers, the bundling of services, promotions, customers' service consumption, feedback, and so on. Data Science is applied to drive customer satisfaction campaigns, proactive maintenance of infrastructure, profitable grouping of services, adjustments for different geographies and seasons, effective stock management, and early identification of the risk of customer churn.

    Governance: state and central governments possess very large and valuable datasets about cities and populations at large, including socio-economic, demographic, educational, employment, health, criminal, and developmental data. Data Science can greatly help in extracting insights for objective evaluation of successes and challenges in all the important areas that affect the well-being of society. These insights and predictions help in the efficient allocation of resources and the execution of developmental projects.

    The financial sector is another important field, where a very large number of financial transactions take place daily. In 2019, average daily transactions in the Forex market were about 6.6 trillion U.S. dollars. The NPCI platform for retail payments and settlements in India recorded 4,428 million transactions in 2014-15, which reached 42,660 million transactions in 2019-2020. Extracting insights and patterns from this huge volume of transactions is possible only with the use of Data Science. Identifying new opportunities, cross-selling, detecting fraudulent transactions, and much more can be achieved with the application of Data Science.

    Research and analysis are closely related to the scientific methodology of Data Science. Research work revolves around building theories and hypotheses, conducting experiments, and collecting and analyzing data to accept or reject a hypothesis, which is the same cycle that Data Science follows.

    Health, education, governance, trade, and commerce are some examples where Data Science can be applied for extracting insights and making
