From Big Data to Intelligent Data: An Applied Perspective
Ebook, 187 pages, 1 hour



About this ebook

This book addresses many of the gaps in how industry and academia are currently tackling problems associated with big data. It introduces novel concepts, describes the end-to-end process, and connects the various pieces of the puzzle to offer a holistic view. In addition, it explains important concepts for a wide audience, using accessible language, diagrams, examples and analogies to do so. The book is intended for readers working in industry who want to expand their knowledge or pursue a related degree, and employs an industry-centered perspective.


Language: English
Publisher: Springer
Release date: Jun 26, 2021
ISBN: 9783030769901

    Book preview

    From Big Data to Intelligent Data - Fady A. Harfoush

    © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

    F. A. Harfoush, From Big Data to Intelligent Data, Management for Professionals, https://doi.org/10.1007/978-3-030-76990-1_1

    1. Introduction

    Fady A. Harfoush¹  

    (1)

    CME Business Analytics Lab, Loyola University Chicago, Chicago, IL, USA

    Keywords

    Business intelligence, Business analytics, Big data, Paradigm shift, Intelligent information, Singularity

    1.1 The Business Value Proposition

    Twitter generates an average of 500 million tweets per day, and the number keeps growing. Access to Twitter’s full daily stream of tweets (the firehose) is costly and limited to those who really need it and can afford it. It is estimated that access to the firehose costs a few hundred thousand dollars per year. Is this a reasonable price to pay? What do we get in return? Is this a good business value proposition for Twitter? Do the benefits or business rewards outweigh the costs? What value does a business gain from access to the full firehose if it lacks the infrastructure and the know-how to capture and analyze the tweets, extract the valuable insights, and do it in almost real time? This is not a small undertaking. Before jumping on the big data bandwagon, every business should address these questions first, from both the technological and the business perspectives.

    It is safe to assume that only 1% of all daily tweets contain valuable or intelligent information that can be translated into actionable insights.

    What is the added business value of accessing more than 500 million tweets per day, and what is a good price? If 99% of big data is dirty data, which remaining 1% is good data?

    Finding the 1% is like searching for a needle in a haystack. How is a business supposed to assess the price of a subscription to the tweets, and the return on the investment, if it cannot easily distinguish a valuable or true tweet from a junk or fake tweet? Should a consumer be charged for the 1% only, and which 1%? Who decides the right percentage? We should be charged for the quality of service we receive. This is not the case with data. In almost all cases the data provider or vendor will have a disclaimer of the sort "XYZ assumes no responsibility for errors or omissions. The user assumes the entire risk associated with its use of these data." A good comparison, albeit in a different context, is buying fruit from a grocery store. The price is set by the weight, not by how much juice can be extracted from the fruit. In principle the price should be set by the juice content. This may sound far-fetched, but thanks to evolving sensing technologies, it is possible that in the foreseeable future we could be charged by the juice content and not by the weight.
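The pricing question above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 500 million tweets per day and the 1% useful fraction come from the text, while the annual price of 300,000 dollars is a hypothetical figure chosen to stand in for "a few hundred thousand dollars", and the function name is an assumption made for this example.

```python
def cost_per_useful_tweet(annual_cost_usd, tweets_per_day, useful_fraction, days=365):
    """Return the effective price paid per useful tweet over one year."""
    if not 0 < useful_fraction <= 1:
        raise ValueError("useful_fraction must be in (0, 1]")
    useful_tweets = tweets_per_day * days * useful_fraction
    return annual_cost_usd / useful_tweets


# Hypothetical price: the text only says "a few hundred thousand dollars".
ASSUMED_ANNUAL_COST_USD = 300_000
TWEETS_PER_DAY = 500_000_000   # figure quoted in the text
USEFUL_FRACTION = 0.01         # the 1% assumed in the text

# Price per tweet if every tweet were valuable.
price_if_all_useful = cost_per_useful_tweet(
    ASSUMED_ANNUAL_COST_USD, TWEETS_PER_DAY, 1.0
)

# Price per tweet when only 1% of the firehose carries value.
price_if_one_percent = cost_per_useful_tweet(
    ASSUMED_ANNUAL_COST_USD, TWEETS_PER_DAY, USEFUL_FRACTION
)

print(f"per tweet, all useful: ${price_if_all_useful:.8f}")
print(f"per tweet, 1% useful:  ${price_if_one_percent:.8f}")
```

Whatever the actual subscription price, the effective cost per useful tweet is 100 times the nominal cost per tweet when only 1% of the stream is valuable, and that is before counting the cost of finding that 1%.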

    Clearly, a business model based on providing access to the tweets’ firehose is not a good business value proposition for everyone. The real value proposition resides in the analytics and actionable items that can be extracted from the tweets. For these reasons, Twitter has partnered with a few companies in the industry, among them SMA (www.socialmarketanalytics.com), which I co-founded, to help businesses in different sectors access the analytics derived from the tweets.

    1.2 The Enhanced Value

    A good introduction to the topic is a scene from the movie "The Circle" (2017). In the scene, one of the lead characters, played by Tom Hanks, presents the company’s new product (a cheap, tiny camera with real-time broadcasting capabilities) to the employees and the new recruits. During the presentation, he makes two statements that are critical to creating the enhanced value.

    The first quote is about linking the information from different sources to create a unified view from which an enhanced level of intelligent information is obtained. As we will see in later chapters, this ties in well with the topic of IoT (Internet of Things).

    "Knowing is good, but knowing everything is better"

    The second quote emphasizes the need to process the data and run the analytics in real time.

    "Real Time Analytics Process"

    Both quotes have major implications we will discuss throughout this book. They represent the biggest challenges and the most compelling competitive edge: the ability to link the data from the different sources to create a unified view, and to run the analytics in almost real time.

    The world has evolved from having limited and controlled information to having unlimited and open information. Interestingly, both outcomes are equivalent considering that most of big data (99%) is dirty data. In many applications (e.g., engineering) the signal-to-noise ratio (SNR) is a good metric to measure performance and assess quality. Simply put, we want to enhance the signal (the numerator) and reduce the noise (the denominator) to achieve a high SNR. The SNR can be viewed as a representative measure of good-to-bad data, with the signal representing the good data and the noise the bad data. The two extreme cases, a very small (close to zero) amount of good data and a very large (close to infinity) amount of bad data, both lead to an SNR close to zero, with little or no real business value.

    Without the proper tools and the know-how to separate the good data from the bad data, having very limited (close to zero) data or unlimited (close to infinity) data both provide little or no intelligent information.
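The good-to-bad data ratio described above can be illustrated with a few lines of arithmetic. This is a minimal sketch, not a formal definition: the function name and all item counts are hypothetical, chosen only to show that scarce good data and a flood of bad data both drive the ratio toward zero, while filtering raises it.

```python
def data_snr(good_items, bad_items):
    """Ratio of good data (signal) to bad data (noise)."""
    if good_items < 0 or bad_items <= 0:
        raise ValueError("good_items must be >= 0 and bad_items must be > 0")
    return good_items / bad_items


# Limited, controlled information: almost no good data to work with.
scarce = data_snr(good_items=1, bad_items=1_000)

# Unlimited, open information: 1% good data buried in 99% dirty data.
flood = data_snr(good_items=5_000_000, bad_items=495_000_000)

# Hypothetical filter that keeps most of the good data and discards
# almost all of the bad data.
filtered = data_snr(good_items=4_000_000, bad_items=1_000_000)

print(f"scarce: {scarce:.4f}  flood: {flood:.4f}  filtered: {filtered:.4f}")
```

The first two cases both give a ratio close to zero even though one holds a handful of records and the other holds hundreds of millions. Only the filtered case, which stands in for the tools and know-how discussed in the text, yields a ratio above one.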

    What industry needs are the tools and the know-how to mine the unlimited information, connect the dots, and do it almost instantaneously. Those able to do so will gain the competitive edge, maintain industry superiority, and eventually create a monopoly. A familiar example is Google. A less publicly known company called Palantir (www.palantir.com) has for years specialized in creating a unified view that leverages data from different sources. It is used mainly by government agencies. The company went public in September 2020.

    Throughout this book we will address the fundamental question described in Fig. 1.1: how to turn big data into intelligent data, to extract the actionable insights, and to do it fast. Information here is used in general terms to convey data, to share an opinion, describe an event or an observation. It is not a statement of intelligence. We will later explain the distinction.

    Fig. 1.1 The business value proposition

    Information does not necessarily imply intelligent information!

    Questions raised earlier about Twitter’s business value proposition apply to other social media channels such as Facebook, Instagram, and LinkedIn. What is their business value proposition? Is it the platform or is it the data? The data represent a major part of these businesses and their valuations.

    Data is a dirty business, and content is king. But how to monetize data? What data and what content are we competing and paying for?

    1.3 The Age of Big Data

    Welcome to the age of big data. Big data has been described as the new oil, the new gold rush, and at times compared to the advent of the Internet revolution. While these analogies are correct to some extent, there is a level of exaggeration, coupled with a lack of historical context and a hype associated with the rush to capture the market opportunities. To begin, we need to set the record straight on what big data is and is not, and put matters in the right perspective.

    Many private and national research labs have for years been working with what we now call big data. Drawing from my own experience, examples can be cited from research in high energy physics at places such as the Fermi National Accelerator Laboratory (FNAL) in Batavia, Illinois, and the European Organization for Nuclear Research (CERN), located on the border between Switzerland and France. Experiments conducted at these particle accelerator labs aim to search for new particles and to confirm (or reject) established theories (e.g., the Standard Model of particle physics). The amount of data collected from these experiments easily ranges in the hundreds of petabytes per year (one petabyte is about 2⁵⁰ bytes). It can sometimes take more than a year to analyze the data using high performance computing servers to detect any traces of a new particle. It is like looking for a needle in a haystack. Many other national labs in the USA (Sandia, Lawrence Livermore, Argonne, Jet Propulsion Lab, etc.) and globally (CERN) have also for years been working with big data. Similar experiments, with large-scale data collection and analysis, are conducted in astronomy in the study of the cosmos and in the search for signs of extraterrestrial life. In private industry, Walmart, for example, has for years been collecting and analyzing big data on transactional purchases. The financial industry has been working with big data by analyzing years of historical stock tick data, sometimes collected at microsecond intervals. One can quickly appreciate the large amount of data collected. In summary, it is safe to say that big data is not a new phenomenon and has been around for many years.

    So, what is new? As with many knowledge and technology transfers between research labs and industry, what makes big data new is its democratization and its commercialization. Its wide adoption has been facilitated by rapid advances in technology that make it cheaper and easier to generate, collect, and analyze data.

    Big data is not a new phenomenon. What is new is the democratization, the commercialization, and the wide adoption of big data.

    As depicted in Fig. 1.2 data is now generated by many sources like social networks, mobile devices, and smart sensor technologies used in IoT (Internet of Things). Quite often the data collected is made available freely for everyone to view and to analyze. The real value resides not in the data itself, but in the intelligent information extracted from the
