Navigating Big Data Analytics: Strategies for the Quality Systems Analyst
About this ebook
In this book, William Mawby examines the claims of big data analysis in detail. Using examples to illustrate problems that may lead to inefficient and inaccurate results, Mawby helps practitioners avoid common pitfalls and offers methods for incorporating big data analytics into a company in ways that enhance its analytic efforts.
William D. Mawby, Ph.D. has extensive consulting, teaching, and project experience and has taught more than 200 courses on many subjects in statistics and mathematics. He is currently writing, teaching courses on climate change and big data, and volunteering at the American Association for the Advancement of Science and the Union of Concerned Scientists.
Navigating Big Data Analytics - William D. Mawby
1
An Introduction to Big Data Analytics
Big data analytics is defined as the use of algorithms on large data sets to drive decisions that are of value to a company or organization.² Often the power of a big data analytics approach is emphasized by describing it as having three V words: volume, velocity, and variety.
• Volume refers to the sheer number of data points that are captured and stored. The size of the data sets that are collected can run into terabytes of information—or even larger in some cases.
• Velocity implies that the data are collected more frequently than they have been in the past.
• Variety implies that more kinds of data can be collected and used, including textual and graphical information.
We only need to look at videos that are uploaded to social media to understand the allure of using non-numeric data. The potential of using this kind of data has a rich appeal. Once these vast repositories of data are built, then the promise is that we can mine them, automatically, to detect patterns that can drive decisions to lend value to a company’s activities. The applications of big data analytics run the gamut from customer management through product development through supply chain management.
Consider, for example, the kinds of problems to which big data approaches have been applied to advantage.³
• The Bank of England is reported to have instituted a big data approach toward the integration of various macroeconomics and microeconomics data sets to which it has access.
• General Electric has invested a lot of effort into creating systems that are efficient at analyzing sensor data so they can integrate production control.
• Xiaomi, a Chinese smartphone manufacturer, has reportedly used big data to determine the right marketing strategies for its business.
Indeed, organizations that have access to substantial data are trying, in some fashion, to leverage this information to their advantage through big data approaches.
It is also possible to gain an understanding of the scope and size of these big data sets by looking at some examples online. Readers can access some typical public data sets that have proved to be useful in this arena.⁴ Of course, most business data sets are proprietary and confidential and only accessible to those who are employed by the companies that own them. In this book, we will depend primarily on artificially constructed data sets so that we can focus on the essential problems with big data analytics without becoming mired in the details of particular applications.
For example, the Modified National Institute of Standards and Technology (MNIST) database contains more than 60,000 examples of handwritten digits that can be used in an analysis. Internet Movie Database (IMDb) reviews can provide around 50,000 text-based movie reviews. These examples clearly show how dramatic the variety and volume of big data sets can be. The same features that give big data analysis some of its most distinctive applications can also make it impossible to show all the issues that are involved with such efforts.
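An artificially constructed data set of the kind mentioned above can be very small in code. The following is only a sketch and is not taken from the book: the function names, the linear relationship, the noise level, and the seed are all assumptions chosen for illustration. Because the true slope and intercept are known in advance, any analysis method can be checked against them.

```python
import random

def make_synthetic_data(n_rows, slope=2.0, intercept=1.0, noise_sd=0.5, seed=42):
    """Build (x, y) pairs from a known line plus Gaussian noise (all values hypothetical)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(0.0, 10.0)
        y = intercept + slope * x + rng.gauss(0.0, noise_sd)
        rows.append((x, y))
    return rows

def fit_line(rows):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

data = make_synthetic_data(10_000)
est_slope, est_intercept = fit_line(data)
print(f"estimated slope={est_slope:.3f}, intercept={est_intercept:.3f}")
```

With a constructed data set, the estimates can be compared directly with the values used to generate the data, which is not possible with proprietary business data whose true structure is unknown.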
Many purveyors of big data analysis go even further in their claims by arguing that traditional statistical analyses are likely to be inadequate when applied to very large data sets. They argue that those inadequacies necessitate the development of new data analysis approaches.⁵ Most of these new analytic approaches are computationally intensive and extremely flexible in the ways you can use them to interrogate the data. The application of these new methodologies to uniquely large data sets is often accomplished through the activities of a data scientist, whose skill set seems to be a combination of statistics and computer science. Demand for data scientists has grown over the last few decades, and the role has become one of the most highly sought-after positions. All this evidence seems to support the conclusion that big data is becoming essential to the operations of any modern company. It is easy to believe that solutions will appear, as if by magic, once the genie of big data is unleashed.
Deep Learning
At the leading edge of this push to leverage big data is the development of the new field of deep learning.⁶ Deep learning is a direct attempt to replace human cognition with a computer.⁷ It usually relies on a multilayered neural network to mimic the human brain’s complex structure of synaptic connections. Although deep learning seems to be making some progress, it is nowhere near its ultimate objective of achieving strong artificial intelligence that will replace humans. The dream of artificial intelligence seems to be a world in which the human analytics practitioners have nothing to do but slowly sip their lattes while the algorithm solves all of their problems.
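To make the phrase "multilayered neural network" concrete, here is a minimal sketch of a forward pass through a few fully connected layers. It is an illustration only, not the book's method: the layer sizes, the random weights, the sigmoid activation, and the helper names are assumptions, and no training is performed.

```python
import math
import random

def make_layer(n_inputs, n_outputs, rng):
    """Create one fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

def forward(layers, inputs):
    """Feed the inputs through each layer in turn, applying a sigmoid at every unit."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activations)) + b)))
            for row, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)
# Hypothetical architecture: 4 inputs -> 3 hidden units -> 2 hidden units -> 1 output
sizes = [4, 3, 2, 1]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward(layers, [0.2, 0.4, 0.6, 0.8])
print(output)
```

Each layer's outputs become the next layer's inputs, which is the "multilayered" structure the text refers to. Practical deep learning systems add a training procedure and many more units per layer.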
This book aims to address the legitimacy of the claim that big data supporters make: large data sets will be sufficient to accomplish a company’s objectives. We will take a deep dive into the issues that are involved with these approaches and attempt to delineate some apparent boundaries of the big data approach. By providing detailed examples of challenges that commonly occur in real applications of data analysis, we will belie the conclusion that simply having large data sets will ever be sufficient to replace the human analyst.
When to Use This Technology
Interest in big data has certainly not gone unnoticed by the analysts who are employed in business and industry for the twin purposes of quality and productivity. There is little doubt that most companies are trying hard to find ways to milk this promising new source of information. Anything that can be used to help in solving process problems and improving performance is always of vital interest to these sorts of professionals. Many times, however, it is not clear how to use these new techniques to gain the most value. This is not an idle concern. Because the speed of modern industry continues to challenge most departments, it is no wonder that many quality practitioners are tempted to think big data analysis is the answer to their prayers. It seems too good to be true that you could get so much out of so little effort. But is this a justified belief? Perhaps things are being over-marketed to some extent, and the best course is to practice caution in adopting these new approaches.
It should be made clear from the outset that this book is not trying to dispute that the use of digital computers has transformed our world in all sorts of ways. This assertion is supported by the many valuable computer algorithms that are being employed today for the purposes of selling tickets, managing sports teams, helping people find the perfect mate, and many other activities. Except for the occasional Luddite who feels that the world is spinning out of control, most people would agree that computerization makes things better. It would be the rare analytics practitioner who would be willing to give up his or her computer. Most people are after the newest and fastest computer available, but does this practical advantage also provide evidence that is strong enough to lend credence to the extravagant claims of big data? Or could there be some instances or situations in which the naïve big data approach would not only fail to replace human-expert-driven analysis, but could actually lead to subpar performance? This is an important and timely question for practitioners as they seek to forge a pathway into the future. Making the wrong decision can affect a person’s analytic potential for a long time. Quality experts want to get ahead, not fall behind, in their never-ending quest for continuous improvement. The task we have in this text is to demonstrate that it is, indeed, the case that something more than just data must be used to get satisfactory results in many instances.
We are also not trying to argue that more and better data cannot be useful. Collecting more data and using them in a more automated fashion are lynchpins in the new Industry 4.0 and Quality 4.0 initiatives promoted by the American Society for Quality (ASQ) and others.⁸ There is a clear benefit to be gained if we can collect pertinent data, collate them, and use them well without using up too many valuable resources. This book verifies the potential value of this approach, and, in addition, shows that understanding these data sources can be critical to obtaining their full value for the quality practitioner. Just as we need to perform due diligence while assessing and maintaining the quality of the data that are used for analysis, we also need to understand the more intimate features of the data that are caused by the details of collection and manipulation. There are many challenges that can arise when data sets become larger that must be countered to make real progress. It is the objective of this book to warn quality managers and practitioners against the naïve view that more data, by themselves, are sufficient for success. It should probably come as no surprise to veterans in this field that it is critical for human expertise to be integrated into the analysis process to be successful, even in the largest big data endeavors.
Defining the Problem
The fundamental question is whether big data, by itself, can lead to analyses that are equal, or even superior, to those made by a human analyst. Humans were able to solve complicated problems long before computers existed, so computers are not absolutely essential to problem-solving. As one example, the invention of the general-purpose digital computer itself did not require the assistance of computers. On the other hand, computers can speed up the analysis process. Even common household budgeting tasks would take orders of magnitude more time if they were done without the aid of computers. One could certainly argue that some tasks, simply because of their complexity, would not even be attempted if computers were not available to assist humans. However, it is not the practical advantages of computers that are of interest here, but rather the issue of whether big data is intrinsically equivalent to good human-based analysis. There could be some kind of technical threshold that, once passed, will enable big data alone to match the best efforts of human analysis.⁹
If and when the computer is able to produce results that equal those coming from human minds, we can also examine the interesting question of whether computers can go even further to outstrip us completely. But that is not a question that is considered in this text. Rather, we will stick with the (apparently) simpler question as to whether big data approaches can even match the results of the typical human analyst. We will seek to show that overreliance on big data can actually lead to poorer conclusions than those that can be reached by a typical human analyst. We will seek to demonstrate that there are serious limitations to what can be achieved through the big data approach, and there is good reason to believe there will be a vital role for the human analyst well into the foreseeable future.
A Note About Technology
The issues presented in this book can be contentious. Perhaps, as is the case with many other prickly areas of human discourse, the major problems may be resolved with a clear definition of the terms of the argument.