Statistics for Big Data For Dummies
About this ebook

The fast and easy way to make sense of statistics for big data

Does the subject of data analysis make you dizzy? You've come to the right place! Statistics For Big Data For Dummies breaks this often-overwhelming subject down into easily digestible parts, offering new and aspiring data analysts the foundation they need to be successful in the field. Inside, you'll find an easy-to-follow introduction to exploratory data analysis, the lowdown on collecting, cleaning, and organizing data, everything you need to know about interpreting data using common software and programming languages, plain-English explanations of how to make sense of data in the real world, and much more.

Data has never been easier to come by, and the tools students and professionals need to enter the world of big data are based on applied statistics. While the word "statistics" alone can evoke feelings of anxiety in even the most confident student or professional, it doesn't have to. Written in the familiar and friendly tone that has defined the For Dummies brand for more than twenty years, Statistics For Big Data For Dummies takes the intimidation out of the subject, offering clear explanations and tons of step-by-step instruction to help you make sense of data mining—without losing your cool.

  • Helps you to identify valid, useful, and understandable patterns in data
  • Provides guidance on extracting previously unknown information from large databases
  • Shows you how to discover patterns available in big data
  • Gives you access to the latest tools and techniques for working in big data

If you're a student enrolled in an applied statistics course or a professional looking to expand your skill set, Statistics For Big Data For Dummies gives you everything you need to succeed.

Language: English
Publisher: Wiley
Release date: August 11, 2015
ISBN: 9781118940020
Author: Alan Anderson


    Book preview

    Statistics for Big Data For Dummies - Alan Anderson

    Introduction

    Welcome to Statistics For Big Data For Dummies! Every day, what has come to be known as big data is making its influence felt in our lives. Some of the most useful innovations of the past 20 years have been made possible by the advent of massive data-gathering capabilities combined with rapidly improving computer technology.

    For example, we have become accustomed to finding almost any information we need through the Internet. You can locate nearly anything under the sun immediately by using a search engine such as Google or DuckDuckGo. Finding information this way has become so commonplace that Google has become a verb, as in I don’t know where to find that restaurant — I’ll just Google it. Just think how much more efficient our lives have become as a result of search engines. But how does Google work? Google couldn’t exist without the ability to process massive quantities of information at extremely high speed, which demands extremely efficient software.

    Another area that has changed our lives forever is e-commerce, of which the classic example is Amazon.com. People can buy virtually every product they use in their daily lives online (and have it delivered promptly, too). Often online prices are lower than in traditional brick-and-mortar stores, and the range of choices is wider. Online shopping also lets people find the best available items at the lowest possible prices.

    Another huge advantage to online shopping is the ability of the sellers to provide reviews of products and recommendations for future purchases. Reviews from other shoppers can give extremely important information that isn’t available from a simple product description provided by manufacturers. And recommendations for future purchases are a great way for consumers to find new products that they might not otherwise have known about. Recommendations are enabled by one application of big data — the use of highly sophisticated programs that analyze shopping data and identify items that tend to be purchased by the same consumers.

    Although online shopping is now second nature for many consumers, the reality is that e-commerce has only come into its own in the last 15–20 years, largely thanks to the rise of big data. A website such as Amazon.com must process quantities of information that would have been unthinkably gigantic just a few years ago, and that processing must be done quickly and efficiently. Thanks to rapidly improving technology, many traditional retailers now also offer the option of making purchases online; failure to do so would put a retailer at a huge competitive disadvantage.

    In addition to search engines and e-commerce, big data is making a major impact in a surprising number of other areas that affect our daily lives:

    Social media

    Online auction sites

    Insurance

    Healthcare

    Energy

    Political polling

    Weather forecasting

    Education

    Travel

    Finance

    About This Book

    This book is intended as an overview of the field of big data, with a focus on the statistical methods used. It also provides a look at several key applications of big data. Big data is a broad topic; it includes quantitative subjects such as math, statistics, computer science, and data science. Big data also covers many applications, such as weather forecasting, financial modeling, political polling methods, and so forth.

    Our intentions for this book specifically include the following:

    Provide an overview of the field of big data.

    Introduce many useful applications of big data.

    Show how data may be organized and checked for bad or missing information.

    Show how to handle outliers in a dataset.

    Explain how to identify assumptions that are made when analyzing data.

    Provide a detailed explanation of how data may be analyzed with graphical techniques.

    Cover several key univariate (involving only one variable) statistical techniques for analyzing data.

    Explain widely used multivariate (involving more than one variable) statistical techniques.

    Provide an overview of modeling techniques such as regression analysis.

    Explain the techniques that are commonly used to analyze time series data.

    Cover techniques used to forecast the future values of a dataset.

    Provide a brief overview of software packages and how they can be used to analyze statistical data.

    Because this is a For Dummies book, the chapters are written so you can pick and choose whichever topics interest you most and dive right in. There’s no need to read the chapters in sequential order, although you certainly could. We do suggest, though, that you make sure you’re comfortable with the ideas developed in Chapters 4 and 5 before proceeding to the later chapters in the book. Each chapter also contains several tips, reminders, and other tidbits, and in several cases there are links to websites you can use to further pursue the subject. There’s also an online Cheat Sheet that includes a summary of key equations for ease of reference.

    As mentioned, this is a big topic and a fairly new field. Space constraints make possible only an introduction to the statistical concepts that underlie big data. But we hope it is enough to get you started in the right direction.

    Foolish Assumptions

    We make some assumptions about you, the reader. Hopefully, one of the following descriptions fits you:

    You’ve heard about big data and would like to learn more about it.

    You’d like to use big data in an application but don’t have sufficient background in statistical modeling.

    You don’t know how to implement statistical models in a software package.

    Possibly all of these are true. This book should give you a good starting point for advancing your interest in this field. Clearly, you are already motivated.

    This book does not assume any particularly advanced knowledge of mathematics and statistics. The ideas are developed from fairly mundane mathematical operations. But it may, in many places, require you to take a deep breath and not get intimidated by the formulas.

    Icons Used in This Book

    Throughout the book, we include several icons designed to point out specific kinds of information. Keep an eye out for them:

    Tip: A Tip points out especially helpful or practical information about a topic. It may be hard-won advice on the best way to do something or a useful insight that may not have been obvious at first glance.

    Warning: A Warning is used when information must be treated carefully. These icons point out potential problems or trouble you may encounter. They also highlight mistaken assumptions that could lead to difficulties.

    Technical Stuff: Technical Stuff points out material that may be interesting if you’re really curious about something, but which is not essential. You can safely skip these paragraphs if you’re in a hurry or just looking for the basics.

    Remember: Remember indicates material that may have been encountered earlier in the book or that you will do well to stash somewhere in your memory for future benefit.

    Beyond the Book

    Besides the pages or pixels you’re presently perusing, this book comes with even more goodies online. You can check out the Cheat Sheet at www.dummies.com/cheatsheet/statisticsforbigdata.

    We’ve also written some additional material that wouldn’t quite fit in the book. If this book were a DVD, these would be on the Bonus Content disc. This handful of extra articles on various mini-topics related to big data is available at www.dummies.com/extras/statisticsforbigdata.

    Where to Go From Here

    You can approach this book from several different angles. You can, of course, start with Chapter 1 and read straight through to the end. But you may not have time for that, or maybe you are already familiar with some of the basics. We suggest checking out the table of contents to see a map of what’s covered in the book and then flipping to any particular chapter that catches your eye. Or if you’ve got a specific big data issue or topic you’re burning to know more about, try looking it up in the index.

    Once you’re done with the book, you can further your big data adventure (where else?) on the Internet. Instructional videos are available on websites such as YouTube. Online courses, many of them free, are also becoming available. Some are produced by private companies such as Coursera; others are offered by major universities such as Yale and M.I.T. Of course, many new books are being written in the field of big data due to its increasing importance.

    If you’re even more ambitious, you will find specialized courses at the college undergraduate and graduate levels in subject areas such as statistics, computer science, information technology, and so forth. In order to satisfy the expected future demand for big data specialists, several schools are now offering a concentration or a full degree in Data Science.

    The resources are there; you should be able to take yourself as far as you want to go in the field of big data. Good luck!

    Part I

    Introducing Big Data Statistics

    Visit www.dummies.com for great Dummies content online.

    In this part …

    Introducing big data and stuff it’s used for

    Exploring the three Vs of big data

    Checking out the hot big data applications

    Discovering probabilities and other basic statistical ideas

    Chapter 1

    What Is Big Data and What Do You Do with It?

    In This Chapter

    Understanding what big data is all about

    Seeing how data may be analyzed using Exploratory Data Analysis (EDA)

    Gaining insight into some of the key statistical techniques used to analyze big data

    Big data refers to sets of data that are far too massive to be handled with traditional hardware. Big data is also problematic for software such as database systems, statistical packages, and so forth. In recent years, data-gathering capabilities have experienced explosive growth, so that storing and analyzing the resulting data has become progressively more challenging.

    Many fields have been affected by the increasing availability of data, including finance, marketing, and e-commerce. Big data has also revolutionized more traditional fields such as law and medicine. Of course, big data is gathered on a massive scale by search engines such as Google and social media sites such as Facebook. These developments have led to the evolution of an entirely new profession: the data scientist, someone who can combine the fields of statistics, math, computer science, and engineering with knowledge of a specific application.

    This chapter introduces several key concepts that are discussed throughout the book. These include the characteristics of big data, applications of big data, key statistical tools for analyzing big data, and forecasting techniques.

    Characteristics of Big Data

    The three factors that distinguish big data from other types of data are volume, velocity, and variety.

    Clearly, with big data, the volume is massive. In fact, new terminology must be used to describe the size of these datasets. For example, one petabyte of data consists of 10¹⁵ bytes of data. That’s 1,000 trillion bytes!

    Tip: A byte is a single unit of storage in a computer’s memory, used to represent a single number, character, or symbol. A byte consists of eight bits, each of which is either a 0 or a 1.

    Velocity refers to the speed at which data is gathered. Big datasets consist of data that’s continuously gathered at very high speeds. For example, it has been estimated that Twitter users generate more than a quarter of a million tweets every minute. This requires a massive amount of storage space as well as real-time processing of the data.

    Variety refers to the fact that the contents of a big dataset may consist of a number of different formats, including spreadsheets, videos, music clips, email messages, and so on. Storing a huge quantity of these incompatible types is one of the major challenges of big data.

    Chapter 2 covers these characteristics in more detail.

    Exploratory Data Analysis (EDA)

    Before you apply statistical techniques to a dataset, it’s important to examine the data to understand its basic properties. You can use a series of techniques that are collectively known as Exploratory Data Analysis (EDA) to analyze a dataset. EDA helps ensure that you choose the correct statistical techniques to analyze and forecast the data. The two basic types of EDA techniques are graphical techniques and quantitative techniques.

    Graphical EDA techniques

    Graphical EDA techniques show the key properties of a dataset in a convenient format. It’s often easier to understand the properties of a variable and the relationships between variables by looking at graphs rather than at the raw data. You can use several graphical techniques, depending on the type of data being analyzed. Chapters 11 and 12 explain how to create and use the following (a short code sketch follows the list):

    Box plots

    Histograms

    Normal probability plots

    Scatter plots
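    To make this concrete, here is a minimal sketch (ours, not the book’s) of how these four plots might be produced in Python with matplotlib and SciPy. The simulated data and all variable names are illustrative assumptions; in practice you would load your own dataset:

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy import stats

        # Simulated sample data; replace with your own dataset in practice
        rng = np.random.default_rng(seed=1)
        x = rng.normal(loc=70, scale=3, size=500)         # e.g., heights in inches
        y = 2.0 * x + rng.normal(scale=4.0, size=500)     # a second, related variable

        fig, axes = plt.subplots(2, 2, figsize=(10, 8))
        axes[0, 0].boxplot(x)                             # box plot
        axes[0, 0].set_title("Box plot")
        axes[0, 1].hist(x, bins=30)                       # histogram
        axes[0, 1].set_title("Histogram")
        stats.probplot(x, dist="norm", plot=axes[1, 0])   # normal probability plot
        axes[1, 0].set_title("Normal probability plot")
        axes[1, 1].scatter(x, y, s=10)                    # scatter plot
        axes[1, 1].set_title("Scatter plot")
        plt.tight_layout()
        plt.show()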

    Quantitative EDA techniques

    Quantitative EDA techniques provide a more rigorous method of determining the key properties of a dataset. Two of the most important of these techniques are

    Interval estimation (discussed in Chapter 11).

    Hypothesis testing (introduced in Chapter 5).

    Interval estimates are used to create a range of values within which a variable is likely to fall. Hypothesis testing is used to test various propositions about a dataset, such as

    The mean value of the dataset.

    The standard deviation of the dataset.

    The probability distribution the dataset follows.

    Hypothesis testing is a core technique in statistics and is used throughout the chapters in Part III of this book.
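    As a minimal illustration of both quantitative techniques (an assumption-laden sketch, not the book’s own example), here is how a confidence interval and a one-sample test about the mean might look in Python with SciPy, assuming a roughly normal sample:

        import numpy as np
        from scipy import stats

        # Simulated sample; in practice this would be your dataset
        rng = np.random.default_rng(seed=2)
        sample = rng.normal(loc=100, scale=15, size=40)

        # Interval estimation: a 95% confidence interval for the population
        # mean, based on the Student's t-distribution (unknown variance)
        mean = sample.mean()
        sem = stats.sem(sample)  # standard error of the mean
        ci = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
        print("95% confidence interval for the mean:", ci)

        # Hypothesis testing: test the proposition that the population mean is 95
        t_stat, p_value = stats.ttest_1samp(sample, popmean=95)
        print("t statistic:", t_stat, "p-value:", p_value)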

    Statistical Analysis of Big Data

    Gathering and storing massive quantities of data is a major challenge, but ultimately the biggest and most important challenge of big data is putting it to good use.

    For example, a massive quantity of data can be helpful to a company’s marketing research department only if it can identify the key drivers of the demand for the company’s products. Political polling firms have access to massive amounts of demographic data about voters; this information must be analyzed intensively to find the key factors that can lead to a successful political campaign. A hedge fund can develop trading strategies from massive quantities of financial data by finding obscure patterns in the data that can be turned into profitable strategies.

    Many statistical techniques can be used to analyze data to find useful patterns:

    Probability distributions are introduced in Chapter 4 and explored at greater length in Chapter 13.

    Regression analysis is the main topic of Chapter 15.

    Time series analysis is the primary focus of Chapter 16.

    Forecasting techniques are discussed in Chapter 17.

    Probability distributions

    You use a probability distribution to compute the probabilities associated with the elements of a dataset. The following distributions are described and applied in this book (a short code sketch follows the list):

    Binomial distribution: You would use the binomial distribution to analyze variables that can assume only one of two values. For example, you could determine the probability that a given percentage of members at a sports club are left-handed. See Chapter 4 for details.

    Poisson distribution: You would use the Poisson distribution to describe the likelihood of a given number of events occurring over an interval of time. For example, it could be used to describe the probability of a specified number of hits on a website over the coming hour. See Chapter 13 for details.

    Normal distribution: The normal distribution is the most widely used probability distribution in most disciplines, including economics, finance, marketing, biology, psychology, and many others. One of the characteristic features of the normal distribution is symmetry — the probability of a variable being a given distance below the mean of the distribution equals the probability of it being the same distance above the mean. For example, if the mean height of all men in the United States is 70 inches, and heights are normally distributed, a randomly chosen man is as likely to be between 68 and 70 inches tall as he is to be between 70 and 72 inches tall. See Chapter 4 and the chapters in Parts III and IV for details.

    The normal distribution works well with many applications. For example, it’s often used in the field of finance to describe the returns to financial assets. Due to its ease of interpretation and implementation, the normal distribution is sometimes used even when the assumption of normality is only approximately correct.

    The Student’s t-distribution: The Student’s t-distribution is similar to the normal distribution, but with the Student’s t-distribution, extremely small or extremely large values are much more likely to occur. This distribution is often used in situations where a variable exhibits too much variation to be consistent with the normal distribution. This is true when the properties of small samples are being analyzed. With small samples, the variation among samples is likely to be quite considerable, so the normal distribution shouldn’t be used to describe their properties. See Chapter 13 for details.

    Note: The Student’s t-distribution was developed by W.S. Gosset while employed at the Guinness brewing company. He was attempting to describe the properties of small sample means.

    The chi-square distribution: The chi-square distribution is appropriate for several types of applications. For example, you can use it to determine whether a population follows a particular probability distribution. You can also use it to test whether the variance of a population equals a specified value, and to test for the independence of two datasets. See Chapter 13 for details.

    The F-distribution: The F-distribution is derived from the chi-square distribution. You use it to test whether the variances of two populations equal each other. The F-distribution is also useful in applications such as regression analysis (covered next). See Chapter 14 for details.
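    As a quick illustration (a sketch under made-up assumptions, not the book’s data), here is how the first three of these distributions might be used in Python with SciPy to compute the kinds of probabilities described above. The rates and probabilities in the comments are hypothetical:

        from scipy import stats

        # Binomial: probability that exactly 12 of 100 club members are
        # left-handed, assuming each is left-handed with probability 0.10
        print(stats.binom.pmf(k=12, n=100, p=0.10))

        # Poisson: probability of exactly 30 hits on a website in the coming
        # hour, assuming hits arrive at an average rate of 25 per hour
        print(stats.poisson.pmf(k=30, mu=25))

        # Normal: probability that a man is between 68 and 72 inches tall,
        # with mean 70 inches and an assumed standard deviation of 3 inches
        print(stats.norm.cdf(72, loc=70, scale=3)
              - stats.norm.cdf(68, loc=70, scale=3))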

    Regression analysis

    Regression analysis is used to estimate the strength and direction of the relationship between variables that are linearly related to each other. Chapter 15 discusses this topic at length.

    Tip: Two variables X and Y are said to be linearly related if the relationship between them can be written in the form

    Y = mX + b

    where

    m is the slope, or the change in Y due to a given change in X

    b is the intercept, or the value of Y when X = 0

    As an example of regression analysis, suppose a corporation wants to determine whether its advertising expenditures are actually increasing profits, and if so, by how much. The corporation gathers data on advertising and profits for the past 20 years and uses this data to estimate the following equation:

    Y = 0.25X + 50

    where

    Y represents the annual profits of the corporation (in millions of dollars).

    X represents the annual advertising expenditures of the corporation (in millions of dollars).

    In this equation, the slope equals 0.25, and the intercept equals 50. Because the slope of the regression line is 0.25, this indicates that on average, for every $1 million increase in advertising expenditures, profits rise by $.25 million, or $250,000. Because the intercept is 50, this indicates that with no advertising, profits would still be $50 million.

    This equation, therefore, can be used to forecast future profits based on planned advertising expenditures. For example, if the corporation plans on spending $10 million on advertising next year, its expected profits will be as follows:

    Y = 0.25(10) + 50 = 2.5 + 50 = 52.5

    Hence, with an advertising budget of $10 million next year, profits are expected to be $52.5 million.
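    Here is a sketch of how such an equation could be estimated in Python with NumPy’s least-squares fit. The advertising and profit figures are made up for illustration, since the book’s actual dataset isn’t shown in this preview:

        import numpy as np

        # Hypothetical advertising expenditures (X) and profits (Y), both in
        # millions of dollars, for 20 years
        rng = np.random.default_rng(seed=3)
        X = rng.uniform(5, 15, size=20)
        Y = 0.25 * X + 50 + rng.normal(scale=0.5, size=20)  # noisy linear relation

        # Estimate the slope m and intercept b of Y = mX + b by least squares
        m, b = np.polyfit(X, Y, deg=1)
        print("estimated slope:", m, "estimated intercept:", b)

        # Forecast profits for a planned $10 million advertising budget
        print("forecast profits ($ millions):", m * 10 + b)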

    Time series analysis

    A time series is a set of observations of a single variable collected over time. This topic is discussed at length in Chapter 16. The following are examples of time series:

    The daily price of Apple stock over the past ten years.

    The value of the Dow Jones Industrial Average at the end of each year for the past 20 years.

    The daily price of gold over the past six months.

    With time series analysis, you can use the statistical properties of a time series to predict the future values of a variable. There are many types of models that may be developed to explain and predict the behavior of a time series.

    One place where time series analysis is used frequently is on Wall Street. Some analysts attempt to forecast the future value of an asset price, such as a stock, based entirely on the history of that stock’s price. This is known as technical analysis. Technical analysts do not attempt to use other variables to forecast a stock’s price — the only information they use is the stock’s own history.

    Tip: Technical analysis can work only if there are inefficiencies in the market. Otherwise, all information about a stock’s history should already be reflected in its price, making technical trading strategies unprofitable.
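    One of the simplest tools in a technical analyst’s kit is a moving average of past prices. The sketch below, using pandas and made-up prices, is purely illustrative (and certainly not a recommended trading strategy):

        import pandas as pd

        # Hypothetical daily closing prices for a stock
        prices = pd.Series([101.2, 102.5, 101.8, 103.1, 104.0,
                            103.6, 105.2, 104.8, 106.1, 107.0])

        # Five-day simple moving average: the mean of the five most recent closes
        sma = prices.rolling(window=5).mean()

        # A naive technical signal: a price above its own moving average is
        # read as an uptrend
        print("uptrend signal:", prices.iloc[-1] > sma.iloc[-1])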

    Forecasting techniques

    Many different techniques have been designed to forecast the future value of a variable. Two of these are time series regression models (Chapter 16) and simulation models (Chapter 17).

    Time series regression models

    A time series regression model is used to estimate the trend followed by a variable over time, using regression techniques. A trend line shows the direction in which a variable is moving as time elapses.

    As an example, Figure 1-1 shows a time series that represents the annual output of a gold mine (measured in thousands of ounces per year) since the mine opened ten years ago.

    Figure 1-1: A time series showing gold output per year for the past ten years. (© John Wiley & Sons, Inc.)

    The equation of the trend line is estimated to be

    Y = 0.9212X + 1.3333

    where

    X is the year.

    Y is the annual production of gold (measured in thousands of ounces).

    This trend line is estimated using regression analysis. The trend line shows that on average, the output of the mine grows by 0.9212 thousand (921.2 ounces) each year.

    You could use this trend line to predict the output next year (the 11th year of operation) by substituting 11 for X, as follows:

    Y = 0.9212(11) + 1.3333 = 10.1332 + 1.3333 = 11.4665

    Based on the trend line equation, the mine would be expected to produce 11,466.5 ounces of gold next year.
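    The same least-squares fit shown earlier reproduces this calculation. In the sketch below, the ten-year output series is simulated to be consistent with the estimated coefficients, since the mine’s actual data isn’t shown in this preview:

        import numpy as np

        # Years 1 through 10 and simulated gold output (thousands of ounces)
        years = np.arange(1, 11)
        rng = np.random.default_rng(seed=4)
        output = 0.9212 * years + 1.3333 + rng.normal(scale=0.3, size=10)

        # Fit the trend line Y = mX + b and predict output in year 11
        m, b = np.polyfit(years, output, deg=1)
        print("predicted year-11 output (thousands of ounces):", m * 11 + b)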

    Simulation models

    You can use simulation models to forecast a time series. Simulation models are extremely flexible but can be extremely time-consuming to implement. Their accuracy also depends on assumptions being made about the time series data’s statistical properties.

    Two standard approaches to forecasting financial time series with simulation models are historical simulation and Monte Carlo simulation.

    Historical simulation

    Historical simulation is a technique used to generate a probability distribution for a variable as it evolves over time, based on its past values. If the properties of the variable being simulated remain stable over time, this technique can be highly accurate. One drawback to this approach is that in order to get an accurate prediction, you need to have a lot of data. It also depends on the assumption that a variable’s past behavior will continue into the future.

    As an example, Figure 1-2 shows a histogram that represents the returns to a stock over the past 100 days.

    Figure 1-2: A histogram of stock returns. (© John Wiley & Sons, Inc.)

    This histogram shows the probability distribution of returns on the stock based on the past 100 trading days. The graph shows that the most frequent return over the past 100 days was a loss of 2 percent, the second most frequent was a loss of 3 percent, and so on. You can use the information contained within this graph to create a probability distribution for the most likely return on this stock over the coming trading day.
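    In code, historical simulation boils down to resampling past observations with equal weight. Here is a minimal NumPy sketch; the 100 days of returns are made up for illustration:

        import numpy as np

        # Hypothetical daily returns (in percent) for the past 100 trading days
        rng = np.random.default_rng(seed=5)
        historical_returns = rng.normal(loc=-1.0, scale=2.5, size=100)

        # Historical simulation: draw tomorrow's return from the empirical
        # distribution of past returns (each past day is equally likely)
        simulated = rng.choice(historical_returns, size=10_000, replace=True)

        print("mean simulated return:", simulated.mean())
        print("5th percentile (a crude one-day value at risk):",
              np.percentile(simulated, 5))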

    Monte Carlo simulation

    Monte Carlo simulation is a technique in which random numbers are substituted into a statistical model in order to forecast the future values of a variable. This methodology is used in many different disciplines, including finance, economics, and the hard sciences, such as physics. Monte Carlo simulation can work very well but can also be extremely time-consuming to implement. Also, its accuracy depends on the statistical model being used to describe the behavior of the time series.
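    Here is a minimal Monte Carlo sketch in Python, under the assumed statistical model that daily returns are normally distributed; the fitted parameter values are hypothetical:

        import numpy as np

        rng = np.random.default_rng(seed=6)

        # Assumed model: daily returns are normal with these fitted parameters
        mu, sigma = 0.0005, 0.02       # hypothetical mean and std of daily returns
        n_days, n_paths = 252, 10_000  # one trading year, many random paths

        # Substitute random numbers into the model to simulate return paths
        returns = rng.normal(mu, sigma, size=(n_paths, n_days))

        # Compound each path into a terminal price, starting from 100
        terminal_prices = 100 * np.prod(1 + returns, axis=1)

        print("expected terminal price:", terminal_prices.mean())
        print("5th percentile of terminal prices:",
              np.percentile(terminal_prices, 5))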

    As you can see, we’ve got a lot to cover in this book. But don’t worry, we take it step by step. In Part I, we look at what big data is. We also build a statistical toolkit that we carry with us throughout the rest of the book. Part II focuses on the (extremely important) process of preparing data for the application of the techniques just described. Then we get to the good stuff in Parts III and IV. Though the equations can appear a little intimidating at times, we have labored to include examples in every chapter that make the ideas a little more accessible. So, take a deep breath and get ready to begin your exploration of big data!

    Chapter 2

    Characteristics of Big Data: The Three Vs

    In This Chapter

    Understanding the characteristics of big data and how it can be classified

    Checking out the features of the latest methods for storing and analyzing big data

    The phrase big data refers to datasets (collections of data) that are too massive for traditional database management systems (DBMS) to handle properly. The rise of big data has occurred for several reasons, such as the massive increase in e-commerce, the explosion of social media usage, the advent of video and music websites, and so forth.

    Big data requires more sophisticated approaches than those used in the past to handle surges of information. This chapter explores the characteristics of big data and introduces the newer approaches that have been developed to handle it.

    Characteristics of Big Data

    The three main characteristics that define big data are generally considered to be volume, velocity, and variety: the three Vs. Volume is easy to understand; there’s a lot of data. Velocity means that data arrives faster than ever and must be stored faster than ever. Variety refers to the wide range of data structures that may need to be stored, and this mixture of incompatible formats poses a challenge that traditional DBMS cannot easily manage.

    Volume

    Volume refers, as you might expect, to the quantity of data being generated. A proliferation of new sources generates massive amounts of data.
