Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining

Ebook, 405 pages, 3 hours

Author: Glenn J. Myatt
ISBN: 9781118422106

By Glenn J. Myatt and Wayne P. Johnson

About this ebook

Praise for the First Edition

“...a well-written book on data analysis and data mining that provides an excellent foundation...”

—CHOICE

“This is a must-read book for learning practical statistics and data analysis...”

—Computing Reviews.com

A proven go-to guide for data analysis, Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, Second Edition focuses on basic data analysis approaches that are necessary to make timely and accurate decisions in a diverse range of projects. Based on the authors’ practical experience in implementing data analysis and data mining, the new edition provides clear explanations that guide readers from almost every field of study.

In order to facilitate the needed steps when handling a data analysis or data mining project, a step-by-step approach aids professionals in carefully analyzing data and implementing results, leading to the development of smarter business decisions. The tools to summarize and interpret data in order to master data analysis are integrated throughout, and the Second Edition also features:

Updated exercises for both manual and computer-aided implementation with accompanying worked examples
New appendices with coverage on the freely available Traceis™ software, including tutorials using data from a variety of disciplines such as the social sciences, engineering, and finance
New topical coverage on multiple linear regression and logistic regression to provide a range of widely used and transparent approaches
Additional real-world examples of data preparation to establish a practical background for making decisions from data

Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining, Second Edition is an excellent reference for researchers and professionals who need to achieve effective decision making from data. The Second Edition is also an ideal textbook for undergraduate and graduate-level courses in data analysis and data mining and is appropriate for cross-disciplinary courses found within computer science and engineering departments.

Category: Mathematics
Language: English
Publisher: Wiley
Release date: Jul 2, 2014

Related categories

Skip carousel

Reviews for Making Sense of Data I

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Making Sense of Data I - Glenn J. Myatt

PREFACE

An unprecedented amount of data is being generated at increasingly rapid rates in many disciplines. Every day retail companies collect data on sales transactions, organizations log mouse clicks made on their websites, and biologists generate millions of pieces of information related to genes. It is practically impossible to make sense of data sets containing more than a handful of data points without the help of computer programs. Many free and commercial software programs exist to sift through data, such as spreadsheet applications, data visualization software, statistical packages and scripting languages, and data mining tools. Deciding what software to use is just one of the many questions that must be considered in exploratory data analysis or data mining projects. Translating the raw data collected in various ways into actionable information requires an understanding of exploratory data analysis and data mining methods and often an appreciation of the subject matter, business processes, software deployment, project management methods, change management issues, and so on.

The purpose of this book is to describe a practical approach for making sense out of data. A step-by-step process is introduced, which is designed to walk you through the steps and issues that you will face in data analysis or data mining projects. It covers the more common tasks relating to the analysis of data including (1) how to prepare data prior to analysis, (2) how to generate summaries of the data, (3) how to identify non-trivial facts, patterns, and relationships in the data, and (4) how to create models from the data to better understand the data and make predictions.

The process outlined in the book starts by understanding the problem you are trying to solve, what data will be used and how, who will use the information generated, and how it will be delivered to them, and the specific and measurable success criteria against which the project will be evaluated.

The type of data collected and the quality of this data will directly impact the usefulness of the results. Ideally, the data will have been carefully collected to answer the specific questions defined at the start of the project. In practice, you are often dealing with data generated for an entirely different purpose. In this situation, it is necessary to thoroughly understand and prepare the data for the new questions being posed. This is often one of the most time-consuming parts of the data mining process, where many issues need to be carefully addressed.

The analysis can begin once the data has been collected and prepared. The choice of methods used to analyze the data depends on many factors, including the problem definition and the type of the data that has been collected. Although many methods might solve your problem, you may not know which one works best until you have experimented with the alternatives. Throughout the technical sections, issues relating to when you would apply the different methods along with how you could optimize the results are discussed.

After the data is analyzed, it needs to be delivered to your target audience. This might be as simple as issuing a report or as complex as implementing and deploying new software to automatically reapply the analysis as new data becomes available. Beyond the technical challenges, if the solution changes the way its intended audience operates on a daily basis, it will need to be managed. It will be important to understand how well the solution implemented in the field actually solves the original business problem.

Larger projects are increasingly implemented by interdisciplinary teams involving subject matter experts, business analysts, statisticians or data mining experts, IT professionals, and project managers. This book is aimed at the entire interdisciplinary team and addresses issues and technical solutions relating to data analysis or data mining projects. The book also serves as an introductory textbook for students of any discipline, both undergraduate and graduate, who wish to understand exploratory data analysis and data mining processes and methods.

The book covers a series of topics relating to the process of making sense of data, including the data mining process and how to describe data table elements (i.e., observations and variables), preparing data prior to analysis, visualizing and describing relationships between variables, identifying and making statements about groups of observations, extracting interesting rules, and building mathematical models that can be used to understand the data and make predictions.

The book focuses on practical approaches and covers information on how the techniques operate as well as suggestions for when and how to use the different methods. Each chapter includes a Further Reading section that highlights additional books and online resources that provide background as well as more in-depth coverage of the material. At the end of selected chapters is a set of exercises designed to help in understanding the chapter's material. The appendix covers a series of practical tutorials that make use of the freely available Traceis software developed to accompany the book, which is available from the book's website: http://www.makingsenseofdata.com; however, the tutorials could be used with other available software. Finally, a deck of slides has been developed to accompany the book's material and is available on request from the book's authors.

The authors wish to thank Chelsey Hill-Esler, Dr. McCullough, and Vinod Chandnani for their help with the book.

CHAPTER 1

INTRODUCTION

1.1 OVERVIEW

Almost every discipline from biology and economics to engineering and marketing measures, gathers, and stores data in some digital form. Retail companies store information on sales transactions, insurance companies keep track of insurance claims, and meteorological organizations measure and collect data concerning weather conditions. Timely and well-founded decisions need to be made using the information collected. These decisions will be used to maximize sales, improve research and development projects, and trim costs. Retail companies must determine which products in their stores are under- or over-performing as well as understand the preferences of their customers; insurance companies need to identify activities associated with fraudulent claims; and meteorological organizations attempt to predict future weather conditions.

Data are being produced at faster rates due to the explosion of internet-related information and the increased use of operational systems to collect business, engineering and scientific data, and measurements from sensors or monitors. It is a trend that will continue into the foreseeable future. The challenges of handling and making sense of this information are significant because of the increasing volume of data, the complexity that arises from the diverse types of information that are collected, and the reliability of the data collected.

The process of taking raw data and converting it into meaningful information necessary to make decisions is the focus of this book. The following sections in this chapter outline the major steps in a data analysis or data mining project from defining the problem to the deployment of the results. The process provides a framework for executing projects related to data mining or data analysis. It includes a discussion of the steps and challenges of (1) defining the project, (2) preparing data for analysis, (3) selecting data analysis or data mining approaches that may include performing an optimization of the analysis to refine the results, and (4) deploying and measuring the results to ensure that any expected benefits are realized. The chapter also includes an outline of topics covered in this book and the supporting resources that can be used alongside the book's content.

1.2 SOURCES OF DATA

There are many different sources of data as well as methods used to collect the data. Surveys or polls are valuable approaches for gathering data to answer specific questions. An interview using a set of predefined questions is often conducted over the phone, in person, or over the internet. It is used to elicit information on people's opinions, preferences, and behavior. For example, a poll may be used to understand how a population of eligible voters will cast their vote in an upcoming election. The specific questions along with the target population should be clearly defined prior to the interviews. Any bias in the survey should be eliminated by selecting a random sample of the target population. For example, bias can be introduced in situations where only those responding to the questionnaire are included in the survey, since this group may not be representative of a random sample of the entire population. The questionnaire should not contain leading questions—questions that favor a particular response. Other factors which might result in segments of the total population being excluded should also be considered, such as the time of day the survey or poll was conducted. A well-designed survey or poll can provide an accurate and cost-effective approach to understanding opinions or needs across a large group of individuals without the need to survey everyone in the target population.
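The idea of selecting a random sample of the target population can be illustrated with a minimal Python sketch. The voter ID list, the sample size of 500, and the function name draw_random_sample are hypothetical and chosen only for illustration; the sketch assumes a complete list of the target population is available to sample from.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical target population: 10,000 eligible voters identified by ID.
voter_ids = list(range(10_000))

# Simple random sample of 500 voters to contact for the poll.
poll_sample = draw_random_sample(voter_ids, 500, seed=42)
print(len(poll_sample))
```

Because every member of the list has the same chance of being selected, the sample itself does not favor any segment of the population. It does not, however, correct for bias introduced later, such as selected people who choose not to respond.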

Experiments measure and collect data to answer specific questions in a highly controlled manner. The data collected should be reliably measured; in other words, repeating the measurement should not result in substantially different values. Experiments attempt to understand cause-and-effect phenomena by controlling other factors that may be important. For example, when studying the effects of a new drug, a double-blind study is typically used. The sample of patients selected to take part in the study is divided into two groups. The new drug is delivered to one group, whereas a placebo (a sugar pill) is given to the other group. To avoid a bias in the study on the part of the patient or the doctor, neither the patient nor the doctor administering the treatment knows which group a patient belongs to. In certain situations it is impossible to conduct a controlled experiment on either logistical or ethical grounds. In these situations a large number of observations are measured and care is taken when interpreting the results. For example, it would not be ethical to set up a controlled experiment to test whether smoking causes health problems.
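Dividing the patients into two groups is usually done at random so that the groups are comparable. The following Python sketch shows one simple way this could be done; the patient identifiers, group labels, and function name assign_groups are hypothetical. In a double-blind study, the resulting assignment would be held by a third party and hidden from both patients and doctors.

```python
import random


def assign_groups(patient_ids, seed=None):
    """Randomly split patients into a drug group and a placebo group."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)  # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"drug": shuffled[:half], "placebo": shuffled[half:]}


# Hypothetical study with 20 enrolled patients.
patients = [f"P{n:03d}" for n in range(1, 21)]
groups = assign_groups(patients, seed=7)
print(len(groups["drug"]), len(groups["placebo"]))
```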

As part of the daily operations of an organization, data is collected for a variety of reasons. Operational databases contain ongoing business transactions and are accessed and updated regularly. Examples include supply chain and logistics management systems, customer relationship management databases (CRM), and enterprise resource planning databases (ERP). An organization may also be automatically monitoring operational processes with sensors, such as the performance of various nodes in a communications network. A data warehouse is a copy of data gathered from other sources within an organization that is appropriately prepared for making decisions. It is not updated as frequently as operational databases. Databases are also used to house historical polls, surveys, and experiments. In many cases data from in-house sources may not be sufficient to answer the questions now being asked of it. In these cases, the internal data can be augmented with data from other sources such as information collected from the web or literature.

1.3 PROCESS FOR MAKING SENSE OF DATA

1.3.1 Overview

Following a predefined process will ensure that issues are addressed and appropriate steps are taken. For exploratory data analysis and data mining projects, you should carefully think through the following steps, which are summarized here and expanded in the following sections:

1. Problem definition and planning: The problem to be solved and the projected deliverables should be clearly defined and planned, and an appropriate team should be assembled to perform the analysis.

2. Data preparation: Prior to starting a data analysis or data mining project, the data should be collected, characterized, cleaned, transformed, and partitioned into an appropriate form for further processing.

3. Analysis: Based on the information from steps 1 and 2, appropriate data analysis and data mining techniques should be selected. These methods often need to be optimized to obtain the best results.

4. Deployment: The results from step 3 should be communicated and/or deployed to obtain the projected benefits identified at the start of the project.

Figure 1.1 summarizes this process. Although it is usual to follow the order described, there will be interactions between the different steps that may require work completed in earlier phases to be revised. For example, it may be necessary to return to the data preparation (step 2) while implementing the data analysis (step 3) in order to make modifications based on what is being learned.

FIGURE 1.1 Summary of a general framework for a data analysis project.

1.3.2 Problem Definition and Planning

The first step in a data analysis or data mining project is to describe the problem being addressed and generate a plan. The following section addresses a number of issues to consider in this first phase. These issues are summarized in Figure 1.2.

FIGURE 1.2 Summary of some of the issues to consider when defining and planning a data analysis project.

It is important to document the business or scientific problem to be solved along with relevant background information. In certain situations, however, it may not be possible or even desirable to know precisely the sort of information that will be generated from the project. These more open-ended projects will often generate questions by exploring large databases. But even in these cases, identifying the business or scientific problem driving the analysis will help to constrain and focus the work. To illustrate, an e-commerce company wishes to embark on a project to redesign its website in order to generate additional revenue. Before starting this potentially costly project, the organization decides to perform data analysis or data mining of available web-related information. The results of this analysis will then be used to influence and prioritize this redesign. A general problem statement, such as “make recommendations to improve sales on the website,” along with relevant background information should be documented.

This broad statement of the problem is useful as a headline; however, this description should be divided into a series of clearly defined deliverables that ultimately solve the broader issue. These include: (1) categorize website users based on demographic information; (2) categorize users of the website based on browsing patterns; and (3) determine if there are any relationships between these demographic and/or browsing patterns and purchasing habits. This information can then be used to tailor the site to specific groups of users or improve how their customers purchase based on the usage patterns found in the analysis. In addition to understanding what type of information will be generated, it is also useful to know how it will be delivered. Will the solution be a report, a computer program to be used for making predictions, or a set of business rules? Defining these deliverables will set the expectations for those working on the project and for its stakeholders, such as the management sponsoring the project.

The success criteria related to the project's objective should ideally be defined in ways that can be measured. For example, a criterion might be to increase revenue or reduce costs by a specific amount. This type of criterion can often be directly related to the performance level of a computational model generated from the data. For example, when developing a computational model that will be used to make numeric projections, it is useful to understand the required level of accuracy. Understanding this will help prioritize the types of methods adopted or the time or approach used in optimizations. For example, a credit card company that is losing customers to other companies may set a business objective to reduce the turnover rate by 10%. They know that if they are able to identify customers likely to switch to a competitor, they have an opportunity to improve retention through additional marketing. To identify these customers, the company decides to build a predictive model and the accuracy of its predictions will affect the level of retention that can be achieved.

It is also important to understand the consequences of answering questions incorrectly. For example, when predicting tornadoes, there are two possible prediction errors: (1) incorrectly predicting a tornado would strike and (2) incorrectly predicting there would be no tornado. The consequence of scenario (2) is that a tornado hits with no warning. In this case, affected neighborhoods and emergency crews would not be prepared and the consequences might be catastrophic. The consequence of scenario (1) is less severe than scenario (2) since loss of life is more costly than the inconvenience to neighborhoods and emergency services that prepared for a tornado that did not hit. There are often different business consequences related to different types of prediction errors, such as incorrectly predicting a positive outcome or incorrectly predicting a negative one.
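The two kinds of prediction error can be counted separately and weighted by their consequences. The Python sketch below is an illustration only: the outcome lists and the cost values (1 unit for a false alarm, 100 units for a missed tornado) are invented, and the function name total_error_cost is hypothetical.

```python
def total_error_cost(actual, predicted, cost_false_alarm, cost_missed_event):
    """Sum the cost of both error types over paired actual/predicted outcomes."""
    false_alarms = 0   # scenario (1): event predicted but did not happen
    missed_events = 0  # scenario (2): event happened but was not predicted
    for happened, forecast in zip(actual, predicted):
        if forecast and not happened:
            false_alarms += 1
        elif happened and not forecast:
            missed_events += 1
    return false_alarms * cost_false_alarm + missed_events * cost_missed_event


# Hypothetical record of five days: True means a tornado (actual or predicted).
actual = [True, False, False, True, False]
predicted = [True, True, False, False, False]

# One false alarm and one missed tornado, with assumed relative costs.
cost = total_error_cost(actual, predicted, cost_false_alarm=1, cost_missed_event=100)
print(cost)  # 101
```

Weighting the errors this way makes explicit that a model with fewer errors overall is not necessarily the better choice if its errors are of the more costly kind.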

There may be restrictions concerning what resources are available for use in the project or other constraints that influence how the project proceeds, such as limitations on available data as well as computational hardware or software that can be used. Issues related to use of the data, such as privacy or legal issues, should be identified and documented. For example, a data set containing personal information on customers' shopping habits could be used in a data mining project. However, if the results could be traced to specific individuals, the resulting findings should be anonymized. There may also be limitations on the amount of time available to a computational algorithm to make a prediction. To illustrate, suppose a web-based data mining application or service that dynamically suggests alternative products to customers while they are browsing items in an online store is to be developed. Because certain data mining or modeling methods take a long time to generate an answer, these approaches should be avoided if suggestions must be generated rapidly (within a few seconds); otherwise, the customer will become frustrated and shop elsewhere. Finally, other restrictions relating to business issues include the window of opportunity available for the deliverables. For example, a company may wish to develop and use a predictive model to prioritize a new type of shampoo for testing. In this scenario, the project is being driven by competitive intelligence indicating that another company is developing a similar shampoo and the company that is first to market the product will have a significant advantage.
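A time limit on predictions can be checked before a method is adopted. The following Python sketch times each call and compares it against a budget; the two stand-in models, the budget values, and the function name within_time_budget are hypothetical, and a real check would use the actual model and representative inputs.

```python
import time


def within_time_budget(predict, inputs, budget_seconds):
    """Return True if every call to predict finishes within budget_seconds."""
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        elapsed = time.perf_counter() - start
        if elapsed > budget_seconds:
            return False
    return True


def fast_model(item):
    """Stand-in for a method that answers almost immediately."""
    return item * 2


def slow_model(item):
    """Stand-in for a method that takes noticeably longer per request."""
    time.sleep(0.05)
    return item * 2


print(within_time_budget(fast_model, [1, 2, 3], budget_seconds=1.0))
print(within_time_budget(slow_model, [1], budget_seconds=0.01))
```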
