Meta-Analytics: Consensus Approaches and System Patterns for Data Analysis
Ebook · 657 pages · 7 hours


About this ebook

Meta-Analytics: Consensus Approaches and System Patterns for Data Analysis presents an exhaustive set of patterns for data scientists to use on any machine learning-based data analysis task. The book virtually ensures that at least one pattern will lead to better overall system behavior than the use of traditional analytics approaches alone. The book is ‘meta’ to analytics, covering general analytics in sufficient detail for readers to engage with, and understand, hybrid or meta-approaches. The book has relevance to machine translation, robotics, biological and social sciences, medical and healthcare informatics, economics, business, and finance.

In addition, the analytics within can be applied to predictive algorithms for everyone from police departments to sports analysts.

  • Provides comprehensive and systematic coverage of machine learning-based data analysis tasks
  • Enables rapid progress towards competency in data analysis techniques
  • Gives exhaustive and widely applicable patterns for use by data scientists
  • Covers hybrid or ‘meta’ approaches, along with general analytics
  • Lays out information and practical guidance on data analysis for practitioners working across all sectors
Language: English
Release date: Mar 10, 2019
ISBN: 9780128146248
Author

Steven Simske

Steven J. Simske is an HP Fellow and Director at Hewlett Packard Labs and has worked in machine intelligence and analytics for the past 25 years, with domains extending from medical image analytics to text summarization. He has performed research relevant to meta-analytics for over 20 years at HP Labs and in collaboration with major universities in the US and Brazil.


    Book preview

    Meta-Analytics - Steven Simske

    2019

    Chapter 1

    Introduction, overview, and applications

    Abstract

    In this mammoth chapter, we cover the basic background material in statistics, machine learning, and artificial intelligence needed to understand the ever-broadening field of analytics. This chapter also introduces the software, data mining, and knowledge discovery skills necessary for the data scientist to proceed toward meta-analytics, that is, the next generation of analytics in which systems are hybrid by design and use multiple analytics to deduce valuable information about the data. Two longer sections at the end of the chapter show how to build a classifier from the ground up that incorporates many of the statistical approaches of the earlier sections.

    Keywords

    Algorithms; Analytics; Artificial intelligence; Deep learning; Deep unlearning; Classification; Data mining; Machine intelligence; Machine learning; Parallelism; Recognition; Statistics; System architecture; Systems

    It is a capital mistake to theorize before one has data

    Arthur Conan Doyle (1887)

    Numquam ponenda est pluralitas sine necessitate

    William of Ockham, Duns Scotus, et al. (c. 1300)

    E pluribus unum

    US Motto

    1.1 Introduction

    We live in a world in which more data have been collected in the past 2–3 years than were collected in the entire history of the world before then. Based on the trends of the past few years, we’ll be saying this for a while. Why is this the case? The confluence of nearly limitless storage and processing power has, quite simply, made it far easier to generate and preserve data. The most relevant question is, perhaps, not whether this will continue, but rather how much of the data will be used for anything more than filling up storage space.

    The machine intelligence community is, of course, interested in turning these data into information and has had tremendous success to date albeit in somewhat specific and/or constrained situations. Recent advancements in hardware—from raw processing power and nearly limitless storage capacity, to the architectural revolution that graphics processing units (GPUs) bring, to parallel and distributed computation—have allowed software developers and algorithm developers to encode processes that were unthinkable with the hardware of even a decade ago. Deep learning and in particular convolutional neural networks, together with dataflow programming, allow for an ease of rolling out sophisticated machine learning algorithms and processes that is unprecedented, with the entire field having by all means a bright future.

    Taking the power of hybrid architectures as a starting point, analytic approaches can be upgraded to benefit from all components when employing a plurality of analytics. This book is about how simple building blocks of analytics can be used in aggregate to provide systems that are readily optimized for accuracy, robustness, cost, scalability, modularity, reusability, and other design concerns. This book covers the basics of analytics; builds on them to create a set of meta-analytic approaches; and provides straightforward analytics algorithms, processes, and designs that will bring a neophyte up to speed while augmenting the arsenal of an analytics authority. The goal of the book is to make analytics enjoyable, efficient, and comprehensible to the entire gamut of data scientists—in what is surely an age of data science.

    1.2 Why is this book important?

    First and foremost, this book is meant to be accessible to anyone interested in data science. Data already permeate every science, technology, engineering, and mathematics (STEM) endeavor, and the expectations to generate relevant and copious data in any process, service, or product will only continue to grow in the years to come. A book helping a STEM professional pick up the art of data analysis from the ground up, providing both fundamentals and a roadmap for the future, is needed.

    The book is aimed at supplying an extensive set of patterns for data scientists to use to hit the ground running on any machine-learning-based data analysis task and virtually ensures that at least one approach will lead to better overall system behavior (accuracy, cost, robustness, performance, etc.) than using traditional analytic approaches only. Because the book is about meta-analytics, it must also cover general analytics well enough for the reader to engage with and comprehend the hybrid, or meta-, approaches. As such, the book aims to allow a relative novice to analytics to move to an elevated level of competency and fluency relatively quickly. It is also intended to challenge the data scientist to think more broadly and more thoroughly than they might otherwise be motivated to do.

    The target audience, therefore, consists of data scientists in all sectors—academia, industry, government, and NGOs. Because of the importance of statistical methods, data normalization, data visualization, and machine intelligence to the types of data science included in this book, the book has relevance to machine translation, robotics, biological and social sciences, medical and health-care informatics, economics, business, and finance. The analytic approaches covered herein can be applied to predictive algorithms for everyone from police departments (crime prediction) to sports analysts. The book is readily amenable to a graduate class on systems engineering, analytics, or data science, in addition to a course on machine intelligence. A subset of the book could be used for an advanced undergraduate class in intelligent systems.

    Predictive analytics have long held a fascination for people. Seeing the future has been associated with divinity, with magic, with the occult, or simply—and more in keeping with Occam’s razor—with enhanced intelligence. But is Occam’s razor, or the law of parsimony, applicable in the age of data science? It is no longer necessarily the best advice to say Numquam ponenda est pluralitas sine necessitate, or plurality is never to be posited without necessity, unless, of course, one uses goodness of fit to a model, output of sensitivity analysis, or least-squares estimation, among other quantitative artifacts, as proxies for necessity. The concept of predictive analytics, used at the galactic level and extending many thousands of years into the future, is the basis of the Foundation trilogy by Isaac Asimov, written in the middle of the 20th century. Futurist—or should we say mathematician?—Hari Seldon particularized the science of psychohistory, which presumably incorporated an extremely multivariate analysis intended to remove as much uncertainty from the future as possible for those privy to his output. Perhaps, the only prediction he was unable to make was the randomness of the personality of the Mule, an überintelligent, übermanipulative leader of the future. However, his ability to estimate the future in probabilistic terms led to the (correct) prediction of the collapse of the Galactic Empire and so included a manual to abbreviate the millennia of chaos expected to follow. In other words, he may have foreseen not the specific randomness of the Mule, but constructed his psychohistory to be optimally robust to the unforeseen. That is, Hari Seldon performed preflight sensitivity analysis of his predictive model. Kudos to Asimov for anticipating the value of analytics in the future. But even more so, kudos for anticipating that the law of parsimony would be insufficient to address the needs of a predictive analytic system to be insensitive to such unpredictable random artifacts (people, places, and things). The need to provide for the simplest model reasonable—that is, the law of model parsimony—remains. However, it is evident that hybrid systems, affording simplicity where possible but able to handle much more complexity where appropriate, are more robust than either extreme and ultimately will remain relevant longer in real-world applications.

    This book is, consequently, important precisely because of the value provided by both the Williams of Ockham and the Hari Seldons. The real world is dynamic and ever-changing, and predictive models must be preadapted to change in the assumptions that underpin them, including but not limited to the drift in data from that used to train the model; changes in the measurement system, including sampling, filtering, transduction, and compression; and changes in the interactions between the system being modeled and measured and the larger environment around it. I hope that the approaches revisited, introduced, and/or elaborated in this book will aid data scientists in their tasks while also bringing non-data scientists to sufficient data fluency to be able to interact intelligently with the world of data. One thing is certain—unlike Hari Seldon’s Galactic Empire, the world of data is not about to crumble. It is getting stronger—for good and for bad—every day.

    1.3 Organization of the book

    This first chapter is the critical chapter for the entire book and intentionally takes on a disproportionate length compared with the other chapters, as this book is meant to stand on its own, allowing the student, data enthusiast, and even data professional to use it as a single source to proceed from unstructured data to fully tagged, clustered, and classified data. This chapter also provides background on the statistics, machine learning, and artificial intelligence needed for analytics and meta-analytics.

    Additional chapters, then, elaborate further on what analytics provide. In Chapter 2, the value of training data is thoroughly investigated, and the assumptions around the long-standing training, validation, and testing process are revisited. In Chapter 3, experimental design—from bias and normalization to the treatment of data experiments as systems of data—is considered. In Chapter 4, meta-analytic approaches are introduced, with the primary focus being on cumulative gain, or lift, curves. Chapters 5–10 focus on other key aspects of systems around analytics, including the broad but very approachable field of sensitivity analysis (Chapter 5); the powerful family or platform of patterns for analytics loosely described as predictive selection (Chapter 6); a consideration of models, model fitting, and how to design models to be more robust to their environment (Chapter 7); additional analytic design patterns (Chapter 8); the recursive use of analytics to explore the efficacy of employed analytics (Chapter 9); and optimization of analytic system design (Chapter 10), which is a natural follow-on to Chapter 9. Chapter 11 is used to show how optimized system designs not only provide a better buffer to unanticipated random artifacts (these are called aleatory techniques here) but also do a better job of ingesting domain expertise from decidedly nonrandom artifacts, that is, from domain experts and requirements. In Chapters 12–13, the analytic approaches introduced in the preceding chapters are applied to specific technical fields (Chapter 12) and to some broader fields (Chapter 13). In Chapter 14, the contributions of this book are discussed in a larger context, and the future of data in the age of data is described.

    A note on what is meant by meta-analytics is worth providing. Essentially, meta-analysis has two broad fields of study/application:

    1. Meta- in the sense of meta-algorithmics, where we are combining two or more analytic techniques (algorithms, processes, services, systems, etc.) to obtain improved analytic output.

    2. Meta- in the sense of being outside, additional, and augmentative to pure analytics, which includes fields such as testing, ground truthing, training, and sensitivity analysis and optimization of system design.

    With this perspective, analytics is more than just simply machine learning: it is also learning in the correct order. It is not only knowledge extraction but also extraction of knowledge in the correct order. It is not only creating information but also creating information in the correct order. This means that analytics is more than simple descriptive or quantitative information. It is meant to extract and tell a story about the data that someone skilled in the field would be able to provide, including modifying the analysis in light of changing data and context for the data.
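    To make sense (1) above concrete, the short sketch below combines the outputs of two hypothetical classifiers by weighted voting; the classifiers, labels, and weights are invented for illustration and are not taken from the book.

```python
# Minimal sketch of sense (1): combining two analytic techniques by weighted voting.
# The two "classifiers" here are hypothetical stand-ins, not methods from the book.
import numpy as np

def weighted_vote(predictions_a, predictions_b, weight_a=0.6, weight_b=0.4, n_classes=3):
    """Combine two classifiers' hard predictions into a consensus prediction."""
    combined = []
    for pa, pb in zip(predictions_a, predictions_b):
        scores = np.zeros(n_classes)
        scores[pa] += weight_a          # each classifier votes with its weight
        scores[pb] += weight_b
        combined.append(int(np.argmax(scores)))
    return combined

# Hypothetical outputs of two classifiers on five samples (class labels 0-2)
clf_a = [0, 1, 2, 1, 0]
clf_b = [0, 2, 2, 1, 1]
print(weighted_vote(clf_a, clf_b))      # consensus labels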

    1.4 Informatics

    Occasionally, data science will be used interchangeably with the term informatics. Informatics, however, is a branch of information engineering/science/systems concerned with the impact of data on humans (and presumably the impact of humans on data!). Informatics is concerned with the interaction between humans and relevant information, particularly in how humans process information digitally. Thus, an important aspect of informatics is the study of the social implications of information technologies. From this broad perspective, then, analytics gathered to determine how digital technologies affect humans [Carr11] are an important part of informatics.

    In this book, informatics will only be addressed peripherally, that is, as an integrated part of the examples, which are instead focused on the algorithmic, process, or system approach to generating information from a data set. This does not mean we are allowed to operate in a vacuum as data scientists; rather, it simply means that this book will not have as a general concern the specific manner in which data are presented, nor the software with which the data are processed, etc.

    1.5 Statistics for analytics

    In this section, a quick summary (and, for many readers, a high-level recapitulation) of statistics relevant to data science is given. The main topics covered will be value (mean and estimate), variability, degrees of freedom, analysis of variance, and the relationship of these statistics to information and inferences that can be drawn from the data.

    1.5.1 Value and variance

    The value is an individual datum, typically binary, numerical, alphanumeric, or a word, depending on the data-type definition. The first-order descriptor of a plurality of values is the mean, μ, which is distinctly different from the average:

    \mu = \frac{1}{n}\sum_{i=1}^{n} x_i    (1.1)

    For example, the average income, house price, or cost of goods is generally given as the median, not the mean. The average day that the trash collector comes is usually the mode, not the mean. But in most analytics—that is, in parametric analytics—the mean is our average of choice. In nonparametric statistics, the median is often of concern, since the ranked order of values is important. On still other occasions, the mean does not need to be computed but is instead a specification that a system is required to meet, for example, miles per gallon, cycles before failure, or bends before fatigue. In these cases, a single type of event is monitored, its mean is calculated, and this mean is compared with the specification.
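    A minimal sketch of the three "averages" discussed above, using a small, hypothetical, right-skewed income sample; the numbers are invented for illustration.

```python
# Sketch: the three common "averages" on a small, skewed, hypothetical income sample.
from statistics import mean, median, mode

incomes = [32000, 38000, 41000, 41000, 45000, 52000, 250000]   # one large outlier

print("mean  :", mean(incomes))     # ~71,286 -- pulled upward by the outlier
print("median:", median(incomes))   # 41,000 -- the 'average income' usually reported
print("mode  :", mode(incomes))     # 41,000 -- most frequent value
```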

    Of course, two populations can share the same mean and still be quite different. This is because most populations (and all nontheoretical populations) have variability around the mean. The second moment of the distribution is the variance, usually denoted by σ², whose square root, the standard deviation σ, defined in Eq. (1.2), is an important characterizing datum of a distribution:

    \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^2}    (1.2)

    For a Gaussian, or normal, distribution, roughly 68% of the samples fall within the range {μ − σ, μ + σ}. Note in Eq. (1.2) that the degrees of freedom, or df for short, are equal to (number of samples)-1. This is intuitive since you can only choose the first (number of samples)-1 samples and then the last one is already determined. Degrees of freedom are always important in statistical analyses, since confidence in the result is directly related to the number of times a result has been repeated. While confidence is not a quantitative statistical measure (though confidence intervals are!), generally, confidence increases with degrees of freedom and inversely with variability. The highest possible confidence, then, comes when you repeat the exact same result many, many times.
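    The following sketch, on simulated Gaussian data, computes the sample standard deviation of Eq. (1.2) with (n − 1) degrees of freedom and checks the roughly 68% coverage of μ ± σ.

```python
# Sketch of Eq. (1.2): sample standard deviation with (n - 1) degrees of freedom,
# and the ~68% coverage of mu +/- sigma for a Gaussian sample (values are simulated).
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=10_000)   # hypothetical Gaussian data

mu = x.mean()
sigma = x.std(ddof=1)                              # ddof=1 -> divide by (n - 1)

within_one_sigma = np.mean((x > mu - sigma) & (x < mu + sigma))
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"fraction within one sigma: {within_one_sigma:.3f}")   # roughly 0.68
```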

    It is usually quite important to distinguish between comparing means and comparing variances. For example, this distinguishes between weather and climate: if, in a locale, the mean temperature is the same but the variance increases significantly over time, then the mean weather does not change, but the climate does. Similarly, higher variability in a genome more likely leads to new speciation than lower variability.

    Another example comes from an engine used for transportation or for hauling materials. The modal and median engine revolutions per minute (RPM), when measured over a day or even over a driving/on-cycle session, may be well within the safety range. But this does not account for the variability. In some short driving sessions, the standard deviation may be as high as the mean, and so a more important measure might be the percent of time spent above a given value, which may be, for example, 1.2 standard deviations above the mean. Here, the nature of the distribution (the shape of the variance) is far more important than the mean. As a general rule, for nonnegative data sets, whenever μ ≤ σ, what you are measuring requires further elaboration to be useful from an analytic viewpoint.
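    The sketch below illustrates the engine-RPM point with a simulated (not measured) RPM trace: the fraction of samples above μ + 1.2σ, plus a flag for the μ ≤ σ rule of thumb.

```python
# Sketch of the engine-RPM point: when the distribution is wide, the share of time
# spent above a threshold (here mu + 1.2*sigma) is more informative than the mean.
# The RPM trace below is simulated, not measured data.
import numpy as np

rng = np.random.default_rng(0)
rpm = np.concatenate([
    rng.normal(2200, 300, 5000),      # ordinary driving
    rng.normal(4500, 600, 500),       # brief high-load bursts
])

mu, sigma = rpm.mean(), rpm.std(ddof=1)
threshold = mu + 1.2 * sigma
frac_above = np.mean(rpm > threshold)

print(f"mean = {mu:.0f} RPM, sigma = {sigma:.0f} RPM")
print(f"fraction of samples above mu + 1.2*sigma: {frac_above:.3f}")
if mu <= sigma:
    print("mu <= sigma: the raw measure likely needs further elaboration")
```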

    1.5.2 Sample and population tests

    This type of confidence directly factors in when we consider the first quantitative measurement for determining whether a sample belongs to a given population. This measure, the z-score, is given in Eq. (1.3), where we see that the numerator is the difference between the sample value, x, and the mean of the population, μ. The denominator is the standard deviation, σ, divided by the square root of the number of samples being compared with the population (which is effectively the degrees of freedom for comparing the sample x to the population having n samples):

    z = \frac{x - \mu}{\sigma / \sqrt{n}}    (1.3)

    Note that the value of z can be positive or negative depending on whether x is greater or less than the mean of the population. The z-score is used to decide with a given level of confidence that a sample does not come from a population. As such, the absolute value of the z-score in Eq. (1.3) is typically our concern. Table 1.1 provides a few of the most important probabilities and their corresponding z-scores. Two-tailed probability means that we do not know beforehand (a priori) whether a sample is being tested to be above or below the mean of the population; one-tailed probability means that we are a priori testing in a single direction from the mean. For example, a two-tailed test might be it’s not a normal temperature for this day of the year, while a one-tailed test might be it’s warmer than usual for this day of the year. In general, from a conservative statistical standpoint, it is better to use a two-tailed test than a one-tailed test unless you already have a hypothesis, model, or regulation guiding your comparison. You are less likely to have false positives for declaring a sample statistically significantly different from a population this way. Note that the probability of a one-tailed test is halfway to 100% from that of a two-tailed test. Thus, for z = 1.96, we are 95% certain that a sample did not come from a specific population, and we are 97.5% certain that it comes from a second population with a higher mean value if z = 1.96 (and not −1.96). This makes sense, because we are effectively getting another 50% probability correct if the sign of the calculated z-value is correct. In this case, had z been −1.96, we would not be able to support our hypothesis, since the direction from the mean of the population of size n to which we compare the sample contradicts our hypothesis. (See Table 1.1.)

    Table 1.1

    The probability is not used to establish whether a sample belongs to a population; rather, it provides the probability that a single sample was not drawn from the population having mean μ and standard deviation σ per Eq. (1.3)
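    A small sketch of Eq. (1.3) and the tailedness discussion above: it converts a z-score into one- and two-tailed probabilities using the standard normal survival function; the sample values are hypothetical.

```python
# Sketch of Eq. (1.3) and Table 1.1: convert a z-score into one- and two-tailed
# probabilities with the standard normal survival function.
from scipy.stats import norm

def z_score(x, mu, sigma, n=1):
    return (x - mu) / (sigma / n ** 0.5)

z = z_score(x=74.0, mu=70.0, sigma=8.0, n=16)     # hypothetical numbers
p_one_tailed = norm.sf(abs(z))                    # P(Z > |z|)
p_two_tailed = 2 * norm.sf(abs(z))                # both tails

print(f"z = {z:.2f}")                             # 2.00
print(f"one-tailed p = {p_one_tailed:.4f}")       # ~0.0228
print(f"two-tailed p = {p_two_tailed:.4f}")       # ~0.0455
```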

    Eq. (1.3) relies on some assumptions that are worth discussing, as there are several factors that affect the z-score in addition to the degrees of freedom. The first is the possibility of non-Gaussian (nonnormal) behavior of the population with which the sample is compared (and the population from which the sample actually comes, although we may have no way of knowing/estimating this population yet). When we consider third- and fourth-order moments such as skew and kurtosis, we may uncover non-Gaussian behavior such as left skew (long tail left), right skew (long tail right), bimodality (two clusters of data, implying that the population represents two subpopulations with different attributes), and other non-Gaussian behaviors (e.g., exponential, uniform, logistic, Poisson, and symmetrical distributions). These distribution deviations from assumed Gaussian behavior impact the interpretation of the z-score (generally undermining the p-value, or probability). Secondly, a temporal drift in the samples belonging to the population will undermine the z-score, since the sample may be compared with data that are no longer relevant. For this reason, the population and sample to compare should be time (and other experimental factor) matched whenever possible. Thirdly, an imbalanced training set or population sample bias will impact the z-score. If the population is meant to cover a specific range of input and does not, it can introduce distribution deviation and/or temporal drift or hide the same.
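    As a quick check on the Gaussian assumption discussed above, the sketch below computes skew and excess kurtosis for a simulated Gaussian sample and a simulated right-skewed (exponential) sample.

```python
# Sketch: checking the Gaussian assumption behind the z-score by looking at the third
# and fourth moments (skew and excess kurtosis) of simulated samples.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(5)
gaussian_like = rng.normal(0.0, 1.0, 5000)
right_skewed = rng.exponential(1.0, 5000)        # long tail to the right

for name, data in [("gaussian-like", gaussian_like), ("right-skewed", right_skewed)]:
    print(f"{name:13s} skew = {skew(data):+.2f}, excess kurtosis = {kurtosis(data):+.2f}")
```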

    In practice, z-scores are very important for process control and for identifying outliers. A brief example is given here. Suppose we represent a surface-based forensic, such as you might get using a high-resolution imager [Sims10] and image analysis that compares the actual postprinting or postmanufacturing micron-scale surface texture with that of a model [Poll10]. The so-called forensic signature (derived from the variations in electromagnetic spectrum, ultrasound, or other salient physical property) of the surface is represented as a bitstream, with 1024 bits in the string. When a new image is captured, its binary surface detail string is compared with that of the candidate (matched) sample and with the population of (unmatched) samples. The Hamming distance to the population of unmatched samples has an expected value of 512 bits (i.e., with random guessing, precisely 50% of the bits should match, and the other 50% should be in error). In our test of binary string descriptors for a large set of surfaces, we obtained a mean Hamming distance to unmatched samples of 509.7 (very close to the expected value of 512), with a standard deviation of 31.6. The number of test samples in the population is 100. Next, we measure a value, 319.4, for the Hamming distance between the new image and a surface that we wish to prove is authentic with a forensically relevant probability (typically p = 10⁻⁹, meaning there is one chance in a billion of a false-positive match). Plugging into Eq. (1.3), we get Eq. (1.4):

    z = \frac{319.4 - 509.7}{31.6/\sqrt{1}} = \frac{-190.3}{31.6} \approx -6.02    (1.4)

    So, z = −6.02. Note that we use n = 1 (not n = 100, which is the number of samples used to determine the population mean and standard deviation) here, since it is the number of samples that we are comparing with the population. Since z = −5.997932 corresponds to p = 10⁻⁹, we have (just barely!) forensic authentication (p < 10⁻⁹).
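    The sketch below reproduces the arithmetic of Eq. (1.4) using the values quoted above (509.7, 31.6, 319.4, n = 1) and compares the resulting one-tailed probability with the forensic threshold of 10⁻⁹.

```python
# Sketch of the forensic example around Eq. (1.4): the measured Hamming distance to the
# candidate (319.4) versus the unmatched-population statistics (509.7 +/- 31.6), with
# n = 1 sample being compared and a forensic threshold of p < 1e-9.
from scipy.stats import norm

mu_unmatched = 509.7       # mean Hamming distance to unmatched surfaces
sigma_unmatched = 31.6     # standard deviation of that population
x_candidate = 319.4        # Hamming distance to the candidate surface
n = 1                      # one sample is compared with the population

z = (x_candidate - mu_unmatched) / (sigma_unmatched / n ** 0.5)
p = norm.sf(abs(z))        # one-tailed probability of such an extreme distance

print(f"z = {z:.2f}")                      # approximately -6.02
print(f"p = {p:.2e}")                      # just under 1e-9
print("forensic authentication" if p < 1e-9 else "not authenticated")
```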

    Even though there is a term for n, the number of samples, in the z-score, when the number of samples in a second population increases, we generally employ another statistical test for comparing two populations. This test, the t-test, is given by Eq. (1.5):

      

    t = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}    (1.5)

    In the t-test statistic, the means of the two populations are denoted by the symbol μ, the standard deviations by the symbol σ, and the number in each sample by the symbol n (each with the appropriate numerical subscript). The overall degrees of freedom (df) for the comparison are n1 + n2 − 2 (this is needed when looking up the corresponding probability, or p, value from a t-table). The −2 reflects the one degree of freedom lost for each of the two populations. Statistical significance for one-tailed and two-tailed comparisons is determined as for z-values. Generally, t-tables, whether online or in a text, require three inputs: df, t-score, and tailedness (1 or 2). For example, for df = 11, a two-tailed p = 0.01 requires |t| > 3.106.
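    A brief sketch of Eq. (1.5) on two simulated samples, using df = n1 + n2 − 2 as described above; the data are invented for illustration.

```python
# Sketch of Eq. (1.5): two-population t-test with df = n1 + n2 - 2 as described above.
# The two samples are simulated stand-ins for real populations.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(7)
pop1 = rng.normal(100.0, 10.0, 30)
pop2 = rng.normal(106.0, 12.0, 25)

mu1, mu2 = pop1.mean(), pop2.mean()
s1, s2 = pop1.std(ddof=1), pop2.std(ddof=1)
n1, n2 = len(pop1), len(pop2)

t_score = (mu1 - mu2) / np.sqrt(s1**2 / n1 + s2**2 / n2)
df = n1 + n2 - 2
p_two_tailed = 2 * t_dist.sf(abs(t_score), df)

print(f"t = {t_score:.3f}, df = {df}, two-tailed p = {p_two_tailed:.4f}")
```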

    Next, we consider what happens when there are several populations to compare simultaneously. In this case, we generally employ analysis of variance (or ANOVA), which is a collection of statistical models and their associated procedures (such as variation among and between groups) that are used to analyze the differences among group means. As with many other statistical approaches, ANOVA was originally developed for quantitative biological applications. A convenient means of calculating the necessary elements of an ANOVA is the tabular arrangement shown in Table 1.2. Here, a particular variable’s variance (sum squared variability about its mean) is partitioned into components attributed to the different sources of the variation (usually from within the groups or from between the groups). Groups can be clusters, classes, or other labeled sets. ANOVA provides a statistical test for whether the means of several groups are equal, providing a logical extension of the z-score (one dimension) to the t-test (two dimensions) to the comparing (testing) of three or more means for statistical significance.

    Table 1.2

    See text for details.

    As shown in Table 1.2, the sums of squares (around the means) between groups and within groups are calculated. Dividing these by the degrees of freedom gives us the mean squared variance (akin to mean squared error), and the ratio of mean squared error between and within groups gives us an F-score (named for Fisher, who was the first to systematize the ANOVA) to test if there are groups statistically significantly different from each other. High ratios of between-group to within-group variance are the basis of clustering, segmentation, and optimized partitioning. Thus, the F-score used for statistical analysis with the ANOVA is confluent with the aggregation approaches used for clustering.
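    The sketch below works through the Table 1.2 arithmetic for three small hypothetical groups (sums of squares between and within groups, degrees of freedom, mean squares, and the F-score) and checks the result against scipy.stats.f_oneway.

```python
# Sketch of the one-way ANOVA arithmetic behind Table 1.2: sums of squares between and
# within groups, their degrees of freedom, mean squares, and the resulting F-score.
# Group values are hypothetical.
import numpy as np
from scipy.stats import f_oneway

groups = [
    np.array([5.1, 4.9, 5.4, 5.0, 5.2]),
    np.array([5.8, 6.1, 5.9, 6.3, 6.0]),
    np.array([5.0, 5.3, 5.1, 4.8, 5.2]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

print(f"F (by hand)     = {F:.3f}")
print(f"F (scipy check) = {f_oneway(*groups).statistic:.3f}")   # should agree
```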

    Additional calculations may be required for follow-on tests that determine the statistically significant differences between the groups, such as the Tukey; Student-Newman-Keuls (SNK); Fisher’s least significant difference (LSD); and Dunnett, Holm, Bonferroni, or Duncan’s multiple range test (MRT) [Ott08]. A variety of follow-on tests allow the statistician to trade off between false positives and false negatives. For example, Duncan’s MRT rank orders the clusters and compares each cluster pair with a critical value determined from a studentized range distribution. This has greater statistical power than the SNK but results in, statistically, more false positives. Tukey’s test is based on the z-test and is functionally akin to pairwise z-tests. The SNK test modifies Tukey’s test to have a more relaxed difference for more closely ranked samples, providing a bias toward false positives for closely ranked samples and the same bias toward false negatives for less closely ranked samples.

    1.5.3 Regression and estimation

    Regression techniques [Hast09] are used to provide predictive output for input across a broad range of values. There are many flavors of regression, including the familiar linear, polynomial, and logistic regressions that match curve descriptors for the relationship between independent (covariate) and dependent variables. Ridge regression, which is also known as weight decay, adds a regularization term that effectively acts like a Lagrange multiplier to incorporate one or more constraints to a regression equation. The least absolute shrinkage and selection operator (lasso) regression and stepwise selection perform both feature selection (dimensionality reduction, in which only a subset of the provided covariates are used in the final model, rather than the complete set of them) and regularization (which allows the regression to avoid overfitting by introducing, for example, interpolated information). Advanced forms of lasso alter the coefficients of the regression rather than setting some to zero as in stepwise selection. Finally, the elastic net adds penalty terms to extend lasso and provides a combination of lasso and ridge functionality.
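    As a minimal illustration of the flavors named above, the sketch below fits ordinary least squares, ridge, lasso, and elastic net models from scikit-learn to simulated data; the alpha values are arbitrary and the data are not from the book.

```python
# Sketch of the regression flavors named above, using scikit-learn's implementations
# (Ridge, Lasso, ElasticNet) on simulated data; alpha values are arbitrary.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, 0.0, -2.0, 0.0, 1.5])          # two irrelevant covariates
y = X @ true_coef + rng.normal(scale=0.5, size=200)

for model in (LinearRegression(),
              Ridge(alpha=1.0),                            # shrinks coefficients (weight decay)
              Lasso(alpha=0.1),                            # can set some coefficients exactly to zero
              ElasticNet(alpha=0.1, l1_ratio=0.5)):        # blends ridge and lasso
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```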

    In this section, important aspects of regression for prediction—in particular, the sensitivity of the estimation—will be discussed using linear and logistic regression as the exemplars. Figs. 1.1 and 1.2 provide a simple linear and a simple logistic curve, respectively, along with the sample points from which each curve was defined. For linear regression, the line of best fit is described by Eq. (1.6):

    y = b_0 + b_1 x    (1.6)

    Fig. 1.1 Example linear regression where the line of best fit for the filled circular points is indicated. The line is determined using least squared error as the cost function.

    Fig. 1.2 Example logistic regression where the logistic curve of best fit for the filled circular points is indicated. The curve is determined using least squared error as the cost function.

    For the logistic regression curve of Fig. 1.2, the relationship between the dependent and independent variables is given by Eq. (1.7):

    y = \frac{1}{1 + e^{-(b_0 + b_1 x)}}    (1.7)

    Once the regression curve (center curves in Figs. 1.3 and 1.4) is determined, the curve is subtracted from the observations, and the mean and standard deviation of the errors, | xi − μ |, are computed. The error bars shown in Figs. 1.3–1.6 are the 99% error bars, that is, 2.576 standard deviations above and below the regression curves.
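    The following sketch, on simulated points, mirrors the band construction described above: fit a line by least squares, compute the standard deviation of the residuals, and offset the fit by ±2.576 standard deviations.

```python
# Sketch of the 99% confidence band construction described above: fit a line by least
# squares, take the standard deviation of the residuals, and offset the fit by
# +/- 2.576 standard deviations. Data points are simulated.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)     # hypothetical observations

b1, b0 = np.polyfit(x, y, deg=1)            # slope and intercept of the best-fit line
y_hat = b0 + b1 * x
residual_sigma = np.std(y - y_hat, ddof=2)  # 2 parameters estimated -> ddof = 2

upper = y_hat + 2.576 * residual_sigma      # 99% band, as in Figs. 1.3 and 1.4
lower = y_hat - 2.576 * residual_sigma

print(f"fit: y = {b0:.2f} + {b1:.2f} x, residual sigma = {residual_sigma:.2f}")
print(f"band half-width = {2.576 * residual_sigma:.2f}")
```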

    Fig. 1.3 Example linear regression of Fig. 1.1 with 99% confidence interval lines indicated. These are 2.576 standard deviations to either side of the regression line.

    Fig. 1.4 Example logistic regression of Fig. 1.2 with 99% confidence interval lines indicated. These are 2.576 standard deviations to either side of the regression curve.

    Fig. 1.5 Example linear regression of Fig. 1.1 with sensitivity lines indicated. See text for details.
