Data Literacy: How to Make Your Experiments Robust and Reproducible
About this ebook
Data Literacy: How to Make Your Experiments Robust and Reproducible provides an overview of basic concepts and skills in handling data, which are common to diverse areas of science. Readers will get a good grasp of the steps involved in carrying out a scientific study and will understand some of the factors that make a study robust and reproducible. The book covers several major modules such as experimental design, data cleansing and preparation, statistical analysis, data management, and reporting. No specialized knowledge of statistics or computer programming is needed to fully understand the concepts presented.
This book is a valuable source for biomedical and health sciences graduate students and researchers, in general, who are interested in handling data to make their research reproducible and more efficient.
- Presents the content in an informal tone and with many examples taken from the daily routine at laboratories
- Can be used for self-studying or as an optional book for more technical courses
- Brings an interdisciplinary approach which may be applied across different areas of sciences
Neil Smalheiser
Dr. Neil Smalheiser has over 30 years of experience pursuing basic wet-lab research in neuroscience, most recently studying synaptic plasticity and the genomics of small RNAs. He has also directed multi-disciplinary, multi-institutional consortia dedicated to text mining and bioinformatics research, which have created new theoretical models, databases, open source software, and web-based services. Regardless of the subject matter, one common thread in his research is to link and synthesize different datasets, approaches, and apparently disparate scientific problems to form new concepts and paradigms. Another common thread is to identify scientific frontier areas that have fundamental and strategic importance, yet are currently under-studied, particularly because they fall “between the cracks” of existing disciplines. This book is based on lecture notes that Dr. Smalheiser prepared for a course he created, “Data Literacy for Neuroscientists,” given to undergraduate and graduate students.
Data Literacy
How to Make Your Experiments Robust and Reproducible
Neil R. Smalheiser, MD, PhD
Associate Professor in Psychiatry, Department of Psychiatry and Psychiatric Institute, University of Illinois School of Medicine, USA
Table of Contents
Cover image
Title page
Copyright
What Is Data Literacy?
Introduction
Acknowledgments
Why This Book?
Introduction
Part A. Designing Your Experiment
Chapter 1. Reproducibility and Robustness
Basic Terms and Concepts
Reproducibility and Robustness Tales to Give You Nightmares
The Way Forward
Chapter 2. Choosing a Research Problem
Introduction
Scientific Styles and Mind-Sets
Programmatic Science Versus Lily-Pad Science
Criteria (and Myths) in Choosing a Research Problem
Strong Inference
Designing Studies as Court Trials
Introduction
Chapter 3. Basics of Data and Data Distributions
Introduction
Averages
Variability
The Bell-Shaped (Normal) Curve
Normalization
Distribution Shape
A Peek Ahead at Sampling, Effect Sizes, and Statistical Significance
Other Important Curves and Distributions
Probabilities That Involve Discrete Counts
Conditional Probabilities
Are Most Published Scientific Findings False?
Chapter 4. Experimental Design: Measures, Validity, Sampling, Bias, Randomization, Power
Measures
Validity
Sampling and Randomization
Sources of Bias in Experiments
Power Estimation
Introduction
Chapter 5. Experimental Design: Design Strategies and Controls
A Feeling for the Organism
Building an Experimental Series in Layers
Specific Design Strategies
Controls
Specific, Nonspecific, and Background Effects
Simple Versus Complex Experimental Designs
How Many Times Should One Repeat an Experiment Before Publishing?
Some Common Pitfalls to Avoid
What to Do When the Unexpected Happens During an Experiment?
Should Experimental Design Be Centered Around the Null Hypothesis?
Chapter 6. Power Estimation
Introduction
What Is Power Estimation?
The Nuts and Bolts
A Closer Look at Fig. 6.1 and the Parameters That Go Into Power Estimation
How to Increase the Power of an Experiment
What Is the Power of Published Experiments in the Literature?
The Hidden Dangers of Carrying Out Underpowered Experiments
The File Drawer Problem in Science and How Adequate Power Helps
Why Not Carry Out Power Estimation After the Experiment Is Completed?
Introduction
Part B. Getting a Feel for Your Data
Chapter 7. The Data Cleansing and Analysis Pipeline
Steps in Data Cleansing
Data Normalization
A Brief Data Cleansing Checklist
Chapter 8. Topics to Consider When Analyzing Data
What Is an Experimental Outcome?
Why You Need to Present and Examine All the Results
Data Fishing, p-Hacking, HARKing, and Post Hoc Analyses
Problems Associated With Heterogeneity
Problems Associated with Nonindependence
Even Professionals Make This Mistake Half the Time!
In Summary
Introduction
Part C. Statistics (Without Much Math!)
Chapter 9. Null Hypothesis Statistical Testing and the t-Test
The Nuts and Bolts of Null Hypothesis Statistical Testing (NHST)
What Null Hypothesis Statistical Testing Does and Does Not Do
Does It Matter if My Population Is Normally Distributed or Not?
Choosing t-Test Parameters
A Final Word
Chapter 10. The New Statistics and Bayesian Inference
Statistical Significance Is Not Scientific Significance
The Magical Value P=.05
How to Move Beyond Null Hypothesis Statistical Testing?
Conditional Probabilities
Bayes' Rule
Bayesian Inference
Comparing Null Hypothesis Statistical Testing and Bayesian Inference
Systematic Reviews and Metaanalyses
Chapter 11. ANOVA
Analysis of Variance (ANOVA)
One-Way ANOVA (One Factor or One Treatment)
ANOVA Is a Parametric Test
Types of ANOVAs
The ANOVA Shows Significance; What Next?
Correction for Multiple Testing
Chapter 12. Nonparametric Tests
Introduction
The Sign Test
The Wilcoxon Signed-Rank Test
The Mann–Whitney U Test
Exact Tests
Nonparametric t-Tests
Nonparametric ANOVAs
Permutation Tests
Chapter 13. Correlation and Other Concepts You Should Know
Linear Correlation and Linear Regression
What Correlations Mean and What They Do Not
Nonparametric Correlation
Multiple Linear Regression Analysis
Logistic Regression
Machine Learning
Some Machine-Learning Methods
Big Data
Dimensional Reduction
Part D. Make Your Data Go Farther
Chapter 14. How to Record and Report Your Experiments
Scientists Keep Diaries Too!
Who Owns Your Data?
Reporting Authorship
Reporting Citations
Writing the Introduction/Motivation Section
Writing the Methods Section
Writing the Results
Writing the Discussion/Conclusion Sections
Introduction
Chapter 15. Data Sharing and Reuse
Data Sharing—When, Why, With Whom
Data Sharing Is Good for You (Really)
Data Archiving and Sharing Infrastructure
Terminologies
Ontologies
Your Experiment Is Not Just for You! or Is It?
What Data to Share?
Where to Share Data?
Data Repositories and Databases
Servers and Workflows
A Final Thought
Introduction
Chapter 16. The Revolution in Scientific Publishing
Journals as an Ecosystem
Peer Review
Journals That Publish Primary Research Findings
Indexing of Journals
One Journal Is a Mega Outlier
What Is Open Access?
Impact Factors and Other Metrics
New Trends in Peer Review
The Scientific Article as a Data Object
Where Should I Publish My Paper?
Is There an Ideal Publishing Portfolio?
Introduction
Postscript: Beyond Data Literacy
Learned Concepts
Index
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2017 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-811306-6
For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Mica Haley
Acquisition Editor: Rafael E. Teixeira
Editorial Project Manager: Mariana L. Kuhl
Production Project Manager: Poulouse Joseph
Designer: Alan Studholme
Typeset by TNQ Books and Journals
Cover image credit and illustrations placed between chapters: Stephanie Muscat
What Is Data Literacy?
Being literate means—literally!—being able to read and write, but it also implies having a certain level of curiosity and acquiring enough background to notice, appreciate, and enjoy the finer points of a piece of writing. A person who has money literacy may not have taken courses in accounting or business, but is likely to know how much they have in the bank, to know whose face is on the 10-dollar bill, and to know roughly how much they spend on the electric bill each month. Many famous musicians have no formal training and cannot read sheet music (Jimi Hendrix and Eric Clapton, to name two), yet they do possess music literacy—able to recognize, produce, and manipulate melodies, harmonies, rhythms, and chord shifts. And data literacy? Almost everyone has some degree of data literacy—one speaks of 1 bird or 2 birds, but never 1.3 birds!
The goal of this book is to learn how a scientist looks at data—how a feeling for data permeates every aspect of a scientific investigation, touching on aspects of experimental design, data analysis, statistics, and data management. After acquiring scientific data literacy, you will not be able to hear about an experiment without automatically asking yourself a series of questions such as: Is the sampling adequate in size, balanced, and unbiased? What are the positive and negative controls? Are the data properly cleansed and normalized?
Data literacy makes a difference in daily life too: When a layperson goes to the doctor for a checkup, the nurse tells them to take off their shoes, and they step on the scale (Fig. 1). When a scientist goes to the doctor's office, before stepping on the scale, they tare it to make sure it reads zero when no weight is applied. Then they find a known calibrated weight and put it on the scale, to make sure that it reads accurately (to within a few ounces). They may even take a series of weights that cover the range of their own weight (say, 100, 150, and 200 pounds) to make sure that the readings are linear within the scale's effective operating range. They take the weight of their clothes (and the contents of their pockets) into account, perhaps by estimation, perhaps by disrobing. Finally, they step on the scale. And then they do that three times and take the average of the three measurements!
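The scientist's weighing routine can even be sketched as a few lines of code. This is only an illustration of the steps, not anything the book prescribes; every number below (the tare reading, the test weights, the tolerance) is invented.

```python
# A minimal sketch of the calibration-minded weighing routine described above.
# All numbers here are hypothetical, chosen only to illustrate the steps.

def is_calibrated(readings, known_weights, tolerance=0.5):
    """Return True if the scale reproduces each known weight within `tolerance` pounds."""
    return all(abs(r - w) <= tolerance for r, w in zip(readings, known_weights))

tare_reading = 0.1                      # step 1: the scale with nothing on it
known_weights = [100.0, 150.0, 200.0]   # steps 2-3: calibrated weights spanning the range
scale_readings = [100.2, 149.9, 200.3]  # what the scale reported for each

scale_ok = abs(tare_reading) < 0.5 and is_calibrated(scale_readings, known_weights)

# Final step: weigh yourself three times and take the average.
measurements = [152.4, 152.0, 152.6]
average_weight = sum(measurements) / len(measurements)
```

Averaging repeated readings reduces the influence of random measurement noise, which is the same logic behind replicate measurements in the laboratory.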
Figure 1 A nurse weighs a patient who seems worried – maybe he is thinking about the need for calibration and linearity of the measurement?
This book is based upon a course that I have given to graduate students in neuroscience at the University of Illinois Medical School. Because most of the students are involved in laboratory animal studies, test-tube molecular biological studies, human psychological or neuroimaging studies, or clinical trials, I have chosen examples liberally from this sphere. Some of the examples do unavoidably have jargon, and a basic familiarity with science is assumed. However, the book should be readable and relevant to students and working scientists of any discipline, including physical sciences, biomedical sciences, social sciences, information science, and computer science.
Even though all graduate students have the opportunity to take courses on experimental design and statistics, I have found that the amount of material presented there is overwhelmingly comprehensive. Equally important, the authors of textbooks on those topics come from a different world than the typical student contemplating a career at the laboratory bench. (Hint: There is a hidden but yawning digital divide between the world of those who can program computer code, and those who cannot.) As a result, students tend to learn experimental design and statistics by rote yet do not achieve a basic, intuitive sense of data literacy that they can apply to their everyday scientific life.
Hence this book is not intended to replace traditional courses, texts, and online resources, but rather should be read as a prequel or supplement to them. I will try to illustrate points with examples and anecdotes, sometimes from my own personal experiences—and will offer more personal opinions, advice, and tips than you may be used to seeing in a textbook! On the other hand, I will not include problem sets and will cite only the minimum number of references to scholarly works.
Introduction
Teaching is harder than it looks.
Acknowledgments
Thanks to John Larson for originally inviting me to teach a course on Data Literacy for students in the Graduate Program in Neuroscience at the University of Illinois Medical School in Chicago. I owe a particular debt of gratitude to the students in the class, whose questions and feedback have shaped the course content over several years. My colleagues Aaron Cohen and Maryann Martone gave helpful comments and corrections on selected chapters. Vetle Torvik and Giovanni Lugli have been particularly longstanding research collaborators of mine, and my adventures in experimental design and data analysis have often involved one or both of them. Finally, I thank my illustrator, Stephanie Muscat, who has a particular talent for capturing scientific processes in visual terms—simply and with humor.
Why This Book?
The scientific literature is increasing exponentially. Each day, about 2000 new articles are added to MEDLINE, a free and public curated database of peer-reviewed biomedical articles (http://www.pubmed.gov). And yet, the scientific community is currently faced with not one, but two major crises that threaten our continued progress.
First, a huge amount of waste occurs at every step in the scientific pipeline [1]: Most experiments that are carried out are preliminary (“pilot studies”), descriptive, small scale, incomplete, lack some controls needed for interpretation, have unclear significance, or simply do not give clear results. Of experiments that do give clear results, most are never published, and the majority of those published are never cited (and may never even be read!). The original raw data acquired by the experimenter sit in a drawer or on a hard drive, eventually to be lost. Rarely are the data preserved in a form that allows others to view them, much less reuse them in additional research.
Second, a significant minority of published findings cannot be replicated by independent investigators. This is both a crisis of reproducibility (failing to find the same results even when trying to duplicate the experimental variables exactly) [2,3] and robustness (failing to find similar results when seemingly incidental variables are allowed to vary, e.g., when an experiment originally reported on 6-month-old Wistar rats is repeated on 8-month-old Sprague Dawley rats). The National Institutes of Health and leading journals and pharmaceutical companies have acknowledged the problem and its magnitude and are taking steps to improve the way that experiments are designed and reported [4–6].
What has brought us to this state of affairs? Certainly, a lack of data literacy is a contributing factor, and a major goal of this book is to cover issues that contribute to waste and that limit reproducibility and robustness. However, we also need to face the fact that the culture of science actively encourages scientists to engage in a number of ingrained practices that—if we are being charitable—we would describe as outdated. The current system rewards scientists for publishing findings that lead to funding, citations, promotions, and awards. Unfortunately, none of these goals are under the direct control of the investigators themselves! Achieving high impact or winning an award is like achieving celebrity in Hollywood: capricious and unpredictable. One would like to believe that readers, reviewers, and funders will recognize and support work that is of high intrinsic quality, but evidence suggests that there is a high degree of randomness in manuscript and grant proposal scores [7,8], which can lead to superstitious behavior [9] and outright cheating. In contrast, it is within the power of each scientist to make their data solid, reliable, extensive, and definitive in terms of findings. The interpretation of the data may be tentative and may not be “true” in some abstract or lasting sense, but at least others can build on the data in the future.
In fact, philosophically, there are some advantages to recentering the scientific enterprise around the desire to publish findings that are, first and foremost, robust and reproducible. As we will see, placing a high value on robustness and reproducibility empowers scientists and is part of a larger emerging movement that includes open access for publishing and open sharing of data.
Traditionally, a scientific paper is expected to present a coherent narrative with a strong interpretation and a clear conclusion—that is, it tells a good story and it has a good punch line! The underlying data are often presented in a highly compressed, summarized form, or not presented at all. Recently, however, there has been a move toward considering the raw data themselves to be the primary outcome of a scientific study, to be carefully described and preserved, while the authors' own analyses and interpretation are considered secondary or even dispensable.
We can see why this may be a good idea: For example, let us consider a study hypothesizing that the age of the father (at the time of birth) correlates positively with the risk of their adult offspring developing schizophrenia [10]. Imagine that the raw data consist of a table of human male subjects listing their ages and other attributes, together with a list of their offspring and subsequent psychiatric histories. Different investigators might choose to analyze these raw data in different ways, which might affect or alter their conclusions: For example, one might correlate paternal ages with risk across the entire life cycle, while another might divide the subjects into categorical groups, e.g., “young” fathers (aged 14–21 years), “regular” fathers (aged 21–40 years), and “old” fathers (aged 40+ years). Another investigator might focus only on truly old fathers, e.g., aged 50 or even 60 years. Furthermore, investigators might correlate ages with the overall prevalence of psychiatric illnesses, or any disease having psychotic features, or only those with a stable diagnosis of schizophrenia by the age of 30 years, etc. Without knowing the nature of the effect in advance, one could defend any of these ways of analyzing the data.
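One of the analysis choices just described—binning continuous paternal ages into categories—can be sketched in a few lines. The group labels and cutoffs mirror the hypothetical example in the text; how to handle the boundary ages is our own assumption, since the ranges as stated overlap at 21 and 40.

```python
# A sketch of binning paternal age into the hypothetical categories above.
# Boundary handling (age 21 -> "young", age 40 -> "regular") is an assumption;
# the ranges in the text overlap at 21 and 40 years.

def age_group(age):
    """Map a father's age (in years) to a categorical group."""
    if age <= 21:
        return "young"     # aged 14-21 years
    elif age <= 40:
        return "regular"   # aged 21-40 years
    else:
        return "old"       # aged 40+ years

groups = [age_group(a) for a in [18, 35, 55]]
```

The point of the example is not the code itself, but that even this trivial choice of cutoffs is an analytic decision the original authors made for the reader—one that a reanalysis of the raw data could make differently.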
So, the same data can be sliced and diced in any number of ways, and the resulting publication can look very different depending on how the authors choose to proceed. Even if one accepts that there is some relationship between paternal age and schizophrenia—and this finding has been replicated many times in the past 15 years—it is not at all obvious what this finding “means” in terms of underlying mechanisms. One can imagine that older fathers might bring up their children differently (e.g., perhaps exposing their young offspring to old-fashioned discipline practices). Alternatively, older fathers may have acquired a growing number of point mutations in their sperm DNA over time! Subsequent follow-up studies may attempt to characterize the relationship of age to risk in more detail, and to test hypotheses regarding which possible mechanisms seem most likely. And of course, the true mechanism(s) might reflect genetic or environmental influences that were not even appreciated or known at the time that the relation of age to risk was first noticed.
To summarize, the emerging view is that the bedrock of a scientific paper is its data. The authors' presentation and analysis of the data, resulting in its primary finding, is traditionally considered by most scientists to be the outcome of the paper, and it is this primary finding that ought to be robust and reproducible. However, as we have seen, the primary finding is a bit more subjective and removed from the data themselves, and according to the emerging view, it is NOT the bedrock of the paper. Rather, it is important that independent investigators should be able to view the raw data to reanalyze them, or compare or pool with other data obtained from other sources. Finally, the authors' interpretation of the finding, and their general conclusions, may be insightful and point the way forward, but should be taken with a big grain of salt.
The status quo of scientific practice is changing, radically and rapidly, and it is important to understand these trends to do science in the 21st century. This book will provide a roadmap for students wishing to navigate each step in the pipeline, from hypothesis to publication, during this time of transition. Do not worry, this roadmap won't turn you into a mere data collector. Making novel, original, and dramatic discoveries, and achieving breakthroughs, will remain as important as ever.
References
[1] Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet. July 4, 2009;374(9683):86–89. doi: 10.1016/S0140-6736(09)60329-9.
[2] Ioannidis J.P. Why most published research findings are false. PLoS Med. August 2005;2(8):e124.
[3] Leek J.T, Jager L.R. Is most published research really false? bioRxiv. April 27, 2016 doi: 10.1101/050575.
[4] Landis S.C, Amara S.G, Asadullah K, Austin C.P, Blumenstein R, Bradley E.W, Crystal R.G, Darnell R.B, Ferrante R.J, Fillit H, Finkelstein R, Fisher M, Gendelman H.E, Golub R.M, Goudreau J.L, Gross R.A, Gubitz A.K, Hesterlee S.E, Howells D.W, Huguenard J, Kelner K, Koroshetz W, Krainc D, Lazic S.E, Levine M.S, Macleod M.R, McCall J.M, Moxley 3rd. R.T, Narasimhan K, Noble L.J, Perrin S, Porter J.D, Steward O, Unger E, Utz U, Silberberg S.D. A call for transparent reporting to optimize the predictive value of preclinical research. Nature. October 11, 2012;490(7419):187–191. doi: 10.1038/nature11556.
[5] Hodes R.J, Insel T.R, Landis S.C. On behalf of the NIH blueprint for neuroscience research. The NIH toolbox: setting a standard for biomedical research. Neurology. 2013;80(11 Suppl. 3):S1. doi: 10.1212/WNL.0b013e3182872e90.
[6] Begley C.G, Ellis L.M. Drug development: raise standards for preclinical cancer research. Nature. March 28, 2012;483(7391):531–533. doi: 10.1038/483531a.
[7] Cole S, Simon G.A. Chance and consensus in peer review. Science. November 20, 1981;214(4523):881–886.
[8] Snell R.R. Menage a quoi? Optimal number of peer reviewers. PLoS One. April 1, 2015;10(4):e0120838. doi: 10.1371/journal.pone.0120838.
[9] Skinner B.F. Superstition in the pigeon. J Exp Psychol. April 1948;38(2):168–172.
[10] Brown A.S, Schaefer C.A, Wyatt R.J, Begg M.D, Goetz R, Bresnahan M.A, Harkavy-Friedman J, Gorman J.M, Malaspina D, Susser E.S. Paternal age and risk of schizophrenia in adult offspring. Am J Psychiatry. September 2002;159(9):1528–1533.
Introduction
How many potential new discoveries are filed away somewhere, unpublished, unfunded, and unknown?
Part A
Designing Your Experiment
Outline
Chapter 1. Reproducibility and Robustness
Chapter 2. Choosing a Research Problem
Introduction
Chapter 3. Basics of Data and Data Distributions
Chapter 4. Experimental Design: Measures, Validity, Sampling, Bias, Randomization, Power
Introduction
Chapter 5. Experimental Design: Design Strategies and Controls
Chapter 6. Power Estimation
Introduction
Chapter 1
Reproducibility and Robustness
Abstract
In this chapter, we analyze a simple experiment reporting that college students majoring in economics are less likely to have signed an organ donor card than students majoring in social work. We consider its data, findings, and conclusion, and ask what it means for each of these aspects to be successfully replicated by others. We contrast the validity of a finding with its reproducibility, robustness, and generalizability. The state of the current “crisis” in reproducibility is illustrated and underscored by reviewing two large bodies of literature concerned with cultured cells and with mouse and rat behavioral assays. We argue that scientific progress moves forward much more efficiently if an article that describes an interesting finding can be replicated, and if its findings are demonstrated to be robust within the initial article itself.
Keywords
Conclusion; Effect size; Findings; Generalizability; Proxy; Replication; Reproducible; Robust; Statistically significant difference; Validity
Basic Terms and Concepts
An experiment is said to be successfully replicated when an independent investigator can repeat the experiment as closely as possible and obtain the same or similar results. Let us see why it is so surprisingly difficult to replicate even a simple experiment when no fraud or negligence is involved. Consider an (imaginary) article that reports that Stanford college students majoring in economics are less likely to have signed an organ donor card than students majoring in social work. The authors suggest that students in the "caring" professions may be more altruistic than those in the "money" professions. What does it mean to say that this article is reproducible? Following the major sections of a published study (see Box 1.1), one must separate the question into five parts:
What does it mean to replicate the data obtained by the investigators?
What does it mean to replicate the methods employed by the investigators?
What does it mean to replicate the findings?
What does it mean to say that the findings are robust or generalizable?
What does it mean to replicate the interpretation of the data, i.e., the authors' conclusion?
Replicating the Data
We will presume that the investigators took an adequately large sample of students at Stanford—that they either (1) examined all students (or a large unbiased random sample), and then restricted their analysis to economics majors versus social work majors, or (2) sampled only from these two majors. We will presume that they discerned whether the students had signed an organ donor card by asking the students to fill out a self-report questionnaire. Reproducibility of the data means that if someone took another random sample of Stanford students and examined the same majors using the same methods, the distribution of the data would be (roughly) the same, that is, there would be no statistically significant differences between the data distributions in the two data sets. In particular, the demographic and baseline characteristics of the students should not be essentially different in the two data sets—it would be troubling if the first experiment had interviewed 50% females and the replication experiment only 20% females or if the first experiment interviewed a much higher proportion of honors students among the economics majors versus the social work majors, and this proportion was reversed in the replication experiment.
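To make "a statistically significant difference between the proportions" concrete, here is a minimal Python sketch of a pooled two-proportion z-test. The counts are invented purely for illustration, since the imaginary study reports none; this is one common way to compare two proportions, not necessarily the test the original authors would use.

```python
import math

# Hypothetical counts (invented for illustration): number who signed
# a donor card out of the number sampled, for each major.
econ_signed, econ_n = 30, 120      # economics majors
social_signed, social_n = 55, 110  # social work majors

p1 = econ_signed / econ_n          # proportion signed, economics
p2 = social_signed / social_n      # proportion signed, social work

# Pooled two-proportion z-test for the null hypothesis p1 == p2.
p_pool = (econ_signed + social_signed) / (econ_n + social_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / econ_n + 1 / social_n))
z = (p1 - p2) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

print(f"economics: {p1:.2f}, social work: {p2:.2f}, "
      f"z = {z:.2f}, p = {p_value:.5f}")
```

With these made-up numbers the difference in proportions is large relative to its standard error, so the test would declare it statistically significant; with a smaller sample, the very same proportions might not reach significance.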
Box 1.1
The Nuts and Bolts of a Scientific Report
A typical article will introduce a problem, which may arise from previous work in the existing literature or from a new observation made by the authors. The authors may pose a hypothesis and outline an experimental plan, designed either to test the hypothesis conclusively or, more often, to shed more light on the problem and constrain the possible explanations. After acquiring and analyzing their data, the authors present their findings or results, discuss the implications and limitations of the study, and point out directions for further research.
As we will discuss in later chapters in detail, the data associated with a study is not a single entity. The raw data acquired in a study represent the most basic, unfiltered data, consisting of images, machine outputs, tape recordings, hard copies of questionnaires, etc. This is generally transcribed to give numerical summary measurements and/or textual descriptors (e.g., marking subjects as male vs. female). Often each sample is assigned one row of a spreadsheet, and each measure or descriptor is placed in a different column. This spreadsheet is still generally referred to as raw data, even though the original images, machine reads, or questionnaire answers have been transformed, and some information has been filtered and possibly lost.
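The row-per-sample, column-per-measure layout described above can be sketched in a few lines of Python; the subject identifiers and column names here are invented, and a real study would of course have many more rows and columns.

```python
import csv
import io

# Hypothetical transcription of raw questionnaire answers: one row per
# subject, one column per descriptor (all names and values invented).
subjects = [
    {"id": "S01", "major": "economics",   "sex": "F", "donor_card": "yes"},
    {"id": "S02", "major": "social_work", "sex": "M", "donor_card": "no"},
]

buf = io.StringIO()  # write to a string here; a real study would use a file
writer = csv.DictWriter(buf, fieldnames=["id", "major", "sex", "donor_card"])
writer.writeheader()
writer.writerows(subjects)
print(buf.getvalue())
```

Note that even this step is a transformation: the subject's actual paper questionnaire has been reduced to a handful of coded values, and any information not captured in a column is lost.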
Next, the raw data undergo successive stages of data cleansing: Some experimental runs may be discarded entirely as unreliable (e.g., if the control experiments in these runs did not behave as expected). Some data points may be missing or suspicious (e.g., suggestive of typographical errors in transcribing or faulty instrumentation) or anomalous (i.e., very different from most of the other points in the study). How investigators deal with these issues is critically important and may affect their overall results and conclusions, yet different investigators may make very different choices about how to proceed. Once the data points are individually cleansed, the data are often thresholded (i.e., points whose values are very low may be excluded) and normalized (e.g., instead of considering the raw magnitude of a measurement, the data points may be ranked from highest to lowest and the ranks used instead) and possibly data points may be grouped into bins for further analysis. Again, this can be done in many different ways, and the choice of how to proceed may alter the findings and conclusions. It is important to preserve ALL the data of a study, including each stage of data transformation and cleansing, to allow others to replicate, reuse, and extend the study.
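Two of the cleansing choices mentioned above, thresholding and rank normalization, can be illustrated with a minimal sketch; the measurements and the cut-off value are invented, and a different (equally defensible) cut-off would keep or discard different points.

```python
# Hypothetical measurements (values invented for illustration).
raw = [0.02, 5.1, 3.4, 0.01, 7.8, 4.2]

# Thresholding: exclude points whose values are very low.
threshold = 0.1  # an arbitrary cut-off; choosing it is a subjective decision
kept = [x for x in raw if x >= threshold]

# Rank normalization: replace each magnitude with its rank,
# from highest (rank 1) to lowest.
order = sorted(kept, reverse=True)
ranks = [order.index(x) + 1 for x in kept]

print(kept)   # [5.1, 3.4, 7.8, 4.2]
print(ranks)  # [2, 4, 1, 3]
```

The point of the sketch is that both steps silently discard information (two data points and all the raw magnitudes), which is exactly why every intermediate stage of the data should be preserved.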
The findings of a study often take the form of comparing two or more experimental groups with regard to some measurement or parameter. Again, the findings are not a single entity! At the first level, each experimental group is associated with that measurement or parameter, which is generally summarized by the sample size, a mean (or median) value, and some indication of its variability (e.g., the standard deviation, standard error of the mean, or confidence intervals). These represent the most basic findings and should be presented in detail.
Next, two or more experimental groups are often compared by measuring the absolute difference in their means or the ratio or fold difference of the two means. (Although the difference and the ratio are closely related, they do not always convey the same information—for example, two mean values that are very small, say 0.001 vs. 0.0001, may actually be indistinguishable within the margin of experimental error, and yet their ratio is 10:1, which might appear to be a large effect.) Ideally, both the differences and the ratios should be analyzed and presented.
Finally, especially if the two experimental groups appear to be different, some statistical test(s) are performed to estimate the level of statistical significance. Often a P-value or F-score is presented. Statistical significance is indeed an important aspect, but to properly assess and interpret a study, a paper should report ALL findings—the sample size, mean values and variability of each group, and the absolute differences and fold differences of two groups. Only then should the statistical significance be presented.
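The full set of findings a report should include, before any significance test, can be computed with Python's standard library alone; the two groups of measurements below are invented for illustration.

```python
import math
import statistics

# Hypothetical measurements for two experimental groups (values invented).
group_a = [4.1, 3.8, 4.5, 4.0, 3.9]
group_b = [5.2, 5.6, 4.9, 5.4, 5.1]

for name, g in [("A", group_a), ("B", group_b)]:
    n = len(g)
    mean = statistics.mean(g)
    sd = statistics.stdev(g)    # sample standard deviation
    sem = sd / math.sqrt(n)     # standard error of the mean
    print(f"group {name}: n={n}, mean={mean:.2f}, SD={sd:.2f}, SEM={sem:.2f}")

# Both the absolute difference and the fold difference of the means.
diff = statistics.mean(group_b) - statistics.mean(group_a)
fold = statistics.mean(group_b) / statistics.mean(group_a)
print(f"difference = {diff:.2f}, fold change = {fold:.2f}")
# Only after reporting all of the above would a statistical test be run
# and its p-value presented.
```

Reporting everything this sketch prints, rather than a bare p-value, is what allows a reader to judge both the size and the reliability of an effect.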
This brief outline shows that the data
and findings
of even the simplest study are surprisingly complex and include a mix of objective measurements and subjective decisions. The current state of the art of publishing is such that rarely does an article preserve all of the elements of the data and findings transparently, which makes it difficult, if not impossible, for an outside laboratory to replicate a study exactly or to employ and reuse the data fully for their own research. It is even common in certain fields to present ONLY the P-values as if those are the primary findings, without showing the actual means or even fold differences! Clearly, as we proceed, we will be advising the reader on proper behavior, regardless of whether this represents current scientific practice!
Replicating the Methods
This refers to detailed, transparent reporting and sharing of the methods, software, reagents, equipment, and other tools used in the experiment. We will discuss reporting guidelines in Chapter 14. Here it is worth noting that many reagents used in experiments cannot be shared and utilized by others because they were generated in limited amounts, are unstable during long-term storage, are subject to proprietary trade secrets, and so on. Freezers fail, and often only then does the experimenter discover that the backup systems thought to be in place (backup power, backup CO2 tanks) were never properly installed or maintained. Not uncommonly, reagents (and experimental samples) are misplaced, ambiguously labeled, or thrown out when the experimenter graduates or changes jobs.
Replicating the Findings
The stated finding is that students majoring in economics are less likely to have signed an organ donor card than students majoring in social work. That is, there is a statistically significant difference between the proportion of economics majors who have signed cards versus the proportion of social work majors who have signed cards. But note that a statement of statistical significance is actually a derived parameter (the difference between two primary effects or experimental outcomes), and it is important to state not only the