Reproducibility in Biomedical Research: Epistemological and Statistical Problems and the Future
About this ebook
Reproducibility in Biomedical Research: Epistemological and Statistical Problems, 2nd Ed. explores the ideas and conundrums inherent in scientific research.
Reproducibility is one of the biggest challenges in biomedical research. It affects not only the ability to replicate results but also trust in the findings themselves. Since its publication in 2019, Reproducibility in Biomedical Research: Epistemological and Statistical Problems has established itself as a solid ethical reference in the area, prompting significant reflection on biomedical research. The second edition addresses new challenges to reproducibility in the biosciences, namely the reproducibility of machine learning artificial intelligence (AI), the reproducibility of translation from research to medical care, and the fundamental challenges to reproducibility. All existing chapters have been expanded to cover advances in the topics previously addressed.
Reproducibility in Biomedical Research: Epistemological and Statistical Problems, 2nd Ed. provides biomedical researchers with a framework to better understand the reproducibility challenges in the area. Newly introduced interactive exercises and updated case studies help students understand the fundamental concepts involved in the area.
- Includes four new chapters and updates across the book, covering recent developments of issues affecting reproducibility in biomedical research
- Covers reproducibility of results from machine learning AI algorithms
- Presents new case studies to illustrate challenges in related fields
- Includes a companion website with interactive exercises and summary tables
Erwin B. Montgomery Jr.
Dr. Montgomery has been an academic neurologist for over 40 years, pursuing teaching, clinical, and basic research at major academic medical centers. He has authored over 120 peer-reviewed journal articles (available on PubMed) and 8 books on medicine (4 on the subject of Deep Brain Stimulation). The last two have been Reproducibility in Biomedical Research (Academic Press, 2019) and The Ethics of Everyday Medicine (Academic Press, 2019).
Reproducibility in Biomedical Research
Epistemological and Statistical Problems and the Future
Second Edition
Erwin B. Montgomery Jr.
Department of Medicine (Neurology), Michael G. DeGroote School of Medicine at McMaster University, Hamilton, ON, Canada
Table of Contents
Cover image
Title page
Copyright
Dedication
Quotes
Preface to the second edition
Looking just over the horizon
Machine learning artificial intelligence
Translational research
Biological realism and Chaos and Complexity
Probability and statistical epistemology
Randomness as fundamental and foundational
Finally…
Preface to the first edition
Chapter 1. Introduction
Abstract
Productive irreproducibility
The multifaceted notion of reproducibility and irreproducibility
Turning from the past with an eye to the future
The fundamental causes of unproductive irreproducibility
Proceeding from what is certain but not useful to what is uncertain but useful
Precision versus accuracy
Dynamics
Machine learning artificial intelligence and the emperor’s new clothes
Knowledge is prediction, prediction is reproducibility or productive irreproducibility
Challenges to prediction and thus biomedical research
When traditional experimental design and statistics breed unproductive irreproducibility
Data do not and cannot speak for themselves
Reductionism and the fundamental problem
Summary
Chapter 2. The problem of irreproducibility
Abstract
Getting a handle on the scope of unproductive irreproducibility
The inescapable risk of irreproducibility
Institutional responses
Who speaks for reproducibility and irreproducibility?
Fundamental limits to reproducibility as traditionally defined
Variability, central tendency, Chaos, and Complexity
Summary
Chapter 3. Validity of biomedical science, reproducibility, and irreproducibility
Abstract
Science must be doing something right and therein lies reproducibility and productive irreproducibility
Legacy of injudicious use of scientific logical fallacies
Science versus human knowledge of it
The necessity of enabling assumptions
Special cases of irreproducible reproducibility
Science as inference to the best explanation
Summary
Chapter 4. The logic of certainty versus the logic of discovery
Abstract
Certainty, reproducibility, and logic
Deductive logic—certainty and limitations
Propositional logic
Syllogistic deduction
Centrality of syllogistic deduction and the Fallacy of Four Terms in biomedical research
Judicious use of the Fallacy of Four Terms
Partial, probability, practical, and causal syllogisms
Induction
The Duhem–Quine thesis
Summary
Chapter 5. The logic of probability and statistics
Abstract
Probability has always been central, statistics only relatively recently
Precision versus accuracy, epistemology versus ontology
The purpose of the chapter
Continuing legacy of notions of probability
The value of the logical perspective in probability and statistics
Metaphysics: ontology versus epistemology and biomedical reproducibility
Probability
Statistics
Key general assumptions whose violation risks unproductive irreproducibility
Summary
Chapter 6. Causation, process metaphor, and reductionism
Abstract
Renewed need for causation
Practical syllogism and beyond
Centrality of hypothesis to experimentation and centrality of causation to hypothesis generation
Ontological sense of cause
Reductionism and the Fallacies of Composition and Division
Other fallacies as applied to cause
Discipline in the Principles of Causational and Informational Synonymy
Process metaphor
Summary
Chapter 7. Case studies in clinical biomedical research
Abstract
Forbearance of repetition
Purpose of clinical research as the standard
Clinical importance
Establishing clinical importance
Specific features to look for in case studies
Case study—two conflicting studies of hormone use in postmenopausal women, which is irreproducible?
Why the dominance of the Women’s Health Initiative Study over the Nurses’ Health Study?
Aftermath
Summary
Chapter 8. Case studies in basic biomedical research
Abstract
Forbearance of repetition
Purpose
Setting the stage
The value of a tool from its intended use
What is basic biomedical research?
Scientific importance versus statistical significance
Reproducibility and the willingness to ignore irreproducibility
Specific features to look for in case studies
Case study—pathophysiology of parkinsonism and physiology of the basal ganglia
Summary
Chapter 9. Case studies in computational biomedical research
Abstract
Theorizing versus computational modeling with simulation
Summary
Chapter 10. Case studies in translational research
Abstract
Translational research as the ultimate goal of basic and clinical research
Contemporary perspective on translational research
Summary
Chapter 11. Case studies in machine learning artificial intelligence
Abstract
The current environment of machine learning artificial intelligence
Machine learning AI in the context of biomedical research
Example of a neural network machine learning AI
The game of warmer/colder
Difference between machine learning AI and other analytic methods
Similarities between machine learning AI and other methods, such as regression analyses
Quality assurance in multivariate regression and implications for machine learning AI
Fallacy of Four Terms
What is the purpose or goal?
To whom or what is the machine learning AI algorithm to apply?
How should one select the training set?
What is the learning methodology?
The notion of error
Only as good as the gold standard
Error analyses as quality control
Case study
The psychology of machine learning AI
Summary
Chapter 12. Chaotic and Complex systems, statistics, and far-from-equilibrium thermodynamics
Abstract
Chaos and Complexity and the game of pool
Linearization of complex nonlinear systems
The Large Number and Central Limit theorems
Incompleteness
Self-organization
Discovering Chaos and Complexity
Equilibrium and steady-state conditions
Chaos, Complexity, and the basis for statistics
Self-organization
Summary
Chapter 13. The fundamental problem
Abstract
A word at the beginning with an eye to the future
Not for the faint of heart
Implications of the epistemic choice of variety as variation
The necessary transcendental nature of fundamentals
The argument for nonidentity
Simplification as means to willfully ignore or hide
The importance of differences
Physics of the fundamental ontological problem
How to know when no two experiences are exactly alike?
Summary
Chapter 14. Epilogue
Abstract
Implementation science
De-implementation
Metacognition and metaphysics
Reclaiming philosophy
Scientism
Some suggestions—Good Manufacturing Practices
Ethical obligations as applied to biomedical research
Summary
Appendix A. An introduction to the logic of logic
Misperception of what is logic
Logic is a discipline used to help understand reality
Proceeding from what is most certain
Proceeding to what is not certain but useful and dangerous
Extension to syllogistic deduction
Where it gets more uncertain, from state-of-being linking verbs to causation
Appendix B. Introduction to the logic of probability and statistics
Purpose
The notion of Introduction
Probability by enumeration of past experiences
Combinatorics to avoid uniform probability distributions and assure utility
The arithmetic mean as probability calculus
Defined statistical distributions and their models
Accuracy, precision, population, mean, and variance
The shaky ground upon which traditional statistics rests
Measures of randomness as alternatives
Experimentation is a technical matter; science is altogether different and much more difficult
Appendix C. Moving away from sample-based analyses for translational research
Importance of normal distributions in the data
Mill’s Method of Differences
Possible only when enough is ignored by what appears to be reasonable presuppositions
Preserving information of the particular individual subject
Multidimensional Shannon’s entropy
Glossary
Bibliography
Index
Copyright
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Publisher’s note: Elsevier takes a neutral position with respect to territorial disputes or jurisdictional claims in its published content, including in maps and institutional affiliations.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-443-13829-4
For Information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Mica Haley
Acquisitions Editor: Andre Wolff
Editorial Project Manager: Timothy Bennett
Production Project Manager: Neena S. Maheen
Cover Designer: Miles Hitchen
Typeset by MPS Limited, Chennai, India
Dedication
First edition
To Lyn Turkstra for everything…
And to the Saints Thomas,
Hobbes and Kuhn, our scientific consciences
Second edition
To Lyn, for whom my appreciation continues…
Quotes
An experiment is never a failure solely because it fails to achieve predicted results. An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don’t prove anything one way or another.
—Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry into Values (1974)
This day may possibly be my last: but the laws of probability, so true in general, so fallacious in particular, still allow about fifteen years.
—Edward Gibbon, The Autobiography and Correspondence of Edward Gibbon the Historian (1869)
Preface to the second edition
Erwin B. Montgomery, Jr.
Looking just over the horizon
Concern regarding reproducibility and irreproducibility in biomedical research continues to increase (Fig. 1). Since the publication of the first edition, many of the challenges to reproducibility in biomedical research have become clearer, but new and novel challenges have also appeared on the horizon. While past concerns still prevail, so that the efforts of the first edition remain relevant, these new challenges offer a unique opportunity for the second edition to be proactive. They include machine learning artificial intelligence (AI), translational research seen from a new perspective, and opportunities for increasingly realistic approaches to biological phenomena, such that traditional approaches are no longer acceptable.
Figure 1 Number of citations per year in PubMed using the keyword irreproducibility or reproducibility in the title, accessed June 19, 2023.
In some ways, this second edition is like the two-faced god Janus (Fig. 2). In this case, one face is to the past and was the primary effort of the first edition. Looking again at the past remains critical as there are important lessons to be learned and, at least as importantly, past lessons to be unlearned. That effort continues in the second edition. Also, this second edition is an opportunity to look to the future through the second face.
Figure 2 Statue representing Janus Bifrons in the Vatican Museums (https://commons.wikimedia.org/wiki/File:Double_herm_Chiaramonti_Inv1395.jpg#/media/File:Double_herm_Chiaramonti_Inv1395.jpg, accessed June 19, 2023).
But note, the two faces belong to the same head, and presumably the same brain and mind. What drove past perspectives, attitudes, and approaches will likewise drive the future, simply because both what was done in the past and what will be done in the future respond to the same fundamental problem of all knowledge, discussed in Chapter 13. The fundamental conundrum is the fact that every sense experience (as distinct from perception) is different. Each sense experience is thus unique and de novo, and therefore, in itself, uninformative about any other. How, then, to predict future experiences and to understand past ones? One option is to hold that the effectively infinite variety of sense experiences is a diversity in which each is unique, de novo, and uninformative of the others. To be sure, taking this position puts one at risk of the Solipsism of the Present Moment.
The alternative is that variety is variation over an economical set of fundamentals, such that each experience is some combination of them. Foundationally, this is the choice made by biomedical scientists. But that choice is problematic, and the consequent conundrums drive experimental design and analysis. In many ways, the choice of viewing variety as variation over an economical set of transcendental fundamentals places biomedical research at risk of unproductive irreproducibility. One need look no further than the arithmetic mean as the Central Tendency in a research study. The arithmetic mean must bear some relation to each and every observation from which it was extracted, yet it cannot be the same as any of those observations. One seems forced to conclude that the observations are "real," as they are what is experienced, but then what of the arithmetic mean? In a critical way, the arithmetic mean becomes the "really real."
For example, if a doctor were to ask a scientist about the effect of agent Equation on their patients, the scientist likely would point to the arithmetic mean of the sample in the experiment, not to any particular observation from which that mean was generated. How can epistemically realist scientists avoid thinking that the "real" is real and the "really real" is a useful artifact, that the "really real" is a fudge factor? Biomedical researchers, and scientists in general, have been very adept at correcting the "real" to find the "really real," or at creating fudge factors, depending on one's perspective. But each such maneuver opens the risk of unproductive irreproducibility.
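The tension between the mean and the observations can be made concrete with a few made-up numbers (the values below are purely illustrative, not data from any study):

```python
# Hypothetical illustration: the arithmetic mean relates to every
# observation yet need not equal any one of them.
observations = [4.1, 5.7, 6.2, 3.9, 5.1]  # made-up measurements

mean = sum(observations) / len(observations)  # 25.0 / 5 = 5.0

# The mean (5.0) is extracted from all the observations, but is not
# identical to any single one of them.
print(mean)                 # 5.0
print(mean in observations) # False
```

No subject in this toy sample ever exhibited the value 5.0, yet 5.0 is what would be reported as "the" result.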
Machine learning artificial intelligence
Machine learning AI has generated great excitement, exceeded perhaps only by fearful concern, both in relation to its potential. Interestingly, much of the concern is driven by the potential for consequential adverse effects. As discussed in detail in Chapter 11, the issue is not whether machine learning AI will produce answers but how humans will know whether the answers are reproducible in relevant and important ways. In computer programming, a runtime error occurs when a program comports with the syntax of the programming language and produces an answer, yet the answer is wrong, perhaps to be discovered at the time or in retrospect. For a runtime error to be detected, there must be some expectation of what the answer should look like.
What the answer should look like is not derived from the experiment alone, and thus the risk is that machine learning AI cannot self-correct. Human responsibility for correcting machine learning AI is inescapable, even if ignored. But humans must first know what the answer should look like before the computer gives its answer.
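The point can be sketched with a deliberately buggy toy program (entirely hypothetical; `mean_dose` is an invented name): the program runs without complaint and returns an answer, and only a prior expectation of what the answer should look like exposes the error.

```python
# Hypothetical sketch: a program that executes "successfully" yet
# returns a wrong answer. Nothing in the program itself signals the
# problem; detection requires an external expectation.
def mean_dose(doses):
    total = 0.0
    for d in doses:
        total += d
    return total / (len(doses) - 1)  # bug: off-by-one denominator

result = mean_dose([10.0, 20.0, 30.0])
print(result)  # 30.0, although the true mean is 20.0
```

The computer fulfills its side of the bargain, syntax is respected and an answer is produced; only a human who already knows the mean should be 20.0 can recognize 30.0 as an error.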
But how does one know what the answer should look like? In comparison, knowing what the answer should look like in traditional experimental design and statistical analysis is relatively straightforward. In fact, in many ways the structure of the experiment and the analyses predetermine what the answer will be. This is an example of the logical fallacy Petitio Principii, or begging the question. Indeed, what the answer should look like is consequent to the process rather than to any definite empirical finding. This is an example of the Process metaphor. In machine learning AI, by contrast, the empirical outcome cannot be predicted and the process is unknowable. Machine learning AI is thus a mystery, which conveys an almost mystic quality to it. In important ways, the machine learning AI revolution will force a great reconsideration of what counts as reproducibility. A close examination of the epistemology of machine learning AI will be illuminating in itself, but the contrast with traditional approaches likely will require a reconsideration of what scientists think they know about traditional experimental design and analysis. First, it is necessary to demystify machine learning AI.
Translational research
Evidence-Based Medicine, in the particular definition where it derives exclusively from randomized control trials (Djulbegovic and Guyatt, 2017), has gained considerable force. Indeed, Evidence-Based Medicine often becomes the sole criterion for generating guidelines for medical practice. Thus, the issues regarding Evidence-Based Medicine are relevant to clinical research, as a critique of Evidence-Based Medicine is also a critique of clinical research. However, the implications extend to basic research. Despite widespread adoption of Evidence-Based Medicine by medical academics, Evidence-Based Medicine continues to encounter resistance (Broom et al., 2009; Goldman and Shih, 2011; Pope, 2003). Evidence of concern about resistance to, or failure to adopt, Evidence-Based Medicine is seen in the special and dedicated efforts to establish Evidence-Based Medicine in routine medical care (Dopson et al., 2003). Various divisions of the National Institutes of Health, such as the National Cancer Institute, have developed programs in Implementation science, which the Institute defines as follows: "Implementation science (IS) is the study of methods to promote the adoption and integration of evidence-based practices, interventions, and policies into routine health care and public health settings to improve our impact on population health. This discipline is characterized by a variety of research designs and methodological approaches, partnerships with key stakeholder groups (e.g., patients, providers, organizations, systems, and/or communities), and the development and testing of ways to effectively and efficiently integrate evidence-based practices, interventions, and policies into routine health settings" (https://cancercontrol.cancer.gov/is/about).
To be sure, the factors underlying resistance to Evidence-Based Medicine are diverse, ranging from the political to the sociological and psychological, among others. But there is also an epistemic skepticism. As Evidence-Based Medicine has become synonymous with clinical research, particularly in the form of randomized control trials, the epistemic resistance raises the possibility that clinical research is epistemically unfit for the purposes of medicine. Yet this would be quite shocking, as the National Institutes of Health, perhaps the greatest source of support and leadership in clinical research, holds as its mission to "… seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability [italics added]" (https://www.nih.gov/about-nih/what-we-do/mission-goals#:~:text=NIH’s%20mission%20is%20to%20seek,and%20reduce%20illness%20and%20disability).
The question demanded by the epistemic skeptic is: what is it about clinical research that has been so powerful in amassing biomedical knowledge yet proves to be its undoing when applied to prospective particular individual subjects, such as patients? The fact of the matter, and a central theme of this second edition, is that sample-centric studies and analyses can only make predictions about future samples, not individual subjects. It is the use of samples that allows some resolution of the conundrums created by the epistemic choice to view the effectively infinite variety as variation over an economical set of transcendental fundamentals. As will be explained in greater detail in this second edition, the sample allows construction of a standard normal distribution to represent biological variability and noise. The arithmetic mean, as the Central Tendency of the standard normal distribution, equals 0, and thus the effect of biological variability and noise "zeros out" in the sample. However, the effects of whatever mechanisms underlie biological variability and noise do not "zero out" in the case of the individual observation. Alternatives to sample-based analyses will be discussed, for example, in Appendix C.
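A small simulation makes the asymmetry concrete (the numbers here are invented for illustration, not drawn from any study): additive noise averages out in the sample mean but remains fully present in each individual observation.

```python
# Hypothetical simulation: zero-mean Gaussian "biological noise"
# added to a fixed true effect.
import random

random.seed(0)
true_effect = 5.0  # made-up true value
observations = [true_effect + random.gauss(0.0, 2.0) for _ in range(10_000)]

sample_mean = sum(observations) / len(observations)

# The noise "zeros out" in the sample mean...
print(abs(sample_mean - true_effect))  # small (on the order of 0.02)

# ...but not in any single observation.
print(max(abs(o - true_effect) for o in observations))  # several units
```

The sample-level summary is well behaved precisely because the noise cancels across subjects; no such cancellation is available for the individual patient in front of the doctor.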
Biological realism and Chaos and Complexity
As will be discussed in this second edition, many of the enabling assumptions that allowed basic and clinical research to proceed by traditional methods are unrealistic. To be sure, methods have utility, that is, they work until they don't. The result is a risk of unproductive irreproducibility. Many enabling assumptions and presumptions, such as simplifications, are instrumental. However, the remarkable advance in scientific technology has greatly expanded the range, resolution, diversity, and consilience of quantification of biological phenomena. In some ways, advancing technology has taken away the "excuse" for continued simplification.
Intellectual or conceptual tools likewise have contributed to methodologies that provide "an answer," even if the answer is a runtime error. These include composite measures for individual observations and composites reflected in the group or sample. For example, in studying an experimental agent for stroke risk reduction, is it realistic simply to lump all the confounds together in the experimental and control groups, ignoring the variations in the incidence of each confound and ignoring potential interaction effects? To be sure, doing so has utility in getting an answer, but the answer can only be unrealistic. Perhaps this was a necessary trade-off in the past. It is not clear that such trade-offs are still necessary. At the very least, the question should be asked.
Many of the enabling assumptions and presumptions presume linearity and independence among the variables. For example, the confound of diabetes mellitus in a study of stroke risk is handled as though it were independent of the confound of hypertension. Rather, the incidences are simply added, presuming the Principle of Superposition, and as long as the sums of confounds are the same in the experimental and control groups, the study is held valid.
It is becoming increasingly clear that the presumptions of linearity and superposition are false. Biological systems are made up of an enormous number of entities interrelated in complex and nonlinear ways. While the relevance of highly complex nonlinear systems displaying Chaos and Complexity seems doubtful to many, who demand proof that the biological mechanisms under their examination are Chaotic and Complex, how could any realistic understanding of biological phenomena not involve Chaos and Complexity? Thus, a great many assumptions in traditional experimental design and analysis, particularly those that presuppose linearity and superposition, are not only unrealistic but misleading, increasing the risk of unproductive irreproducibility. This is discussed in greater detail in Chapter 12.
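The failure of superposition can be sketched with an invented risk model (the function and all its numbers are hypothetical, chosen only to show the arithmetic): when two risk factors interact, their joint effect is not the sum of their separate effects.

```python
# Hypothetical risk model with an interaction term. All numbers are
# made up for illustration; this is not a clinical model.
def stroke_risk(diabetes: bool, hypertension: bool) -> float:
    risk = 0.05  # baseline risk
    if diabetes:
        risk += 0.10
    if hypertension:
        risk += 0.08
    if diabetes and hypertension:
        risk += 0.12  # interaction: the combination is worse than the sum
    return risk

# What superposition (linearity) would predict for the combination:
additive_prediction = (stroke_risk(True, False)
                       + stroke_risk(False, True)
                       - stroke_risk(False, False))
actual = stroke_risk(True, True)

print(round(additive_prediction, 2), round(actual, 2))  # 0.23 0.35
```

Simply adding up incidences of confounds, as in the study design described above, implicitly assumes the interaction term is zero; when it is not, the "balanced" groups are not actually comparable.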
Probability and statistical epistemology
The first edition of this book focused on logic, as the injudicious use of necessary logical fallacies is an important factor in the crisis of reproducibility in biomedical research. The actual full application of deductive logic, including the judicious use of logical fallacies, requires the deductive propositions and syllogisms to be translated into probability calculus. Thus, the many misadventures in the use of probability calculus derive from the corresponding logical fallacies. Further, just as deductive logic can build complex logical arguments, called theorems, probability calculus allows for complex theorems
based on the fundamental axioms and rules of inference in the probability calculus. Bayes’ theorem is critical to biomedical research and is an example. One application of Bayes’ theorem relates the probability of a hypothesis, such as that biochemical assay A indicates biological process B, by relating the assay result to the truth or falsehood of process B through the specificity and sensitivity of the assay and the prior probability of the biological process. No biological experimentation would be possible without Bayes’ theorem. Unfortunately, the potential of Bayes’ theorem for reducing the risk of unproductive irreproducibility has not been sufficiently leveraged, as will be discussed.
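The arithmetic of this application of Bayes’ theorem can be sketched in a few lines of Python. The sensitivity, specificity, and prior values below are hypothetical, chosen only for illustration:

```python
def posterior(prior, sensitivity, specificity):
    """Bayes' theorem: probability that biological process B is present
    given a positive result on biochemical assay A.
    P(B | A+) = P(A+ | B) P(B) / [P(A+ | B) P(B) + P(A+ | not B) P(not B)]
    """
    true_pos = sensitivity * prior               # P(A+ | B) * P(B)
    false_pos = (1 - specificity) * (1 - prior)  # P(A+ | not B) * P(not B)
    return true_pos / (true_pos + false_pos)

# Hypothetical assay: 90% sensitive, 95% specific, 10% prior prevalence
print(round(posterior(0.10, 0.90, 0.95), 4))  # → 0.6667
```

With these illustrative numbers, a positive assay raises the probability of the biological process from a prior of 10% to a posterior of about 67%, showing how strongly the prior probability shapes what the assay result can claim.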
Newly expanded in this second edition is a similar analysis of statistics. To be sure, statistics was described as the study of the probabilities of a probability in the first edition. But there is more, and that more is a major focus of this second edition. Traditional statistics presumes the following logical statement: if population implies sample is true and sample is true, then population is true. What is held true of the population is that which is held true of the sample. But this is an example of the Fallacy of Confirming the Consequence and is illustrated by the Venn diagram in Fig. 3. The set sample is considered a subset of the set population and thus whatever is said about the population is true of the sample. But what is true of the sample may not be true of the population. In other words, there are members of the set population that are not the same or equivalent to the members of the set sample.
Figure 3 Venn diagram representation of the logical statement if population implies sample is true and sample is true, then population is true. As can be appreciated, all the members of the set sample are members of the set population and thus what can be said of the population can be said of the sample. But note, there are members in the set population that are not in the set sample so what can be said about the set sample cannot be said of all the members of the set population. Thus, the logical statement is invalid and is an example of the Fallacy of Confirming the Consequence.
The logical statement if population implies sample is true and sample is true then population is true is rather useless except in the rarest of cases. In the vast majority of biomedical research, it is not possible to know the population and thus it is impossible to assess the truth or falsehood of the theorem if population implies sample is true and sample is true then population is true. To proceed as though the theorem is valid is to place the experiment at risk of unproductive irreproducibility. What is the biomedical researcher to do?
Traditional statistics then uses another theorem, population implies sample1 and population implies sample2, therefore sample1 implies sample2 and sample2 implies sample1. Note, this theorem is the basis for reproducibility in biomedical research. But this theorem is the Fallacy of Pseudotransitivity as shown in Fig. 4. Sample1 is used in the first experiment and sample2 is used in an experiment to assess the reproducibility of the first experiment. If sample2 does not imply sample1, then what can be said about sample1 from the second experiment cannot be said of sample1 in the first experiment. The same can be said if sample1 does not imply sample2, thus making both experiments irreproducible.
Figure 4 Venn diagram representation of the logical statement if population implies sample1 is true and population implies sample2 is true, then sample1 implies sample2 and sample2 implies sample1 is true. As can be appreciated, all members of set sample1 are members of the set population and thus what can be said of the population can be said of the sample1. But note, there are members in the set population that are not in set sample1 so what can be said about set sample1 cannot be said of all the members of the set population. The same holds for sample2. Thus, there may be members of sample1 that are not members of set sample2 and likewise there may be members of sample2 that are not members of set sample1. Consequently, the conclusion that sample1 implies sample2 and sample2 implies sample1 is true is invalid. This is an example of the Fallacy of Pseudotransitivity.
One response is to propose a tautology of the form population is sample1 and population is sample2, therefore sample1 is sample2. This is certain by the Principle of Transitivity. For this to be the case, sample1 would have to be exhaustive of the entire population and thus sample1 is the population and therefore equal to the population. Similarly, sample2 would have to be exhaustive of the population and therefore equal to the population. Yet, if the population is unknowable, then the theorem fails.
For the population to be known exactly, the population would have to be finite. However, the Fallacy of Induction argues that it is impossible, in principle, to know whether the population is finite. This generates the need for the Large Number theorem. First, the theorem holds that the population is adequately represented by the Central Tendency, such as the arithmetic mean. Yet, there is no empirical proof and thus the presumption is a metaphysical faith even as it has utility. Second, the Large Number theorem holds that the arithmetic mean will become constant or stable as long as the sample size is sufficiently large. Indeed, the stability of the arithmetic mean at sufficient sample sizes is taken as justification even if it is without empirical foundation. The justification is in the form of a Process metaphor. But the notion of sufficiently large is a relative term, as the numerator of the ratio is the sample size, n, while the denominator is the size of the population, and the latter is unknown.
The response is to use Limit theory and hold that the sample size, n, increases in the limit toward infinity, n → ∞. Thus, accounting for a population of any unknown size, the mean of the sample, x̄, becomes a constant equal to the mean of the population, μ. The Large Number theorem can be expressed as lim(n→∞) x̄ = μ. But how can one be certain of the limit if μ is unknowable? In actual practice, the limit lim(n→∞) x̄ = c is used for some constant c, and importantly, c is just taken as μ, as μ is unknowable. But how is this possible as no sample has an infinite size? A response is that the sample size, n, gets close enough, whatever the latter means. The important point is that close enough provides ample opportunity for the risk of unproductive irreproducibility.
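A short simulation, offered only as a sketch, makes the point concrete: the running sample mean of draws from a standard normal population drifts toward the population mean of 0 as n grows, yet at no finite n can it be said to have arrived:

```python
import random

random.seed(42)  # for a repeatable illustration

# Draw from a standard normal "population" (mu = 0, sigma = 1) and
# report the running sample mean at increasing sample sizes. The mean
# stabilizes, but no finite n guarantees that x-bar equals mu.
draws = []
for n in (10, 100, 1000, 100_000):
    while len(draws) < n:
        draws.append(random.gauss(0, 1))
    print(f"n = {n:6d}  x-bar = {sum(draws) / n:+.4f}")
```

The printed means wander at small n and settle near 0 at large n; whether "settle near" counts as close enough is exactly the question raised in the text.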
But note, these theorems relate to the Central Tendency, such as the arithmetic mean, not actual observations. The question demanded is what justifies taking the Central Tendency, such as the arithmetic mean, as the truer representation of the phenomenon than the actual observations, the real? Clearly, the arithmetic mean is transcendental as it is presumed to be informative of all the actual observations, yet not identical to each and every observation. The latter is not possible according to the fundamental problem of ontology as discussed in Chapter 13.
One response to the question of justification is that the actual observations are contaminated
by biological variability, noise, and confounds. The Central Tendency is the true or really real representation of the phenomenon, and the claim is made clear by statistical methods to get rid of biological variability, noise, and confounds. The biological variability and noise are disposed of because it is presumed, that is, taken on faith, that the populations of both biological variability and noise follow standard normal distributions where the mean effects of the biological variation and noise are 0. Thus, in the mean of the actual observations, the contributions made by biological variability and noise have zero net effect. In other words, they cancel out. Further, experiments are constructed such that the confounds are counterbalanced between the experimental and the control groups such that the mean effect of the confounds becomes 0. Yet, these approaches are based on presumptions and assumptions that are of questionable justifiable foundations and, consequently, their use places the research at risk for unproductive irreproducibility. These issues are discussed in greater detail throughout this second edition.
Another response to the question of why the Central Tendency, such as the arithmetic mean, is a truer representation of the ontology or reality of the phenomenon is that actual observations sufficiently close (close enough) to the Central Tendency are far more likely than the other observations. It is almost like reality or truth is a majority vote. Just how close is close enough is given by confidence intervals. But this is a common misconception in that the confidence intervals only provide information as to the precision of calculating the Central Tendency over repeated samples. It is not about the proximity of the actual observations to the Central Tendency.
These conundrums, resulting from asking inconvenient questions, do not disappear if the questions are not asked. It is said that a good carpenter knows their tools. Note, this is not to say that the carpenter’s tools are without flaws, deficits, and limitations. Rather, a good carpenter recognizes the flaws, deficits, and limitations in order to mitigate their effects. The good carpenter does not just say that their tools are good enough or close enough.
Consider a research study where agent A is applied to a sample and an outcome measure is observed for each of the 30 subjects. The 30 outcome measures are as follows: 1.213065843, -1.408232038, -0.781683411, -0.719130639, 0.420620836, -1.308030733, 0.432603429, 0.90035428, 1.082294148, 0.013502586, 2.24195901, 0.089507921, 0.442114469, -1.093630999, 2.409105946, 0.151316044, -0.598211045, 1.40125394, 2.060951374, -0.734472678, -0.189817229, 0.714876478, -0.763639036, -0.713001782, -1.565367711, -0.416446255, 0.65168706, 1.221249022, 0.268369149, and 0.81556891. In this hypothetical case, the 30 measures were drawn by a random number generator using a standard normal distribution with an arithmetic mean of 0 and a standard deviation of 1 and are shown in Fig. 5. Typically, the next step is to determine the descriptive statistics of this sample as shown in Table 1.
Figure 5 Equal interval histogram of the distribution of the 30 observations. Note, the 95% confidence interval, 95% CI, does not cover 95% of the observations. Rather, it predicts the range that would be occupied by 95% of the arithmetic means on repeated sampling.
Table 1
What can be said about the results? A typical response is that agent A produced an arithmetic mean outcome of 0.207957896. But note, none of the observations were equal to 0.207957896. If the observations were real, that is, what actually was observed, then what is one to believe about the arithmetic mean of 0.207957896? Now, most scientists would be loath to say that the arithmetic mean is nonsensical, artificial, or not meaningful. Indeed, the history of statistics has been that the Central Tendency, in this case the arithmetic mean, is real, perhaps more real than the actual observations. In other words, the arithmetic mean becomes the really real.
But how is the scientist to have confidence that the arithmetic mean of 0.207957896 is really real? The key is reproducibility. The experiment is repeated on a second sample. In this case, the arithmetic mean of sample2 is not likely to be exactly 0.207957896, the arithmetic mean of sample1. Left at a single repeat, it would be hard to have confidence that either mean is really real. But what if the experiment was repeated 100 times? The sample means would likely fall within a specified range 95% of the time. That range provides the 95% confidence interval. The specific percent value for the confidence interval is user defined; it is not a direct consequence of the data. In other words, the data are not screaming out use 95%!
The question arises, is it feasible or a wise use of resources to repeat the exact same experiment 100 times in order to directly determine the range that will contain 95% of the sample means? One might remark that it is not necessary to repeat the experiment 100 times. One can calculate the 95% confidence interval just from the sample mean, x̄, standard deviation, s, and sample size, n, after choosing a percentage that translates to a critical value, t*, for the confidence interval according to the equation x̄ ± t*(s/√n).
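The calculation is easy to reproduce from the 30 observations listed above. In this sketch, the t critical value of approximately 2.045 for 29 degrees of freedom is taken from a standard t-table; any statistics package would supply it internally:

```python
import math

# The 30 outcome measures reported in the text
data = [1.213065843, -1.408232038, -0.781683411, -0.719130639, 0.420620836,
        -1.308030733, 0.432603429, 0.90035428, 1.082294148, 0.013502586,
        2.24195901, 0.089507921, 0.442114469, -1.093630999, 2.409105946,
        0.151316044, -0.598211045, 1.40125394, 2.060951374, -0.734472678,
        -0.189817229, 0.714876478, -0.763639036, -0.713001782, -1.565367711,
        -0.416446255, 0.65168706, 1.221249022, 0.268369149, 0.81556891]

n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
t_crit = 2.045  # t critical value for a 95% CI, df = 29 (from a t-table)
half_width = t_crit * sd / math.sqrt(n)
print(f"mean = {mean:.6f}, "
      f"95% CI = ({mean - half_width:.3f}, {mean + half_width:.3f})")
```

The computed mean matches the 0.207957896 reported in the text. Note again that the interval describes the precision of the mean over repeated samples, not the spread of the individual observations.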
Then the question is asked, where did t* come from? It does not appear to be in the actual data or derivable from the actual data. It turns out t* is the critical value from an appropriated idealized statistical distribution, a variation on the standard normal distribution determined by the degrees of freedom, such as n − 1. As an aside, there is the Pearson distribution, which is a family of distributions obtained by a modification of descriptive parameters, particularly skewness and kurtosis, that includes the continuous distributions of concern here, such as the normal, standard normal, t-, F-, or χ² distributions, among others. Note, the t-distribution and the binomial distribution approximate a standard normal distribution as the sample size increases, suggesting a deeper relation among the statistical distributions. But the question is demanded: what justifies assuming something like a t-distribution or a normal distribution? Note, it cannot just be assumed from the actual distribution of the observations that it follows a normal distribution. For example, the distribution in Fig. 5 would take a lot of eye squinting in order to say yes, this is normally distributed.
An answer comes from history and from utility. It turns out that a lot of measurements, of planetary motion for example, follow a normal distribution, suggesting that the normal distribution is a natural phenomenon representing reality, not merely an epistemological tool. But how does the fact that measurements cluster around a central value, subsequently taken as the Central Tendency, such as the arithmetic mean, justify taking that Central Tendency as any truer than any of the other measurements? How does one know if the clustering around the Central Tendency is an artifact of the methods of measurement? The fact that one can ask such questions and the fact that answers are not given
by the actual experiences place the science at an epistemic risk, which creates risks for unproductive irreproducibility. This second edition attempts to help resolve or at least shed some light on the conundrum. The hope is that in doing so, biomedical researchers will be able to continue the important work of knowledge building more productively.
Also, as will be discussed in Chapter 5 and Appendix B, the statistical distributions turn out to be quite useful in getting rid of things like confounds, biological variability, and noise. An example is presuming or constructing biological variability, noise, and confounds to be a standard normal distribution. In a sense, resorting to a normal distribution, standard or incomplete, such as a t-distribution with relatively few degrees of freedom, is like a universal statistical fudge factor. But how is that acceptable? If one commits a crime that has utility in gaining ill-gotten goods, and one does not get caught, is that acceptable? There is a high probability that the thief will be caught someday and stand trial for unproductive irreproducibility.
Resorting to the normal distribution, standard or incomplete, can be seen as a fudge factor as the factor is extraneous to and not justified by any of the actual observations. The fact that it works, meaning it makes humans more comfortable, is hardly reassuring, either epistemologically or ontologically. But sometimes a fudge factor may be a lucky guess or at least inspire further insights. Consider the Cosmological Constant that Albert Einstein introduced into his theory of general relativity in order to keep his model of the universe static, as the presumption was that the universe was static (O’Raifeartaigh, 2017). It was not until later, when Edwin Hubble demonstrated that the universe is expanding (and, later still, when supernova observations strikingly showed the expansion to be accelerating), that the Cosmological Constant was called into question (although, to be fair, it was suspect even to Einstein). Einstein was reported to have said that the Cosmological Constant was his biggest blunder. But note, one wonders whether this was the scientific analog of a deathbed confession once Hubble empirically proved Einstein was wrong. Nonetheless, the jarring contrast between Einstein’s intellect and Hubble’s empirical observations seemed to spur further investigations into what may be Dark Energy and quantum vacuum energy. Maybe the same applies to critiques of traditional experimental design and statistical analyses that will spur future and more realistic approaches.
Appealing to the normal distribution, or any other family member of the Pearson distribution, seems to be the modus operandi of traditional statistical analyses. If the observations are not normally distributed, then normalize the observations by some transformation, although doing so further distances the statistical results from reality. Reducing the risk of stroke would seem more straightforward and informative than reducing the log of stroke risk. If the actual observations are not normal and there does not appear to be any means to normalize the distribution, such as when the distribution is uniform, then just study a derivative statistic, such as the arithmetic mean, standard deviation, t-statistic, F-statistic, or χ²-statistic, among others. One can always appeal to the Central Limit Theorem.
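The appeal of the Central Limit Theorem can be illustrated with a small simulation, offered only as a sketch: the underlying observations are drawn from a uniform distribution, which is decidedly non-normal, yet the means of repeated samples cluster around 0.5 with a standard deviation near 1/√(12n):

```python
import random
import statistics

random.seed(0)  # repeatable illustration

# Means of repeated samples from a uniform distribution: the raw
# observations are flat, not normal, yet the sample means cluster
# around 0.5 roughly normally, per the Central Limit Theorem.
n, reps = 30, 2000
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(reps)]
print(round(statistics.fmean(sample_means), 2))  # near 0.5
print(round(statistics.stdev(sample_means), 2))  # near 1/sqrt(12*30) ≈ 0.05
```

This is exactly the move the text questions: the derivative statistic (the mean of means) behaves normally even when the actual observations do not, which is convenient but says nothing about the reality of the individual observations.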
These issues are examined throughout this second edition. One challenge that is still much at the thinking-out-loud stage examines the emergence of the normal distribution in terms of complex nonlinear systems that display Chaos and Complexity in the context of Information-theoretic Incompleteness, of which the Heisenberg Uncertainty Principle, the Halting Problem in computer science, and Gödel’s Number-theoretic Incompleteness theorem are examples. The concept of Brownian motion serves as a bridge between statistical mechanics, Chaos and Complexity, and Incompleteness. Note, the claim to thinking out loud is not the same as, or anywhere near, saying this is the case. However, if one cannot find the thinking out loud productively irreproducible, as in the case of modus tollens or Reductio ad Absurdum argumentation, perhaps the thinking out loud should continue, considering what is at stake: continued unproductive irreproducibility in biomedical research.
Randomness as fundamental and foundational
There is another way to think about statistics, where statistical significance comes first from demonstrating that the phenomenon is not random. In other words, the demonstration of nonrandomness demonstrates that structure and Information exist simultaneously in the phenomenon. Methods and approaches can be borrowed from statistical mechanics, particularly ways to study randomness and departures from randomness.
The physical notion of randomness is inherent in the notion of entropy, such as in thermodynamics, and in methods for quantitating entropy, such as Boltzmann−Gibbs−Shannon entropy, that can be compared among various experimental conditions. One version of Information theory is Shannon’s entropy, H, given by Eq. (1),

H = −Σᵢ pᵢ log₂ pᵢ (1)

where there is an array of data points, such as a vector, in a series of n bits of information, and each bit of information takes one of M states, indexed i = 1, 2, …, M. The value of pᵢ is the probability of that unique state among the set of bits that constitute the observations. Consider a string of bits, such as 10011100101001, where 14 bits are represented by two states, 1 or 0. In this case, there are seven 1’s and seven 0’s and thus the probability of each state value is 7/14 = 0.5. Shannon’s entropy, H, is 1 when randomness is complete, which is the case when the probabilities of each state are equal, as in the example. Potential uses of Shannon’s entropy are discussed in Appendix C.
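Eq. (1) is easy to verify directly; this sketch computes the entropy of the 14-bit string from the text:

```python
import math
from collections import Counter

def shannon_entropy(bits):
    """Shannon entropy H = -sum(p_i * log2(p_i)) over the observed
    states; for two equiprobable states H = 1, complete randomness
    in the sense used in the text."""
    n = len(bits)
    probs = [count / n for count in Counter(bits).values()]
    return -sum(p * math.log2(p) for p in probs)

# The 14-bit string from the text: seven 1's and seven 0's, p = 7/14 = 0.5
print(shannon_entropy("10011100101001"))  # → 1.0
```

A string such as 00000000000000, by contrast, has entropy 0: a single state with probability 1 carries structure and no randomness.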
Any statistical distribution that is not uniform will not be random but will have structure and thus Information. The normal distribution of some metric, for example, of the population, is not random, as the probability of all possible states is not equal. Thus, selecting a sample from the nonrandom population presents challenges. The goal of biomedical experimentation is to have a sample that is representative of the population, which can best be obtained by selecting from the population randomly. But note, the selection is not being made from a random distribution. Only the process of selecting data points from the population is held random. This is the notion of stochasticity. But what if there is no such thing as a random selection process and therefore no stochasticity? This issue is addressed in Appendix C.
There is another possible notion of stochasticity that should not be conflated with randomness but rather reflects unpredictability. This notion sounds counterintuitive. What is unpredictable but is not random? One answer is systems that are Chaotic and Complex and that self-organize. The self-organization assures that the system is not random. The system will have structure, but the structure is not predictable, certainly by the methods of traditional experimental design and analysis. For example, the structure of a snowflake is not random as the formation of a snowflake follows deterministic physics. The physical laws do not contain random variables whose values are held to be determined stochastically in the sense that the value is selected randomly from a nonrandom distribution. Yet, the exact structure or nonrandomness cannot be predicted other than there will be six points in each snowflake.
As can be appreciated, Chaos and Complexity present extraordinary challenges to traditional experimental designs and analyses in biomedical research. Yet, these challenges cannot be ignored as the principles underlying Chaos and Complexity are far more biologically realistic compared to those presumptions and assumptions underlying traditional experimental design and analysis. As fearful as stepping away from comfortable and useful modes of thinking in the past is, it is hard to see that it is avoidable in the future for the true scientist.
Finally…
Let whatever modicum of success I have achieved in the works I have written be a hope, perhaps faint, for those who struggle with the written word as I have. As Albert Einstein was reported to have said, I very rarely think in words at all. A thought comes, and I may try to express in words afterwards.
I doubt there are words that can adequately see in n-dimensional space or see a chiliagon (a regular polygon with 1000 sides). It is unlikely any artist can draw a chiliagon in spatial dimensions resolvable to the human eye, but there it is in my mind’s eye. Indeed, calculus allowed me to see
an infinitely many-sided polygon just prior to and at the moment of becoming a circle. Logic allows me to see the common denominator for all knowledge.
It should not be hard to imagine, I do not need to plead my case, the hardship a child and then adult faces when their ability to write fails the promise of their intellect. Expectations are low and, correspondingly, so is respect. Beautiful and wondrous ideas in one’s mind are lost in translation to the written word, yet are discoverable with effort, patience, and willingness to discuss. Perhaps the many readers not willing to work past the words to see the beauty and wonder are not to be blamed, but just think of what may have been lost.
I was fortunate to find refuge in science, mathematics, and logic and then find a profession that demanded relatively little of my otherwise poor writing that was beyond rehabilitation. For me, science, mathematics, and logic greatly expanded what I was able to see in my mind’s eye. It was some solace that I finally was found to have dyslexia and a developmental language disorder, although many years after the fact. At least in my mind, the stigma was reduced, even though working with others was no easier. Strikingly, when applying for grants, I once bucked up the courage to admit my language problems. The response was to give me a few more weeks to submit the written grant application, perhaps with the hope that, in the interim, my dyslexia and developmental language disorder would heal spontaneously. I have not asked for any allowances since.
Late in my career, I gained the means to get help on my terms. I was able to work with a copy editor who translated my dyslexic efforts into relatively understandable English prose. Melissa Revell has helped me over many years, including this effort, for which I am ever grateful. The great dread of red ink
has been relieved by her kindness, generosity, and now friendship.
I thank Andre Wolf, of Academic Press, as it was his suggestion and encouragement to undertake this second edition. I am not sure that Andre appreciated, prior to his suggestion, the very different and challenging direction this second edition would take, but his encouragement not only persisted but expanded.
Finally, I would like to recognize and acknowledge that I live and work on the traditional territories of the Mississauga and Haudenosaunee nations for which I am truly grateful. As a new Canadian citizen, I acknowledge my obligations to the aboriginal and treaty rights of First Nations, Inuit and Métis peoples affirmed by section 35 of the Constitution Act, 1982, and I advocate for the Truth and Reconciliation Calls to Action. As I try to be a good physician, teacher, and citizen, I could do no less.
Preface to the first edition
There is a problem in biomedical research. Whether it is a crisis and whether it is a new problem or an old endemic problem newly recognized are open questions. Whether it represents an ominous turn of events or merely a hiccup in the self-correcting process of biomedical research also is an open question. At the very minimum, it may be just a crisis of confidence. But it seems to have captured the imagination and concern of journal editors and administrators of research-granting institutions and, consequently, should be of concern to everyone involved in biomedical research.
The problem is the failure of reproducibility of biomedical research. Indeed, as will be discussed in this book, there is an appreciation of local or within-experiment reproducibility in that virtually every experiment involves more than one observation or trial. Just as virtually every experiment, at least implicitly, appreciates the importance of replication within an experiment, the same issues and concerns apply to the larger issues of reproducibility across different experiments by different researchers.
Even when the failure of reproducibility is defined narrowly, such as a failure to achieve the same results when other researchers independently replicate the experiment—the narrow sense of irreproducibility—there are concerns. (Note that narrow reproducibility is different from local reproducibility, described previously, and broad reproducibility, described later.) This narrow sense often focuses on issues of fraud, transparency, reagents, materials, methods, and statistical analyses for which better policing
would solve the problem. Perhaps these may be the major factors, but they may not be the only factors. None of this is to deny the importance of fraud, transparency, reagents, materials, methods, and statistical analyses, but at the same time it is possible that some causes of irreproducibility may be in the logic inherent in the research studies. Indeed, numerous examples will be presented and, thus, there is an obligation to carefully consider the logical basis of any experiment. Also, absent from the discussion is the fact that irreproducibility in a productive sense is fundamental to scientific progress. Such consideration is the central theme of this book.
Generally, results of studies are dichotomized into positive and negative studies. Irreproducibility affects positive and negative studies differently. Most of the debate centers on positive claims later demonstrated to be false, examples of a type I error: claiming as true what is false. In statistics, a type I error is occasioned by inappropriately rejecting the null hypothesis, which posits no difference in some measured phenomenon between the experimental and the control samples. Understandably, type I errors shake confidence and risk misdirecting subsequent research.
Perhaps even more of a problem are type II errors, where negative studies claim, falsely, that there is no change in a phenomenon as a result of an experimental manipulation or no differences between phenomena just because the null hypothesis could not be rejected. It is just possible that the experiment, by design or statistical circumstance, would have only a low probability of being able to reject the null hypothesis. A better term for these studies is null studies, as using the term negative implies some inference with a degree of confidence, even if undefined. The null study contrasts with situations of a negative study where the null hypothesis is rejected in a setting of sufficient statistical power and thus a positive claim of no difference can be made with confidence (equivalence, noninferiority, or nonsuperiority studies). The latter is termed a negative study and provides confidence just as the modus tollens form of the Scientific Method provides certainty, as will be discussed in the text. The extent of the problem of null studies is magnified by the bias toward not publishing null studies; when they are published, they are typically confused with negative studies. Type II errors in null studies may result in lost opportunities by discouraging further investigations.
Replicability, the narrow sense of reproducibility, is of primary importance as it is the first requirement. The author agrees with those who are concerned about irreproducibility in the narrow sense and indeed supports proposals for the policing of fraud; transparency; reporting of reagents, materials, and methods; and robust statistical analyses, but it does not and should not stop there. There is the temptation to dismiss concerns about irreproducibility in the narrow sense, believing it to be acceptably low and little more than the cost of doing scientific business. Perhaps so, if these irreproducible studies were merely outliers or flukes and the vetting process for awarding grants and accepting papers for publication were otherwise effective. However, one must remind oneself that many of the papers describing research subsequently found to be irreproducible were vetted by experts in the field. What does this say about such expertise or the process?
There is another view, the broader sense of irreproducibility, which focuses not on the exact replication of a specific experiment but on the failure to generalize or translate. Such studies may be called conceptually irreproducible. Of particular concern is the failure to generalize or translate from nonhuman studies to the human condition; after all, is that not the raison d'être of the National Institutes of Health? Further, is such generalization not a founding pillar of modern science’s Reductionism, with its faith in the ability to generalize and translate from the reduced and simplified? When considered in the broader sense, there is far more evidence for, and concern about, these forms of irreproducibility. One need look no further than the postmarket withdrawal of drugs, biologics, and devices by organizations such as the US Food and Drug Administration (FDA), whose preapproval positive clinical trials were supposedly vetted by experts and found valid. Only later were the efficacy and safety conclusions found irreproducible. This is not without consequences for patients.
The lives of biomedical researchers would be made much easier if irreproducibility were merely the result of fraud; improper use of statistics; lack of transparency; or failure to report reagents, materials, and methods. But what if there are other causes of irreproducibility fundamental to the paradigms of biomedical research, such as the Scientific Method (hypothesis-driven research) and statistics? As will be argued in this text, much of scientific progress requires the use of logical fallacies in order to gain new knowledge. Strict deduction, while providing the greatest certainty in its conclusions, in an important sense does not create new knowledge, and induction to new knowledge is problematic. Traditional valid logical deduction is the logic of certainty; it is not the logic of discovery. Indeed, the Scientific Method is an example of the logical Fallacy of Confirming the Consequence (or Consequent), also known as the Fallacy of Affirming the Consequence (or Consequent); the logic of the Scientific Method is also referred to as abduction. This claim itself is not controversial, as both scientists and philosophers have pointed it out for decades. What is novel here is the demonstration that this fallacy is a cause of irreproducibility in biomedical research. Fortunately, if the fallacy is recognized, there are methods to blunt its effects, thereby reducing the risk of unproductive irreproducibility. What is needed, it will be argued, is the judicious use of logical fallacies.
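The two logical forms at issue can be set side by side as a schematic gloss on the claim above, with H standing for a hypothesis and P for a predicted observation:

```latex
% Modus tollens (valid): a failed prediction refutes the hypothesis.
%   If H then P; not-P; therefore not-H.
\[
\frac{H \rightarrow P, \qquad \neg P}{\therefore\ \neg H}
\qquad \text{(modus tollens, valid)}
\]

% Fallacy of Affirming the Consequent (invalid): a confirmed prediction
% does not prove the hypothesis, since P may have other causes -- yet
% this is the working form of hypothesis-confirming experiments.
\[
\frac{H \rightarrow P, \qquad P}{\therefore\ H}
\qquad \text{(affirming the consequent, invalid)}
\]
```

The asymmetry is the heart of the matter: refutation is deductively certain, while confirmation, however often repeated, is not, which is why a confirmed hypothesis can still fail to reproduce.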
Perhaps novel and counterintuitive, at least from the perspective of some scientists, is the notion that the discipline of logic, typically in the domain of philosophy, could be relevant to empirical biomedical research. Scientists from the beginning of modern science, as seen in the founders of the Royal Society, rejected philosophy by taking aim at the scholastic metaphysician natural philosophers. While scientists as late as the early 1900s invoked past philosophers in scientific discussions, as Sir Charles Sherrington did with René Descartes in his 1906 text The Integrative Action of the Nervous System, such discussions are virtually absent from the literature typically read by today’s biomedical researcher. Thus, it is understandable that biomedical researchers would be skeptical of any argument that proceeds from anything that smacks of philosophy, such as logic. But lack of experience in logic, and in epistemology more generally (which concerns how knowledge is gained rather than the content of knowledge), is a poor basis for skepticism. All this author can do is ask for forbearance, confident that such patience will be rewarded.
This author’s ambitions for the role of logic in this particular text are circumscribed. It is critical to appreciate that logic alone does not create biomedical scientific knowledge. Science is fundamentally empirical, and its success or failure ultimately rests on observation, data, and demonstration. All logic can do is provide some degree of certainty to the experimental designs and analytical methods that drive claims of new scientific knowledge. Yet reproducibility fundamentally involves questions of certainty, as reproducibility is at its core a testament to certainty. On this basis alone, logic has a role to play in concerns about scientific reproducibility. The discussions in this text are intended to strengthen, perhaps by just a bit, the already strong and important position of empirical biomedical research.
It may well be that an experiment is so obvious that no concerns about its underlying logic are raised. However, the success of such an experiment is not evidence that logic is not operating. Consider, for example, the Human Genome Project, which has been called a descriptive research program in contrast to a hypothesis-driven program. Perhaps it could be argued that the domain of the research was clearly defined and demarcated, that is, the human genome, thereby obviating any inductive ambiguity. Essentially, the Human Genome Project consisted of turning the crank.
Interestingly, the project set the stage for subsequent hypothesis-driven research (Verma, 2002), for example, research positing that gene A causes disease B such that affecting gene A cures disease B. In that regard, hypothesis-driven research has not fared well: there are only two FDA-approved gene therapies (as opposed to genetic tests), despite the estimated $3 billion spent on the Human Genome Project. It is important to note that this is not a criticism of the Human Genome Project; there is every reason to believe that its results will change medical therapies dramatically. But it will take time, because it is difficult to go from data collection to the cause-and-effect relations required of the hypothetico-deductive approach critical to biomedical research.
Further attesting to the potential contributions of logic is the fundamental fallacy inherent in statistics as used in biomedical research: the Fallacy of Four Terms. Experimental designs typically involve hypothesis testing on a sample thought representative of the population of concern, with inferences from the findings on the sample transferred to the population through a syllogistic deduction. The sample comprises the entities studied directly, for example, a group of patients, in an effort to understand all patients, that is, the population. There are many reasons why all patients cannot be studied, and thus scientists have little choice but to study a sample. Consider the example: disease A in a sample of patients is cured by treatment B; the population of those with disease A is the same as the sample of patients; therefore, the population of patients with disease A will be cured by treatment B. The syllogism is extended to: my patient is the same as the population of patients with disease A, and therefore my patient will be cured by treatment B. However, it is clear that there are many examples where my patient was not cured by treatment B. Indeed, this is an example of irreproducibility in the broad sense; one need only look to the drugs, biologics, and devices recalled by the FDA to see that it is true. Something must be amiss. The majority of pivotal phase 3 trials that initially garnered FDA approval but were later abandoned are not likely to have been type I errors resulting from fraud; lack of transparency; failure to report reagents, materials, or methods; or statistical flaws within the studies.
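The syllogism just described can be laid out schematically to expose where the fourth term enters. This is a rendering of the example in the text; the labeling of terms is the author's point restated, namely that the sample and the population are treated as a single middle term when they are in fact two distinct terms:

```latex
% A valid categorical syllogism has exactly three terms. The
% sample-to-population inference smuggles in a fourth:
\begin{align*}
&\text{Major premise: treatment } B \text{ cures disease } A
  \text{ in the \textit{sample} of patients.}\\
&\text{Minor premise: the \textit{population} with disease } A
  \text{ is the same as the sample.}\\
&\text{Conclusion: treatment } B \text{ cures disease } A
  \text{ in the \textit{population}.}\\[4pt]
&\text{Terms: treatment } B,\ \text{disease } A,\
  \textit{sample},\ \textit{population}\ \Rightarrow\ \text{four terms.}
\end{align*}
```

The deduction is valid only to the degree that the minor premise's identity claim holds; whenever the sample differs from the population in some relevant respect, the conclusion can fail, which is precisely irreproducibility in the broad sense.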
There are a number of other fallacies to which the scientific enterprise is heir. These include the Fallacy of Pseudotransitivity, which affects the formulation of hypotheses critical to