Big Data in Psychiatry and Neurology
Ebook · 738 pages · 16 hours

About this ebook

Big Data in Psychiatry and Neurology provides an up-to-date overview of achievements in the field of big data in psychiatry and medicine, including applications of big data methods to aging disorders (e.g., Alzheimer’s disease and Parkinson’s disease), mood disorders (e.g., major depressive disorder), and drug addiction. This book will help researchers, students, and clinicians implement new methods for collecting big datasets from various patient populations. Further, it will demonstrate how to use several algorithms and machine learning methods to analyze big datasets, thus enabling individualized treatment for psychiatric and neurological patients.

As big data analytics is gaining traction in psychiatric research, it is an essential component in providing predictive models for both clinical practice and public health systems. As compared with traditional statistical methods that provide primarily average group-level results, big data analytics allows predictions and stratification of clinical outcomes at an individual subject level.

  • Discusses longitudinal big data and risk factors surrounding the development of psychiatric disorders
  • Analyzes methods in using big data to treat psychiatric and neurological disorders
  • Describes the role machine learning can play in the analysis of big data
  • Demonstrates the various methods of gathering big data in medicine
  • Reviews how to apply big data to genetics
Language: English
Release date: Jun 11, 2021
ISBN: 9780128230022

    Book preview

    Big Data in Psychiatry and Neurology - Ahmed Moustafa

    Chapter 1: Best practices for supervised machine learning when examining biomarkers in clinical populations

    Benjamin G. Schultz (a); Zaher Joukhadar (b); Usha Nattala (b); Maria del Mar Quiroga (b); Francesca Bolk (c); Adam P. Vogel (a,d)

    a Centre for Neuroscience of Speech, The University of Melbourne, Melbourne, VIC, Australia

    b Melbourne Data Analytics Platform, The University of Melbourne, Melbourne, VIC, Australia

    c Murdoch Children’s Research Institute, The University of Melbourne, Melbourne, VIC, Australia

    d Redenlab, Melbourne, VIC, Australia

    Abstract

    Machine learning approaches are increasingly used in health research. Applications range from identifying disease onset and classifying disease severity to predicting epileptic seizures. Although machine learning can be a powerful tool, there is potential for misuse; model performance can be inflated through overfitting and, consequently, will not generalize to the greater population. The risk of misuse increases when the number of variables extracted from continuous data is almost unlimited, as is the case for neural, movement, and acoustic (e.g., speech and music) data. Given that health studies often have small sample sizes, and that outcome variables can be noisier for clinical populations, there are important points that should be considered before using machine learning. We suggest best practices in machine learning, including data formatting, reducing data dimensionality, model selection and evaluation, and other steps within the machine learning process. We further discuss some common pitfalls in applying machine learning to small sample sizes and high-dimensional data (e.g., speech biomarkers, neural and imaging data). We advocate for parsimonious approaches that include selecting the simplest machine learning method that best describes the data, preventing redundancy and overfitting through variable elimination, and ensuring that certain variables or approaches do not inflate machine learning outcomes. We further consider approaches that can identify the best predictors (or combinations thereof), as well as black-box machine learning methods (e.g., deep learning). Finally, we discuss the limitations of current machine learning methods and pose future directions to broaden the applicability of machine learning tools and ensure the outcomes are robust against random factors.

    Keywords

    Machine learning; Best practices; Big data; Health; Artificial intelligence

    1: Introduction

    Machine learning is a powerful tool for predicting outcomes as it can simultaneously consider multiple features to identify and delineate classes (e.g., healthy or unhealthy). This approach has several advantages over traditional univariate statistical approaches (see Bzdok, Altman, & Krzywinski, 2018; Bzdok & Meyer-Lindenberg, 2018), with broad scope for machine learning in health and medicine. Machine learning has been used to identify and distinguish medical conditions using genetic (cf. Libbrecht & Noble, 2015; Pattichis & Schizas, 1996), speech (cf. Hegde, Shetty, Rai, & Dodderi, 2019), neural (Craik, He, & Contreras-Vidal, 2019; Kassraian-Fard, Matthis, Balsters, Maathuis, & Wenderoth, 2016), imaging (Thrall et al., 2018), and movement (cf. Figueiredo, Santos, & Moreno, 2018; Kubota, Chen, & Little, 2016) data. There are several decisions made by researchers undertaking machine learning that are seldom explicitly reported (or known) including sample size estimation, variable screening and data reduction, selection of machine learning algorithm(s), selection of training and test datasets, and parameter adjustment. This chapter discusses current best practice for supervised machine learning applications in health and medicine, and describes some of the common pitfalls that researchers may encounter.

    2: Data formatting

    One of the most important aspects of machine learning, especially across multiple experiments, is data formatting. There are very few guidelines for how data should be formatted in this context and data sharing more generally. While data formatting may seem trivial to some, incorrect data formats can provide misleading results. Following standardized formatting conventions facilitates open data sharing and streamlines collaboration (Ellis & Leek, 2018). Correctly formatting data is important for machine learning in medicine where datasets typically consist of data collected from different sites and studies. Here we describe the best practices for data formatting that work across a range of different software with a focus on tidy data formats and the use of headers (or embedded metadata) for case-wise information (Chen, 2017; Ellis & Leek, 2018; Wickham, 2014; Wickham & Grolemund, 2016). Although there are other formatting styles, here we only discuss formats that are readable/importable for most statistical software that can perform machine learning (e.g., Python, R, SPSS, SAS, STATA).

    Best practice 1:

    Use tidy data formats for machine learning and data sharing

    Tidy datasets have a specific structure where each column contains one variable, rows contain one observation, and tables contain one observational unit (e.g., sample, group, or experiment) (Wickham, 2014). There are several advantages of tidy datasets relating to standardized practices in data visualization, exploration, screening, transformations, and analysis. For example, Tables 1–3 show three ways to arrange the same data where Tables 1 and 2 are considered messy data and Table 3 is considered Tidy data (i.e., long format). To perform most machine learning applications for data in Tables 1 and 2, information would need to be combined into a format that can be read and interpreted by statistical software. In Table 1, data for each participant is in a different subtable. In order to perform any visualization or analysis, these tables would need to be combined using additional steps, such as melting and casting. Furthermore, note that each subtable contains header information that is not, strictly speaking, inside the table itself (i.e., ID and Group) and would require some scripting to extract automatically. Header information can be included in a multitude of different ways. Embedded metadata, often found within the file properties, can provide information that may require specialized scripts or functions to extract. Information may also be contained in the filename itself (e.g., Part01_control, Part02_control, Part03_disease, Part04_disease) and may require string-splitting to extract this information. In more complicated cases, this header information may be missing or formatted in different ways (e.g., control_Part01, Part02_control, Part03_take2_disease, disease_Part04), requiring manual recoding and/or idiosyncratic code to standardize the data (e.g., sequences of if this then that conditional statements to fix deviant file naming conventions).

    Table 1

    Table 2

    Table 3

    Table 2 is also considered messy data, which may be surprising to SPSS (or similar software) users, for whom within-subject designs typically require this format (i.e., wide format). This can be considered messy because the same variable is scattered across different columns representing different conditions. This makes it difficult to perform certain analyses (e.g., checking for a normal distribution) across all conditions without first transforming the data into long format; the format does not imply that the four columns (TaskA_Time1, TaskB_Time1, TaskA_Time2, TaskB_Time2) contain the same variable under different conditions (Tasks A and B, Times 1 and 2). To perform machine learning, these data need to be explicitly defined as belonging to these conditions. Tidy Data is "a standardized data structure that maps the meaning of a dataset to its structure" (Wickham, 2014, p. 4), where the data structure informs the data interpreter of related categorical and continuous variables across conditions. Data for machine learning diverge from one rule of Tidy Data (Wickham, 2014); relational data on a single subject or participant that is used as a feature or the target category (or categories) being classified must be repeated instead of being placed in a separate table (see Table 4).

    Table 4
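    As a minimal illustration (not from the chapter), a wide-format layout like Table 2 can be reshaped into a long, tidy format like Table 3 with a few lines of Python. The column names and values below are hypothetical stand-ins for the tables shown above.

```python
import pandas as pd

# Hypothetical wide-format data resembling Table 2: one row per participant,
# one column per Task x Time condition.
wide = pd.DataFrame({
    "ID": ["Part01", "Part02", "Part03", "Part04"],
    "Group": ["control", "control", "disease", "disease"],
    "TaskA_Time1": [0.81, 0.75, 0.62, 0.58],
    "TaskB_Time1": [0.79, 0.72, 0.60, 0.55],
    "TaskA_Time2": [0.84, 0.78, 0.63, 0.57],
    "TaskB_Time2": [0.82, 0.74, 0.61, 0.54],
})

# Melt to long (tidy) format: one observation per row, one variable per column.
long = wide.melt(id_vars=["ID", "Group"], var_name="condition", value_name="score")

# Split the combined condition label into explicit Task and Time variables.
long[["Task", "Time"]] = long["condition"].str.split("_", expand=True)
long = long.drop(columns="condition")
print(long.head())
```

    Splitting the combined condition label into separate Task and Time columns makes the conditions explicit variables that statistical and machine learning software can use directly.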

    3: Statistical assumptions

    Although machine learning is qualitatively different from other parametric tests and linear statistical methods, many approaches have similar statistical assumptions. Only when these assumptions are upheld do machine learning approaches provide reliable outcomes. For example, many data reduction techniques (e.g., principal components analysis) assume linear relationships between variables and deviations from linearity must be normally distributed, as per parametric assumptions. Some machine learning methods can be robust to mild violations of their statistical assumptions. Regardless of the machine learning method used, researchers must assess their model for gross violations of statistical assumptions as these violations can lead to biased, inaccurate, and unreliable predictions. Model assessments provide statistical support for the reliability of results and the appropriateness of the chosen machine learning approach. These assessments may also reveal latent unmodeled information from the data that could be included in the machine learning model to improve the results. Linear regression is a basic machine learning method and is used to model linear relationships between a dependent variable (output, e.g., class or group) and multiple predictor variables (input features). Here we discuss statistical assumptions in the context of general linear regression models that also apply to other linear machine learning approaches.

    The assumption of linearity requires that the relationship between the dependent variable and the fitted values of features is linear when the other variables are held constant (Ernst & Albers, 2017). The fitted values, also called the predicted values, are the outcomes that are generated when fitting the linear model. Deviations from linearity can undermine the model and render it unsuitable for the data. This can happen when the data contain hidden nonlinear relationships that are unmodeled. The assumption of linearity can be assessed by viewing plots of the observed by predicted values or the residuals by the predicted values; there should be a linear trend between the observed and expected probabilities. Violations of the assumption of linearity can sometimes be mitigated through transformations or normalization (e.g., logarithmic or exponential transforms) of the original input features but may require the use of nonlinear models.

    The assumption of homoscedasticity (meaning same variance) requires that the error (i.e., noise) of the relationship between the dependent variable and features is similar across all features. When this assumption is not upheld, the standard errors in the output are not reliable (Yang, Tu, & Chen, 2019). Homoscedasticity can be assessed by examining relationships between observed residuals and predicted values. If the plot shows a relationship or linear trend between residuals and features, then the linear model is not appropriate. Fig. 1 shows four examples of residual plots where Fig. 1A fulfills the assumption of linearity because residuals are randomly dispersed. Nonrandom residual dispersions, such as linear trends (Fig. 1B), U-shaped (Fig. 1C), or inverted U-shaped (Fig. 1D) patterns suggest nonlinear models are more appropriate for the data (e.g., nonlinear regression, see Bates & Watts, 1988). If the assumption of homoscedasticity is violated, heteroscedasticity-corrected errors can be calculated that account for the heteroscedasticity present in the model (see Hoechle, 2007).

    Fig. 1

    Fig. 1 Examples of residual dispersions that (A) fulfill the assumption of homoscedasticity or (B) violate this assumption through heteroscedasticity, or (C) nonlinear u-shaped or (D) inverted u-shaped distributions.
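    As a hedged sketch of how such diagnostics might be produced in practice, the following snippet fits an ordinary least squares model to synthetic data with statsmodels and plots residuals against fitted values; random scatter around zero, as in Fig. 1A, is the desired pattern. The data and variable names are illustrative only.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Synthetic example data (not from the chapter): one outcome, two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Fit an ordinary least squares model and inspect residuals against fitted values.
model = sm.OLS(y, sm.add_constant(X)).fit()

plt.scatter(model.fittedvalues, model.resid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Random scatter around zero (cf. Fig. 1A) supports the assumptions")
plt.show()
```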

    The assumption of independence requires that values within features are not correlated or derived from the same source. For example, data collected from the same participant over time or in different tasks are connected. If these values are serially correlated, the estimates of their variances will not be reliable. These effects can be mitigated by specifying known dependencies in the model and accounting for these random effects using linear mixed-effects models, or similar (Bates & DebRoy, 2004; Bates & Watts, 1988).
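    A minimal sketch of specifying such a known dependency, using statsmodels' mixed-effects interface with a random intercept per participant; the long-format data below are synthetic and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: repeated observations per participant (ID).
rng = np.random.default_rng(1)
n_subj, n_obs = 20, 8
ids = np.repeat([f"P{i:02d}" for i in range(n_subj)], n_obs)
task = np.tile(["A", "B"], n_subj * n_obs // 2)
subj_effect = np.repeat(rng.normal(scale=0.5, size=n_subj), n_obs)
score = 1.0 + 0.3 * (task == "B") + subj_effect + rng.normal(scale=0.2, size=n_subj * n_obs)
long = pd.DataFrame({"ID": ids, "Task": task, "score": score})

# A random intercept per participant models the within-subject dependence.
mixed = smf.mixedlm("score ~ Task", data=long, groups=long["ID"]).fit()
print(mixed.summary())
```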

    The assumption of normality requires that errors are normally distributed. This is sometimes referred to as a weak assumption as, if it is violated, unreliable results will only be obtained if the dataset is small. This assumption does not need to be upheld when dealing with large datasets (Schmidt & Finan, 2018). Normality can be assessed through visual inspection of normal probability or normal quantile plots to ensure the distribution of errors has a linear relationship with the theoretical standardized residual. Alternatively, normality can be tested using the Anderson-Darling test, the Jarque-Bera test, the Kolmogorov-Smirnov test, or the Shapiro-Wilk test. To mitigate violations in normality, the sample size should be increased to more than 10 observations per variable (Schmidt & Finan, 2018).
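    The following sketch illustrates both approaches on synthetic residuals: a normal quantile (Q-Q) plot for visual inspection and two of the formal tests mentioned above. The residuals are generated purely for illustration.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Residuals from a fitted model; drawn synthetically here for illustration.
rng = np.random.default_rng(2)
residuals = rng.normal(scale=0.5, size=200)

# Visual check: points should fall close to the reference line.
sm.qqplot(residuals, line="s")
plt.title("Normal Q-Q plot of residuals")
plt.show()

# Formal tests; interpret p-values cautiously with very large samples.
print("Shapiro-Wilk:", stats.shapiro(residuals))
print("Anderson-Darling:", stats.anderson(residuals, dist="norm"))
```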

    The assumption of linearly unrelated features (i.e., absence of multicollinearity) assumes features are not highly correlated with one another. Multicollinearity prevents a model from accurately associating variance in the dependent variable with the most informative predictor variable(s) and may lead to incorrect interpretations of the model. One can detect multicollinearity by looking at correlations between the input variables and collinearity statistics (e.g., tolerance and variable inflation factor). Multicollinearity can be mitigated through the data reduction techniques discussed later in the chapter (see Section 6).
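    A short example of computing variance inflation factors with statsmodels, using a hypothetical feature table that contains two nearly collinear predictors.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical feature table with two deliberately correlated predictors.
rng = np.random.default_rng(3)
f1 = rng.normal(size=100)
features = pd.DataFrame({
    "feature_1": f1,
    "feature_2": 0.9 * f1 + rng.normal(scale=0.1, size=100),  # nearly collinear
    "feature_3": rng.normal(size=100),
})
design = add_constant(features)

# VIF values well above ~5-10 are a common rule-of-thumb flag for multicollinearity.
vif = pd.Series(
    [variance_inflation_factor(design.values, i) for i in range(design.shape[1])],
    index=design.columns,
)
print(vif)
```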

    Best practice 2:

    Check statistical assumptions for machine learning processes and hidden steps

    Some machine learning approaches do not necessarily need to meet parametric assumptions (e.g., decision trees and neural networks; Breiman, 2001b; Brownlee, 2016). However, some software contain hidden data reduction steps prior to machine learning that still require parametric assumptions to be met (e.g., Gao et al., 2018). We recommend that researchers specify the assumptions required for their chosen approach. If the assumptions are unknown due to proprietary algorithms, then a conservative approach should meet parametric assumptions even if this is not strictly necessary. We further suggest that any procedures used to manipulate the data are reported.

    4: Sample size estimation

    Building supervised machine learning models that are both accurate and generalizable requires large datasets. Health data often contain a large number of variables (i.e., high-dimensionality) from a small number of participants (e.g., fewer than 40). It is challenging to collect high-quality (i.e., low-noise, artifact-free) data from large clinical cohorts; patient populations tend to experience fatigue more rapidly leading to poorer performance over time within testing sessions or missing data due to premature session termination. Most clinical studies are acquired from a single site (e.g., clinic or hospital), or the population is rare. There are several initiatives around the world aiming to link health data from multiple locations and experiments to create large corpora of data (big data). These initiatives assist in building more reliable machine learning models for health applications (Farinelli, Barcellos De Almeida, & Linhares De Souza, 2015). However, many health studies that use machine learning are still conducted using very small numbers of participants, some less than 10 (Halilaj et al., 2018).

    A common question when performing machine learning is what is an appropriate sample size? Unfortunately, there is no single answer to this question as there are several factors to consider, including the number of features/variables relative to the number of cases. From a theoretical perspective, training a model with a larger feature set relative to sample size will lead to a more idiosyncratic model that may not accurately classify cases when applied to new data. This will not always eventuate in practice; some variables can hinder classification, some are redundant in the presence of other variables, and others may have a negligible effect on classification accuracy. As discussed later, it is important to assess the usefulness of variables within any model (machine learning or otherwise) and variables that harm the model (or do not help the model) should be discarded.

    Sample size calculators have been developed for some machine learning approaches, such as predictive logistic regression (Hsieh, Bloch, & Larsen, 1998; Palazón-Bru, Folgado-De La Rosa, Cortés-Castell, López-Cascales, & Gil-Guillén, 2017). Some have suggested other rules, such as 10–20 samples per feature (Concato, Peduzzi, Holford, & Feinstein, 1995; Peduzzi, Concato, Feinstein, & Holford, 1995; Peduzzi, Concato, Kemper, Holford, & Feinstein, 1996), or a minimum sample size of between 100 (Palazón-Bru et al., 2017) and 500 (Bujang, Sa’at, & Bakar, 2018) regardless of the number of features. Although models with small-to-moderate sample sizes typically overestimate relationships between variables using categorical classification (i.e., the finite-sample bias; King & Zeng, 2001), even large samples of 150 cases per feature may not completely attenuate this bias (van Smeden et al., 2016). There are certain machine learning approaches that are better suited to small sample sizes (Sharma & Paliwal, 2015) and/or large feature sets that reduce data dimensionality (Jollife & Cadima, 2016), as discussed in the following sections. There are also dynamic resampling techniques that can estimate appropriate sample sizes for training sets and measure classification performance with different sample sizes that are not discussed here (see Byrd, Chin, Nocedal, & Wu, 2012; Figueroa, Zeng-Treitler, Kandula, & Ngo, 2012). However, there are no universally accepted practices for a priori determination of sample sizes, with the exception that larger sample sizes provide less biased estimates (suggested N = 75–100, Beleites, Neugebauer, Bocklitz, Krafft, & Popp, 2013). Instead, we discuss the best practices to estimate error variance and model fit, and advocate for data sharing and transparency when reporting results and methods to increase replicability and enable meta-analyses.
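    One empirical way to gauge whether a sample is adequate, in the spirit of the resampling techniques cited above, is to inspect a learning curve: train the model on progressively larger subsets and track cross-validated performance. The sketch below uses scikit-learn and synthetic data purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic two-class data standing in for a clinical dataset (illustrative only).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy")

for n, tr, te in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"n={n:>4}: training accuracy={tr:.2f}, cross-validated accuracy={te:.2f}")
```

    If cross-validated performance is still climbing at the largest training size, a larger sample would likely improve the model; a large gap between training and cross-validated accuracy suggests overfitting.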

    Best practice 3:

    Be cautious when interpreting results from small samples and use appropriate machine learning approaches

    5: Choosing parsimonious models

    It can be difficult to choose a machine learning approach, especially when starting out in the field. Some researchers may choose an approach based on common practice in their field or on what is within their capabilities. This decision-making strategy is understandable given the range of options for performing machine learning. There is no catch-all solution in machine learning and every approach has advantages and disadvantages (i.e., the "no free lunch" theorem; Wolpert, 1996). However, given that machine learning provides a situation where a near-infinite number of variables can be used to discriminate between categories using various approaches, it is prudent to discuss the concept of parsimony in model selection (Vandekerckhove, Matzke, & Wagenmakers, 2015). Parsimony follows the principle of Ockham’s razor: "Entities are not to be multiplied beyond necessity" (William of Ockham, c. 1287–1347).

    When choosing the parameters and features that are necessary in machine learning, it is considered best practice to select the simplest model that best explains the data (Zellner, Keuzenkamp, & McAleer, 2001). There are some metrics that can inform how well a model explains the data relative to the model’s complexity including Akaike’s information criterion (AIC) and Bayesian information criterion (BIC). These metrics penalize more complex models (i.e., models with more variables and interactions) relative to the goodness of fit, to achieve a tradeoff between simplicity and information loss (Burnham & Anderson, 2004). Lower AIC or BIC values indicate a more parsimonious model relative to other models with the same (or similar) feature sets; BIC penalizes complexity more than AIC and a correction can be applied to the AIC for small sample sizes (AICc). The choice of whether to use AIC or BIC depends on whether the researcher favors information retention or simplicity, respectively. However, AIC and BIC are rarely used when selecting variables and model parameters for machine learning (but see Demyanov, Bailey, Ramamohanarao, & Leckie, 2012).
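    As an illustration of how AIC and BIC penalize needless complexity, the following sketch compares a simple and a more complex logistic regression fitted to synthetic data in which only one predictor carries signal; the data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data in which only x1 truly predicts the binary outcome.
rng = np.random.default_rng(4)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-0.8 * df["x1"])))

simple = smf.logit("y ~ x1", data=df).fit(disp=0)
complex_model = smf.logit("y ~ x1 + x2 + x1:x2", data=df).fit(disp=0)

# Lower AIC/BIC indicates a more parsimonious description of the data.
print(f"simple : AIC={simple.aic:.1f}  BIC={simple.bic:.1f}")
print(f"complex: AIC={complex_model.aic:.1f}  BIC={complex_model.bic:.1f}")
```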

    The absence of parsimony metrics means that researchers must pay attention when selecting/removing variables and adjusting model parameters to ensure that the model is generalizable (i.e., does not overfit the data) and not needlessly complex. We advocate that metrics of parsimony be considered when selecting variables and model parameters, and that these values are reported for baseline and saturated models, in addition to the final model. With parsimony in mind, there are several aspects of machine learning that impact model complexity and generalizability.

    The flexibility of the learning algorithm is an important aspect of machine learning. In some instances, a machine learning approach might overfit the sample data and not generalize to other datasets or the population from which the sample was drawn. Most supervised machine learning approaches can be adjusted (automatically or through set parameters) to provide a tradeoff between bias and variance (e.g., GridSearch or Genetic Programming; Nagarajah & Poravi, 2019). This ensures that learning algorithms are not so flexible (i.e., low bias) that they produce a different fit for each training set (i.e., high variance). We recommend that, where applicable, these parameters are reported to improve replicability and so the reader can assess the complexity and level of bias of a model.
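    A brief sketch of such a parameter search using scikit-learn's GridSearchCV, tuning the regularization parameter C of a support vector classifier and reporting the selected value with its cross-validated accuracy; the data are synthetic and purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; in practice use the study's feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# C controls the bias-variance tradeoff of the support vector classifier.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = GridSearchCV(pipeline, {"svc__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)

# Report the selected parameters alongside cross-validated performance.
print("Best parameters:", grid.best_params_)
print(f"Mean CV accuracy: {grid.best_score_:.2f}")
```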

    Best practice 4:

    Report adjustments to model parameters and any algorithms used to attain these values

    Another aspect to consider is the function complexity and whether there are complex interactions between the input variables. Complex models require more data for the training set and a learning algorithm with low bias. Bayesian variable elimination methods are sensitive to complex interactions. They are useful in identifying whether simple or complex functions exist within a dataset and whether they explain enough variance to be considered parsimonious (Zhang, 2016). It is difficult to predict the function complexity prior to performing machine learning in the absence of preestablished theoretical or data-driven models that describe relationships between features (e.g., path models with mediator variables). While this problem cannot be completely avoided, we advocate that variables be sufficiently described for machine learning applications and that their inclusion is justified even for exploratory analyses.

    An additional issue when identifying the most parsimonious model is the number of variables or features. High-dimensional data does not necessarily guarantee a better outcome when using machine learning; some variables may confuse algorithms and decrease classification accuracy. Here we discuss several methods for feature selection/elimination that can identify the relevant and irrelevant features for classifying a target.

    6: Reduction of data dimensionality

    Applying machine learning methods to datasets with small sample sizes and high dimensionality without careful consideration can lead to overfitting. Overfitting occurs when models perform well when predicting the training samples but fail when introduced to new data (e.g., test data or new samples). When building supervised machine learning models, feature engineering and dimension reduction are crucial steps that mitigate the risk of overfitting while maintaining high levels of accuracy. Before applying machine learning methods, it is critical researchers use their domain knowledge and construct feature sets with a justified number of features to reduce the dimensions of data while retaining sufficient relevant information.

    Speech data, for example, undergo signal processing to arrive at a set of acoustic features that, theoretically, represent the quality of speech and potential deficits in speech articulators (Noffs et al., 2020; Vogel et al., 2017; Vogel et al., 2020; Vogel, Fletcher, & Maruff, 2010). There are multiple parameters and algorithms that can be used to measure voice attributes, and a plethora of acoustic features that show potential for classifying diseases that affect speech (cf. Hegde et al., 2019). This creates the possibility of high-dimensional data that may contain redundant variables or overfit the data when using machine learning approaches. Similar instances of high dimensionality also occur for other types of data including fMRI (cf. Kassraian-Fard et al., 2016), EEG (cf. Craik et al., 2019; Sun & Zhou, 2014), movement and motion capture (cf. Figueiredo et al., 2018; Kubota et al., 2016), and genetics (cf. Libbrecht & Noble, 2015). To remedy this, researchers can reduce data dimensions through variable selection and/or automated dimensionality reduction algorithms (Mares, Wang, & Guo, 2016).
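    As one common automated approach, principal components analysis (PCA) can compress a high-dimensional feature matrix into a much smaller set of components while retaining most of the variance. The sketch below is illustrative only; the feature matrix is randomly generated to stand in for, e.g., a large set of acoustic measures.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical high-dimensional feature matrix: 60 participants, 200 features.
rng = np.random.default_rng(5)
X = rng.normal(size=(60, 200))

# Standardize first, then keep enough components to explain 95% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("Original dimensions:", X.shape[1])
print("Retained components:", pca.n_components_)
print("Explained variance:", pca.explained_variance_ratio_.sum().round(3))
```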

    6.1: Scaling

    Before applying data reduction methods and (most) machine learning methods, variables should be scaled so all features have comparable ranges and distributions (Zheng & Casari, 2018). Scaling is particularly important for methods that use distance measures (e.g., Euclidean distance), such as linear regression, k-means, and k-nearest neighbors. In the absence of scaling, results may be unreliable. The most common scaling methods are z-scores (i.e., standard normalization) and range normalization so the values fall into a range between 0 and 1.
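    A minimal example of both scaling methods using scikit-learn; the two features below are hypothetical measures on very different scales (e.g., age in years and reaction time in milliseconds).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical features on very different scales.
X = np.array([[25.0, 310.0], [40.0, 455.0], [63.0, 520.0], [71.0, 640.0]])

# z-scores: each feature rescaled to mean 0 and standard deviation 1.
X_z = StandardScaler().fit_transform(X)

# Range normalization: each feature rescaled to fall between 0 and 1.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_z.round(2))
print(X_minmax.round(2))
```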

    6.2: Variable selection

    One way to reduce the number of variables in high-dimensional data is through variable selection and/or elimination. These methods examine covariation between variables to identify potential redundancies and discard variables that explain the least variance when distinguishing between target categories. For example, there are stepwise variable selection procedures based on parsimony metrics (AIC, BIC) that determine which variables account for the most variance and retain them and/or which variables account for the least variance and eliminate them (see Zhang, 2016). These procedures can also be performed with interactions between variables to determine complex relationships between features that may explain additional variance. However, increasing model parsimony may decrease classification accuracy due to the removal of variables that explain little variance relative to added model
