Machine Learning and Artificial Intelligence in Radiation Oncology: A Guide for Clinicians

Ebook · 986 pages · 10 hours

About this ebook

Machine Learning and Artificial Intelligence in Radiation Oncology: A Guide for Clinicians is designed to bring practical concepts in machine learning to clinical radiation oncology. It fills an existing gap by providing a resource that educates practicing clinicians about how machine learning can be used to improve clinical and patient-centered outcomes.

This book is divided into three sections: the first addresses fundamental concepts of machine learning and radiation oncology, detailing techniques applied in genomics; the second discusses translational opportunities, such as radiogenomics and autosegmentation; and the final section covers current clinical applications in clinical decision making, how to integrate AI into the workflow, use cases, and cross-collaborations with industry. The book is a valuable resource for oncologists, radiologists, and other members of the biomedical field who need to learn more about machine learning as a support for radiation oncology.
  • Presents content written by practicing clinicians and research scientists, providing a healthy mix of new clinical ideas and perspectives on how to translate research findings into the clinic
  • Provides perspectives from artificial intelligence (AI) industry researchers on novel theoretical approaches and possibilities for academic collaboration
  • Brings diverse points of view from an international group of experts to provide balanced viewpoints on a complex topic
Language: English
Release date: Dec 2, 2023
ISBN: 9780128220016

    Machine Learning and Artificial Intelligence in Radiation Oncology

    A Guide for Clinicians

    Edited by

    John Kang

    Department of Radiation Oncology, University of Washington, Seattle, WA, United States

    Tim Rattay

    Leicester Cancer Research Centre, Department of Genetics and Genome Biology, University of Leicester, Leicester, United Kingdom

    Barry S. Rosenstein

    Department of Radiation Oncology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States

    Table of Contents

    Cover image

    Title page

    Copyright

    Contributors

    Foreword from the editors

    Section I. Fundamentals and overview

    Chapter 1. Fundamentals of machine learning

    Chapter 2. Artificial intelligence, machine learning, and bioethics in clinical medicine

    Chapter 3. Machine learning applications in cancer genomics

    Chapter 4. Radiomics: unlocking the potential of medical images for precision radiation oncology

    Chapter 5. Deep learning for medical image segmentation

    Chapter 6. Natural language processing in oncology

    Chapter 7. Evaluating machine learning models: From development to clinical deployment

    Section II. Research applications

    Chapter 8. Germline genomics in radiotherapy

    Chapter 9. Tumor genomics in radiotherapy

    Chapter 10. Radiotherapy outcome prediction with medical imaging

    Chapter 11. Causal inference for oncology

    Chapter 12. Machine learning in quality assurance and treatment delivery

    Section III. Clinical applications and future developments

    Chapter 13. Case study: Deep learning in radiotherapy auto-segmentation

    Chapter 14. Case study: Adaptive radiotherapy in the clinic

    Chapter 15. Case study: Handling small datasets – Transfer learning for medical images

    Chapter 16. Case study: Lymph node malignancy classification for head and neck cancer radiation therapy

    Chapter 17. Training the current and next generation in machine learning and artificial intelligence applications in radiation oncology

    Chapter 18. Governance issues and commercialization

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    125 London Wall, London EC2Y 5AS, United Kingdom

    525 B Street, Suite 1650, San Diego, CA 92101, United States

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    Copyright © 2024 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    ISBN: 978-0-12-822000-9

    For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Stacy Masucci

    Acquisitions Editor: Linda Buschman

    Editorial Project Manager: Sara Pianavilla

    Production Project Manager: Neena S. Maheen

    Cover Designer: Vicky Pearson Esser

    Typeset by TNQ Technologies

    Contributors

    Sara Ahmed,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Tracy P.T. Au Yong,     Department of Radiology, The Royal Wolverhampton NHS Trust, Wolverhampton, United Kingdom

    Richard Bakst,     Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, NY, United States

    Gill Barnett,     Department of Oncology, Addenbrooke's Hospital, University of Cambridge, Cambridge, United Kingdom

    Sarah Bond,     Mirada Medical Limited, Oxford, United Kingdom

    Ian S. Boon,     Department of Clinical Oncology, University Hospital Southampton NHS Foundation Trust, Southampton, United Kingdom

    Cheng S. Boon,     Department of Clinical Oncology, Derby Teaching Hospitals NHS Foundation Trust, Derby, United Kingdom

    Michael Buckstein,     Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, NY, United States

    Liyuan Chen,     Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, United States

    Amit Kumar Chowdhry,     University of Rochester Medical Center, Departments of Radiation Oncology and Biostatistics, Rochester, NY, United States

    Sunan Cui,     Department of Radiation Oncology, University of Washington School of Medicine, Seattle, WA, United States

    Andre Dekker,     Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands

    Michael Dohopolski,     Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, United States

    Omar El-Charif,     Department of Medicine, Section of Hematology/Oncology, The University of Chicago, Chicago, IL, United States

    Issam El Naqa,     Department of Machine Learning, H. Lee Moffitt Cancer Center, Tampa, FL, United States

    Rianne Fijten,     Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands

    Clifton D. Fuller,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Andrew Green,     The University of Manchester, Manchester, United Kingdom

    Anshu Jain,     SERO, Charlotte, NC, United States

    Petros Kalendralis,     Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands

    Alan M. Kalet,     Department of Radiation Oncology, University of Washington, Seattle, WA, United States

    John Kang,     Department of Radiation Oncology, University of Washington, Seattle, WA, United States

    Benjamin H. Kann

    Department of Radiation Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States

    Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, MA, United States

    Sarah Kerns,     Medical College of Wisconsin, Milwaukee, WI, United States

    Ellen Kim,     Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, United States

    Kendall J. Kiser,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Ronald Levitin,     Department of Radiation Oncology, Corewell Health East, Farmington Hills, MI, United States

    Jinyuan Liu,     Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States

    Robert J. Lyon,     Department of Computer Science, Edge Hill University, Ormskirk, Lancashire, United Kingdom

    Lance A. McCoy,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Brigid A. McDonald,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Alan McWilliam,     The University of Manchester, Manchester, United Kingdom

    Michael Milano,     University of Rochester Medical Center, Department of Radiation Oncology, Rochester, NY, United States

    Samuel Mulder,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Stuti Nayak,     Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands

    Martijn Nobel

    Department of Radiology and Nuclear Medicine, University Medical Center+, Maastricht, The Netherlands

    School of Health Professions Education (SHE), Maastricht University, Maastricht, The Netherlands

    John Placide,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Evan Porter,     Department of Radiation Oncology, William Beaumont School of Medicine, Royal Oak, MI, United States

    Kathryn Preston,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Sander Puts,     Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands

    Matt Pybus,     Mirada Medical Limited, Oxford, United Kingdom

    Arif S. Rashid,     Department of Radiation Oncology, Winship Cancer Institute of Emory University, Atlanta, GA, United States

    Tim Rattay,     Leicester Cancer Research Centre, University of Leicester, Leicester, United Kingdom

    Barry S. Rosenstein,     Department of Radiation Oncology and Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States

    Keith L. Sanders,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Aneja Sanjay,     Department of Therapeutic Radiology, Yale School of Medicine, New Haven, CT, United States

    Russell Schwartz,     Computational Biology Department and Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States

    Christina Setareh Sharafi,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    David Sher,     Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, United States

    John Shumway,     Department of Radiation Oncology, University of North Carolina, Chapel Hill, NC, United States

    Zaid Siddiqui

    Department of Radiation Oncology, Baylor College of Medicine, Houston, TX, United States

    Department of Radiation Oncology, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States

    Corey Speers,     Department of Radiation Oncology, University Hospitals Seidman Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, United States

    Robert Strawderman,     Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States

    Yifeng Tao,     Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States

    Charles R. Thomas Jr.,     Department of Radiation Oncology & Applied Sciences, Geisel School of Medicine at Dartmouth, Lebanon, NH, United States

    Reid F. Thompson

    Division of Hospital and Specialty Medicine, VA Portland Healthcare System, Portland, OR, United States

    Department of Radiation Medicine, Oregon Health & Science University, Portland, OR, United States

    Xin Ming Tu,     Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego Health Sciences, La Jolla, CA, United States

    Justin Xiang-Yuan Tu,     Department of Orthopedics, Emory Health Care, Emory University, Atlanta, GA, United States

    Martin Vallières

    Department of Computer Science, University of Sherbrooke, Sherbrooke, QC, Canada

    GRIIS, University of Sherbrooke, Sherbrooke, QC, Canada

    Lisanne V. van Dijk,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Juan Ventura,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Kareem A. Wahid,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Hedy S. Wald,     Department of Family Medicine, Warren Alpert Medical School of Brown University, Providence, RI, United States

    Jing Wang,     Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, United States

    Jason Adam Wasserman

    Department of Foundational Medical Studies and Department of Pediatrics, Oakland University William Beaumont School of Medicine, Rochester, MI, United States

    Center for Moral Values in Health and Medicine, Oakland University, Rochester, MI, United States

    Catharine West,     University of Manchester, Christie Hospital, Manchester, United Kingdom

    Kun Yang,     Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego Health Sciences, La Jolla, CA, United States

    Moi Hoon Yap,     Department of Computing and Mathematics, The Manchester Metropolitan University, Manchester, United Kingdom

    Yading Yuan

    Department of Radiation Oncology, Columbia University Irving Medical Center, New York, NY, United States

    Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, NY, United States

    Ye Yuan,     Department of Radiation Oncology, NYU Langone, New York, NY, United States

    Catharina Zegers,     Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands

    Lin L. Zhu,     Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States

    Raed Zuhour,     UH Cleveland Medical Center, Cleveland, OH, United States

    Foreword from the editors

    Who should read this book? Our motivation for this book was to create a practical guide for clinicians without technical expertise who want a comprehensible and comprehensive introduction to clinical research and application in artificial intelligence/machine learning (AI/ML). Despite its growing prominence in radiation oncology, AI/ML has yet to penetrate physician or medical physics education. There are simply not enough folks who have undergone formal training in both AI/ML and medicine, let alone radiation oncology. Thus, we must lean on each other as a field, much as we did when intensity-modulated radiation therapy was introduced—and as we are still doing with advancements in adaptive radiotherapy and ultra-high-dose radiation.

    Unlike domains such as car driving or speech recognition, medicine is full of uncertainty, confounding, and subjectivity, as a visit to any tumor board will tell you. Accordingly, this book is not just a technical treatise (though there are plenty of those); our authors write expertly on topics spanning ethics, genomics, text mining, and education, drawing on practical experience facing such uncertainties in medicine. We organize clinical radiation oncology AI/ML into 18 chapters divided into three sections covering (1) Fundamentals, (2) Research Applications, and (3) Clinical Applications and Developments.

    In the Fundamentals section, our leadoff Chapter 1 by Lyon and Rattay introduces us to the nuts and bolts: what is AI/ML, what are the different types of learning, and—importantly—what causes AI to make errors? Errors made by AI are increasingly being examined through the lens of bioethics, which is the focus of our Chapter 2 by Wasserman and Wald. They discuss several principles of bioethics in clinical medicine and research, and leave us with advice on how clinicians can shape the future of AI through an ethical partnership. We intentionally shine a light on bioethics early on to urge readers to keep these principles in mind throughout the book. The next four chapters cover the domains of genomics, radiomics, image segmentation, and natural language processing in radiation oncology. In Chapter 3, Schwartz et al. summarize genomic technologies and oncology applications such as tumor subtyping and driver mutation discovery that have kicked open the door to capturing—and exploiting—the molecular basis of cancer. They caution us about challenges like sparsity and heterogeneity, and provide several solutions to these issues. In Chapter 4, Kalendralis et al. review how radiomics has allowed algorithms to process images as quantitative, customizable biomarkers, with recent international standardization enhancing the reproducibility of results. They lay out a practical roadmap for radiomics research, from hypothesis generation to model interpretability and deployment. While radiomics provides breadth by allowing any ML model to use image data, deep learning provides depth in specialized models that learn features from images. In Chapter 5, Yuan et al. introduce deep learning through the task of organ auto-segmentation, which is currently a practice-changing catalyst in streamlining radiation oncology workflows (more on this later). They take us through the steps of training convolutional neural networks, and encourage us to push the field forward through friendly competition! Algorithms are not limited to pixels or voxels, and can also be trained to process raw text. In Chapter 6, Puts et al. showcase how natural language processing (NLP) can automate difficult tasks requiring clinical expertise, such as extracting diagnoses and matching patients to trials. They underscore the importance of involving physicians in projects and provide steps toward clinical NLP implementation. At this point, one may rightfully be skeptical of how well AI/ML lives up to its claims and may want to learn how to perform due diligence. In Chapter 7, Kang et al. demonstrate how model evaluation and development are intertwined, and how to critically scrutinize models step by step, from development to evaluation to real-world impact assessment. They discuss why clinical data pose unique obstacles requiring tradeoffs and present case studies to illustrate how to avoid being misled when evaluating models.

    Our core section on Research Applications explores active areas of investigation in radiation oncology AI/ML. The first two chapters cover genomic classifiers, which have a growing role in systemic therapy yet are not currently used clinically in radiotherapy. In Chapter 8, Rosenstein et al. spotlight ongoing research on predicting radiation toxicity using germline mutations. They discuss important strategies for handling the problem of low patient numbers, including both ML methods and consortia data pooling. Within tumor genomics in Chapter 9, Cui et al. compare conventional and deep learning ML classifiers for applications ranging from cancer diagnosis to driver gene prediction to radiotherapy response. They reinforce the common theme of adhering to standardized guidelines to improve reproducibility and generalizability. Moving on to imaging research in Chapter 10, Wahid et al. comprehensively review outcome prediction using radiomics for tumor control and toxicity. They categorize studies by imaging modality and time (pretreatment or change mid/posttreatment). In a nod to prior chapters, they present challenges and potential solutions to standardization—especially in image acquisition and processing—and leave us hopeful about a future with radiomics-based treatment planning. Shifting gears to a hot topic in statistics in Chapter 11, Chowdhry et al. introduce causal inference and how it elucidates cause and effect through counterfactual (what could have happened) or causal graph (what causes what) approaches. They demonstrate how to use ML to estimate propensity scores and caution us to be mindful of erroneous causal claims in real-world settings of hidden confounding and broken randomization. We are reminded that to err is human in Chapter 12 as Shumway and Kalet illustrate the role of AI/ML in quality assurance (QA), outlining the evolution from rules-based to probabilistic ML approaches. They contrast plan QA tasks—detecting outliers, IMRT passing rates, and contouring mistakes—with radiation delivery QA tasks such as adaptive radiotherapy (more on this later) and the complex machinery itself.

    Anchoring our book is the Clinical Applications and Developments section, where we showcase current AI use in the clinic. Auto-segmentation (also known as auto-contouring) has recently exploded in popularity, and in Chapter 13 Boon et al. explore its clinical integration for breast, prostate, and head and neck cancers. Despite its widespread use, many challenges related to human–computer trustworthiness remain to be tackled, and the authors offer potential solutions. Auto-segmentation is a crucial building block for adaptive radiotherapy (ART), which prior chapters have alluded to and which McDonald et al. formally introduce in Chapter 14, taking us through a head and neck cancer use case of toxicity mitigation. They compare and contrast institutional workflows for offline and newer online ART, which can be facilitated by AI/ML for patient selection, auto-segmentation, deformable registration, and synthetic CT generation from MRI. On the topic of synthetic data, in Chapter 15, Green and McWilliam address the very real problem of small and imbalanced datasets in medical imaging through solutions such as sampling and data transformations. They highlight a core AI concept in transfer learning—advancing the starting point of a model by leveraging features learned in another model—and show its potential in a clinical use case for sarcopenia segmentation. In Chapter 16, Wang et al. bring together the lessons of this book in a walkthrough of how they combine radiomics and deep learning to inform elective neck nodal radiation in an AI-driven clinical trial. They pull back the curtain on end-to-end model building, overcoming challenges like data shift, uncertainty, interpretability, and clinical deployment. Looking toward a brighter future in Chapter 17, Kim et al. make a compelling case for why AI/ML competency is in the interest of clinicians through arguments rooted in efficiency, ethics, and safety. They compare a plethora of resources, from choose-your-own-adventure options—open courseware, research tracks, and workshops—to formal pathways like industry partnerships and board-certified fellowships. Educating clinicians is paramount as technology moves faster than regulation, as Bond and Pybus remind us in Chapter 18, where they discuss the quality management and planning required to address usability, cybersecurity, and governance in clinical AI/ML commercialization. In this evolving landscape, the ultimate question that regulators seek to answer is the same one that we all face: how does one understand a model designed to evolve?

    While the pace of progress can be intimidating, the practical points of clinical AI/ML captured in this book do not expire. The actual process of creating models is becoming easier with significant interest in accessibility and automated AI. The amount of usable data continues to increase with companies and consortia dedicated to pooling data from different sources. Yet even with limitless data and instant models, we still need domain experts to make sense of results. The repeated themes of validation, standardization, and interpretability come into play. Do these results represent a real effect, or is there a hidden confounder? Is the model robust to a slightly different environment, such as another location? Can users understand the results of these algorithms and properly evaluate their fairness? After reading this book, you will be prepared to ask such questions, and perhaps help us provide the answers.

    We hope you enjoy this journey through AI/ML in radiation oncology as much as we have enjoyed putting it together. We thank our tremendous collection of authors, who range from students to department chairs.

    Sincerely,

    John Kang

    Tim Rattay

    Barry S. Rosenstein

    Section I

    Fundamentals and overview

    Outline

    Chapter 1. Fundamentals of machine learning

    Chapter 2. Artificial intelligence, machine learning, and bioethics in clinical medicine

    Chapter 3. Machine learning applications in cancer genomics

    Chapter 4. Radiomics: unlocking the potential of medical images for precision radiation oncology

    Chapter 5. Deep learning for medical image segmentation

    Chapter 6. Natural language processing in oncology

    Chapter 7. Evaluating machine learning models: From development to clinical deployment

    Chapter 1: Fundamentals of machine learning

    Robert J. Lyonᵃ and Tim Rattayᵇ     ᵃDepartment of Computer Science, Edge Hill University, Ormskirk, Lancashire, United Kingdom     ᵇLeicester Cancer Research Centre, University of Leicester, Leicester, United Kingdom

    Abstract

    Readers who are already familiar with machine learning theory and mathematical modeling may wish to skip this introductory chapter or, alternatively, use it to refresh their knowledge and understanding. Machine learning concepts are to a large extent grounded in mathematics, specifically statistical and probabilistic theory, as well as computer science. For the unfamiliar reader, this chapter will provide a broad overview of the fundamental concepts and definitions of machine learning and basic modeling. It will cover relevant definitions and purposes of learning, describe some of the pitfalls of automated learning, introduce the types of automated learning and, finally, introduce model evaluation. We have attempted to minimize the mathematical notation and have used equations only where necessary to illustrate important concepts or theory.

    Keywords

    Clustering; Deep neural network; Machine learning; Mathematics; Prediction; Supervised learning

    Key points

    • Machine learning concepts are grounded in mathematics, specifically, statistical and probabilistic theory, as well as computer science.

    • To enable different forms of learning, such as prediction, classification, clustering and reduction, automated learning uses mathematics to connect features to labels using their inherent numerical properties.

    • Automated learning can be divided into generative versus discriminative paradigms, whilst machine learning can be sub-divided into supervised, semi-supervised, and unsupervised approaches.

    • Deep neural networks are now commonly applied to clinical scenarios, yet many problems in the clinical domain can be solved equally using decision trees or other classifiers.

    Artificial intelligence and machine learning

    Artificial intelligence (AI) is a field of study focused on building autonomous intelligent systems capable of learning and adapting from experience. It formed during the mid-1950s at a conference which brought together those working in the area for the very first time (Attila & Lónyi, 2009; Buchanan, 2005; Crevier, 1993; McCarthy et al., 2006), after a series of cumulative developments in philosophy, mathematics, psychology and neuroscience (Newell, 1983). Today, AI has become an umbrella term encompassing various specializations that aim to replicate, and if possible improve upon, the human capabilities listed in Table 1.1. The final discipline in Table 1.1, Machine Learning (ML), is the focus of this book. In general terms, ML seeks to construct computational machines capable of automating complex decision-making processes. These machines accept information in the form of data and output some form of decision or prediction that can be actioned. In practice, the machines assume the form of sophisticated mathematical models. Such models are capable of finding patterns in an arbitrary decision-making process. However, they do not possess intelligence in the traditional sense, such as creativity or thinking outside the box. Rather, they use mathematics to underpin decision-making choices at any given moment. Where this works well, it gives rise to an illusion of intelligence.

    Table 1.1

    Capturing and quantifying experience

    The human brain is capable of storing a lifetime of knowledge and experience. It can be viewed in basic terms as a biological analogue of a computer hard drive. While the biological and artificial mediums store information differently, the nature of what they store can be considered the same in both cases: data. In humans, data are stored via the interconnections of neurons in the brain. In machines, they are encoded collections of binary strings stored in computer chips, physical disks, or even upon magnetic tape. When this information is decoded, we typically find ourselves with two distinct forms of data: categorical or numerical. Categorical data are either nominal, defining mutually exclusive categories that have no order (e.g., smoker vs. non-smoker); or ordinal, where an implicit ordering exists (e.g., minimal, moderate, severe). Numerical data, meanwhile, are represented by either discrete whole-number values (e.g., number of outpatient appointments) or continuous real-valued numbers (e.g., height, weight etc.). Collections of data form datasets, which are commonly presented for viewing using a tabular structure consisting of rows and columns. Yet it is impractical to present large datasets in this manner. For this reason, it is helpful to use mathematical notation to describe the data. Such notation can be introduced via exploring a simplified model of the data stored in human memory. This model facilitates an exploration of the concepts underpinning both biological and automated machine learning.
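
    To make this concrete, here is a minimal sketch (in Python, assuming the pandas library; all patient values are hypothetical) of how nominal, ordinal, discrete, and continuous features might sit together in one tabular dataset:

    # A small tabular dataset mixing the four data types described above.
    import pandas as pd

    patients = pd.DataFrame({
        "smoker": ["yes", "no", "no"],                  # nominal categorical
        "severity": ["minimal", "severe", "moderate"],  # ordinal categorical
        "appointments": [3, 1, 7],                      # discrete numerical
        "weight_kg": [81.4, 62.0, 95.3],                # continuous numerical
    })

    # Encode the ordinal feature so its implicit ordering is preserved.
    patients["severity"] = pd.Categorical(
        patients["severity"],
        categories=["minimal", "moderate", "severe"],
        ordered=True,
    )
    print(patients.dtypes)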

    It is convenient to represent memory as a single unified dataset. This view masks many complexities relating to memory structure, storage and retrieval, yet it is a convenient tool for unlocking an understanding of ML. This dataset becomes populated over time as our senses (sight, smell, sound etc.) collect information describing the world around us. We can describe this "experience" dataset using some straightforward mathematical notation:

    E = { }

    This defines an empty set E, where a set is simply a collection of items called elements. The elements may correspond to representations of imagery, textual or numerical information, or even auditory experiences. Brought together they form a memory. Experience is obtained when sensory information is collected and stored in E. The acquisition of experience can be represented via the ordered insertion of elements into E as follows:

    E = {e1, e2, …, en}

    Here E contains multiple distinct experiences. Each is numbered using the subscript i, where i is a placeholder for a discrete value (a number without a floating-point or decimal component such as 0.01) allowing experiences to be differentiated. The subscript also permits the straightforward identification of a specific experience expressed via ei (e.g., e1 is not equal to e2). There are n experiences in total, where n could represent any whole number greater than zero.

    Individual experiences can be characterized by one or more defining features, traditionally called variables. Humans are able to automatically and unconsciously extract features from sensory information with ease. For example, when meeting a new patient for the first time, a clinician is able to use key features of the patient with little or no effort to determine whether the patient appears well or not. This allows them to assess (almost instantaneously) the patient's age, height, weight, circulation via skin tone and so on. Such feature information can then be used to support complex decision-making processes, such as determining if the patient is ill or not. In this way, an experience is describable as a set comprised of features:

    ei = {x^1, x^2, …, x^m}

    An experience is defined using m features, which may be interconnected or independent. Each can be uniquely identified via the superscript j, where j assumes a value between 1 and m. For instance, when a patient is unwell, they may experience one or more symptoms such that ei = {cough, high temperature}, and thus we have x^1 = cough and x^2 = high temperature. These symptoms are features that relate to an experience of a sick patient.

    Experiences quantified in E can often be visualized when given appropriate context. Categorical features are most easily represented using histograms or bar charts. For example, the left-hand side of Fig. 1.1 is a bar chart that shows the fraction of patients a clinician has observed with some arbitrary disease. The vertical axis represents the fraction of patients observed. The unshaded bars represent the observations of the individual clinician, in other words, their experiences of those patients with (+) and without the disease (−). The shaded bars show the true fraction of those afflicted in the population at large. Clearly there is a disparity between the experience of the clinician and the true rate of the disease in the population, perhaps because the clinician is more likely to come into contact with sick patients. Irrespective of any disparity, the clinician could use such basic information to estimate the likelihood that a random patient has the disease. This estimate will be reasonably accurate if past experience is representative of the true population.

    Fig. 1.1  On the left, a bar chart representing the fraction of patients a hypothetical clinician has observed testing positive (+) for a disease. These observations do not exactly match the true fraction of those with the disease in the population at large. On the right, the numerical distributions for three distinct patient features, showing the difference in the distributions for those patients testing positive (+) versus those testing negative (−). The distributions can be optimally separated using the line labeled φ (phi). This could correspond to a threshold over features such as age, or blood glucose levels.

    Numerical features, on the other hand, give rise to statistical distributions when plotted. Such distributions are shown on the right-hand side in Fig. 1.1. Here three continuous features have been recorded for a collection of patients with and without the same disease. The dots represent observations stored in E and the curves represent the true distribution of the two types of patients: those with (+) and those without (−) the disease. We find that the two distinct groups of patients can be summarized using a Gaussian (bell-shaped) curve for each feature. Suppose the first feature is actually patient age. We can interpret φ (phi) as a threshold over age that allows disease and non-disease patients to be separated quite successfully. This separability only becomes apparent when there is enough information to form a model in this way. In reality there may not be enough such information in E to form such well-defined curves, particularly because observations are generally imperfect due to the presence of noise in data.

    We can use features to derive an understanding that guides decision-making. However, for this to happen we require feedback. In the example in Fig. 1.1, feedback was provided regarding what the bars and curves represent. With this information it becomes possible to interpret and understand the information presented. Humans possess an innate cognitive ability that enables new knowledge to be inferred from feedback. This in turn leads to improved decision-making. For example, when learning to recognize the symptoms of a disease in patients, experience and feedback is acquired that allows for self-correction. Thus, if a mistake is made when learning, resulting in misdiagnosis, that feedback is coupled to an experience stored in E. Over time we recognize which experiences in E led to a correct diagnosis (success) and which led to misdiagnosis (failure). We are then able to avoid repeating those experiences likely to yield failure. Feedback is therefore crucial for enabling adaptation and improvement. It can be included in the mathematical notation discussed so far via the introduction of a new symbol, Y. This represents a feedback set. It contains all potential labels that can be attributed to an experience:

    Y = {y1, y2, …, yk}

    There are at most k feedback labels, where specific feedback can be referenced via yj. For example, we may have Y = {cold, flu, respiratory infection}, though feedback can be either numerical or categorical depending on the problem domain. If the labels accompanying experiences are undeniably correct, i.e., not subject to any doubt, they are known as ground truth labels. If provided with ground truth labels, we are capable of learning effectively. However, if labels are subject to error, so-called mislabeling, then learning/decision-making accuracy is often significantly reduced. In the real world, ground truth labels can be incredibly difficult to obtain, either due to the inherent cost of obtaining them or the uncertainty in the concept to be labeled. When trying to determine if the features of a clinical image are indicative of disease, there can be great uncertainty. For example, delineating between healthy and diseased tissue can be incredibly difficult. Yet where ground truth labels do exist, feedback can be directly coupled with experience via a straightforward modification to E:

    E = {(e1, yj), (e2, yj), …, (en, yj)}

    Here each experience ei is accompanied by exactly one feedback label described via yj. This definition permits the sharing of labels amongst experiences. An example experience dataset that adheres to this definition is given in Table 1.2. It describes a collection of patients along with the ground truth label corresponding to the disease they were diagnosed to have. Each patient is represented by a row and each column represents a key feature. How can learning be achieved with this information? That is, how can we learn to diagnose new, unseen patients accurately based upon these past experiences?

    Table 1.2
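
    In code, such a labeled experience set can be represented directly. The following sketch (plain Python; the features and labels are hypothetical) mirrors the definition E = {(e1, yj), …, (en, yj)}, including the sharing of labels amongst experiences:

    # Each experience is a set of features paired with exactly one
    # ground truth label; labels may be shared amongst experiences.
    E = [
        ({"cough", "high temperature"}, "flu"),
        ({"cough"}, "cold"),
        ({"cough", "shortness of breath"}, "respiratory infection"),
        ({"high temperature"}, "flu"),  # shares its label with the first entry
    ]

    for features, label in E:
        print(sorted(features), "->", label)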

    Learning, specifically supervised learning, becomes possible if we are able to develop a mapping between past experiences and feedback labels. A mapping is some heuristic used to link each ei to a yj. For example, the feature blood sugar level can be used to split patients into diabetic and non-diabetic groups, but only if we can find an optimal blood glucose level to split them upon. The larger E becomes, the more experience available and the greater the likelihood of understanding the relationship between features and labels. In the real world this is how clinicians learn to recognize disease at a fundamental level. They use experience gained during their careers to map patient symptoms to the most likely disease. Symptoms belonging to common diseases, like diabetes or the flu, are represented frequently in E. This allows a diagnosis to be made with ease in new patients. If presented symptoms are rare or do not correspond to anything represented in E, then further investigations may be needed, or an oracle consulted (e.g., a clinical colleague with experience in a specialist area).

    Learning from experience

    We can regard learning as a process that connects experience with labels to derive knowledge. This is useful for understanding the strengths and weaknesses of human decision making. Whilst we strive to make optimal decisions, rarely is this high standard met. To explore why, consider the visualisation of knowledge shown in Fig. 1.2. The box represents the complete set of all knowledge, denoted by the set K. This is an idealised hypothetical set containing every piece of information stored in books, web resources, or the minds of other people. The circle labeled Ka is a subset of K that represents the portion of knowledge practically available to us. For example, your local medical library doesn't contain every book ever written. Nor do we have access to the knowledge of scientists working at the forefront of research. The circle corresponding to our personal knowledge and experience is denoted by Kp and is a subset of Ka. The content of Kp varies from person to person according to, for example, educational background. Finally, the innermost circle represents the knowledge we use to make a decision at any given time t. This is denoted by Kt. This set is imperfect. After all, humans are incapable of recording and recalling information perfectly. This imperfection greatly reduces our capacity for optimal decision making, as we only ever have access to incomplete knowledge.

    Fig. 1.2  Visualisation of knowledge and experience used in human learning.

    This visualisation also provides an explanation for how decision-making performance can vary amongst people – no two individuals use exactly the same experiences and knowledge to make decisions. Hence two people, given the exact same information, can draw radically different conclusions regarding its meaning. To complicate matters, people possess a variety of cognitive biases that impede decision making and understanding. Biases can prevent the acceptance of logic and permit cognitive dissonance. This helps explain why, for example, communities vehemently opposed to vaccination refuse to change their view despite being given evidence to allay their fears. In this case more knowledge added to Kp will not change their view; rather, their cognitive biases and thinking processes must be altered before new knowledge is accepted and internalized.

    How does this representation of human knowledge, experience and learning relate to machine learning? Perhaps surprisingly, ML algorithms learn and make decisions similarly to humans. They are subject to the same problems of bias and flawed decision-making. Understanding the flaws in human learning is therefore useful for understanding the flaws in automated learning. Though of course, we note that the means by which these flaws and biases affect automated learning is very different and mathematically quantifiable. Consider the new visualisation shown in Fig. 1.3. It depicts in abstract terms the knowledge and experience used by automated learning algorithms. The blue box labeled K represents the set of perfect knowledge. The largest circle labeled Ka represents the set of available knowledge. The circle labeled D represents the knowledge stored in data, whilst X is the feature data supplied to an algorithm. The feature data therefore represent the subset of knowledge provided to an algorithm for learning and decision-making. This set is imperfect for reasons already outlined, limiting how well an algorithm can perform. It is also incomplete, due to gaps in our knowledge or flaws in data collection. In many respects, automated methods are given much less information to learn from than we are as humans. Yet they can achieve success by finding salient patterns in data that we often overlook or cannot find due to large data volumes. In both cases, automated methods are only as good as the data provided to them.

    Forms of learning

    As depicted in Fig. 1.3, artificial systems learn from experience described via feature data obtained from sensors (cameras, microphones etc.) or data storage systems tasked with managing information. Such data permits multiple forms of learning according to its inherent characteristics and underpins the different types of ML. Before proceeding to describe ML in more detail, we review these different forms of learning.

    Fig. 1.3  Visualisation of knowledge and experience used by machine learning algorithms.

    Prediction

    We can forecast an unknown numerical outcome based on past experience. For example, predicting patient height based on age and weight. Given an experience set E = {x1, …, xn}, prediction aims to output a numerical value ŷ that holds some predictive power. The success of a prediction system can be assessed by quantitatively measuring the difference between predictions and eventual outcomes in the real world.
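
    As an illustration, this minimal sketch (assuming the scikit-learn and numpy libraries; all values are hypothetical) fits a simple regression model that predicts height from age and weight:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[8, 28.0], [10, 34.5], [12, 41.0], [14, 52.0]])  # age, weight
    y = np.array([128.0, 139.0, 150.0, 162.0])                     # height (cm)

    model = LinearRegression().fit(X, y)
    y_hat = model.predict([[11, 38.0]])  # forecast for a new, unseen patient
    print(round(float(y_hat[0]), 1))

    # Success is judged by the gap between predictions and eventual
    # real-world outcomes, e.g., the mean absolute error on held-out data.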

    Clustering

    This involves grouping numerical and/or categorical observations according to their inherent characteristics and similarity, without the need for labels to guide the grouping. Given E = {x1, …, xn}, clustering aims to group data into k distinct clusters, where k is a parameter we control. If labels are available for the data in E, then clusters can be assigned labels. There are many approaches in practice, yet one way to achieve this is to assign a cluster the majority label of the examples belonging to it. When this happens, clusters can be used to make predictions on never-before-seen examples, simply by assigning new examples the label of whichever cluster they are closest to.
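
    The sketch below (assuming scikit-learn; hypothetical two-feature data) groups examples into k = 2 clusters and then assigns each cluster the majority label of its members, as described above:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]])
    labels = np.array(["healthy", "healthy", "disease", "disease"])

    # k is a parameter we control; here k = 2.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # Assign each cluster the majority label of the examples belonging to it.
    for c in range(2):
        members = list(labels[km.labels_ == c])
        print(c, max(set(members), key=members.count))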

    Classification

    Classification involves assigning observations to predefined categorical labels known as classes. Given E = {(x1, yj), …, (xn, yj)}, classification aims to output a categorical or discrete value ŷ for a never-before-seen example, thereby assigning it to a predicted class. There are binary classification problems, which involve learning to assign observations to one of two potential classes where Y = {0, 1}. In multi-class problems there are potentially many classes that can be assigned, e.g., Y = {Healthy, Disease, …, Unknown}. Classification may also involve outputting a probability estimate along with each predicted class. This can take the form of a tuple (a finite ordered list), e.g., ŷ = (0, 0.4, 1, 0.6), which indicates class zero is predicted with 40% probability, and class one with 60%.
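
    A minimal binary classification sketch (assuming scikit-learn; hypothetical data with Y = {0, 1}), outputting both a predicted class and per-class probability estimates:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[2.0, 1.0], [1.5, 0.8], [6.0, 5.5], [5.5, 6.1]])
    y = np.array([0, 0, 1, 1])  # 0 = non-disease, 1 = disease

    clf = LogisticRegression().fit(X, y)
    x_new = np.array([[5.0, 5.0]])  # a never-before-seen example

    print(clf.predict(x_new))        # predicted class, e.g., [1]
    print(clf.predict_proba(x_new))  # probability estimate for each class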

    Reduction

    Past experience is used to distil large volumes of complex information into a subset of information relevant for learning or decision making. Given E = {(x1, yj), …, (xn, yj)}, reduction, more commonly known as dimensionality reduction, aims to output E′, which describes examples using only their most important characteristics. In practice this means E′ uses far fewer features than E, possibly by removing features that are redundant or irrelevant with respect to the feedback labels.
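
    One common approach is principal component analysis (PCA). The sketch below (assuming scikit-learn; randomly generated stand-in data) reduces 4-feature examples to the 2 components that retain the most variance:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.default_rng(0).normal(size=(20, 4))  # 20 examples, 4 features

    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)

    print(X.shape, "->", X_reduced.shape)  # (20, 4) -> (20, 2)
    print(pca.explained_variance_ratio_)   # variance retained per component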

    Potential sources for error

    To enable these forms of learning, automated methods use mathematics to connect features to labels using their inherent numerical properties. As already discussed, this process is subject to error and bias, making automated systems fallible. There are multiple sources of potential error in the learning/decision-making process.

    Poor quality data

    The data stored in E may be of poor quality, perhaps due to the presence of noise or because information is low resolution (not granular enough). Poor quality information may be collected if the processes used to record data are prone to error, whilst data sources used to populate E (e.g., databases or data warehouses) may be incomplete or contain missing values and entries. This may be due to systemic reasons, such as privacy concerns. A clinical records system, often populated via error-prone human input, is one potential real-world source of poor-quality data. Automated systems struggle to learn from poor quality data, which impacts learning performance.

    Data bias and imbalance

    It is possible for the data in E to be biased, perhaps due to the way it was collected or recorded, or imbalanced due to the very nature of the data available in the real world. For example, if attempting to train an automated system to recognize a rare disease in patients, E is likely to contain little experience of that disease, since its inherent rarity means there are so few examples around. Furthermore, when recording data, any pre-processing applied to it, such as converting continuous numerical values into discrete values (e.g., 1.321 to 1), can inadvertently introduce bias. In general, data pre-processing carried out prior to insertion into E is a potential source of bias (e.g., sampling bias).

    Information-poor features

    The features used to describe experiences are crucial for learning. Humans are able to innately extract key features whilst automated systems are not (with some rare exceptions). Thus, to help automated systems learn, we must identify a set of optimal features to store in E. We aim to use features that are information rich, as these are most helpful for learning. For example, when examining a patient suspected to have asthma, only information-rich features are used to make a diagnosis (e.g., spirometry measurements, presence of wheezing etc.). Features that are information poor (e.g., height, eye color, skin tone) are not considered. At some stage a decision must be made regarding which features to use. This process is known as feature selection. Feature selection is an important subfield within machine learning, though it is beyond the scope of this chapter to describe the underlying theory further. Interested readers are advised to follow up elsewhere (Shannon & Weaver, 1949; Guyon & Elisseeff, 2003).

    In many cases the features we select represent implicit hypotheses. In selecting them we presume their utility for some learning task. Some (but not all) ML algorithms rely on feature selection either by design or for performance improvements. In many cases these hypotheses are untested, thus the information stored in E can end up being information-poor, yielding sub-optimal learning performance. Whilst humans are often able to ascertain the utility of features with success, this is not always the case, particularly where domain knowledge is limited or there are many unknowns. Where feature selection is poor, the resulting poor performance is not usually the fault of the algorithm, although in practice algorithms are often made scapegoats for poor human judgment and inappropriate feature selection. After all, algorithms are not smart enough to know that correlation does not equal causation. Thus, automated methods can easily attune to spurious patterns in data, leading to odd results.
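
    Feature selection can itself be automated. The sketch below (assuming scikit-learn; synthetic stand-in data) ranks features by their mutual information with the labels and keeps only the most information-rich ones:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    rng = np.random.default_rng(1)
    informative = rng.normal(size=(100, 2))  # features that carry signal
    noise = rng.normal(size=(100, 3))        # information-poor features
    X = np.hstack([informative, noise])
    y = (informative[:, 0] + informative[:, 1] > 0).astype(int)

    selector = SelectKBest(mutual_info_classif, k=2).fit(X, y)
    print(selector.get_support())  # mask of the retained features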

    Cognitive bias

    Algorithms are capable of learning in different ways. They also possess differing learning priorities. For example, suppose an algorithm is required to recognize an exceptionally rare disease in patient records as accurately as possible. To maximize accuracy, an algorithm may learn to simply always predict a non-disease label – as this approach will achieve the greatest accuracy possible. To illustrate, if the prevalence of a disease is 1 in every 10,000 patients, then always predicting no disease on a sample of 1 million patients (assuming 100 patients with the disease) will yield an accuracy of 99.99%. Clearly this means no positive diagnoses will ever be made. Yet the algorithm does not understand the inherent mistake it has made. It is simply biased toward maximizing accuracy. This will be discussed further in Chapter 7 (Model Evaluation and Selection).
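
    This accuracy paradox is easy to verify numerically (plain Python, using the figures from the example above):

    n_patients = 1_000_000
    n_disease = 100  # prevalence of 1 in 10,000

    # Always predicting "no disease" is correct for every non-disease case.
    correct = n_patients - n_disease
    accuracy = correct / n_patients
    print(f"{accuracy:.4%}")  # 99.9900%, yet no positive diagnosis is ever made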

    Violating the i.i.d assumption

    Machine learning models are not possessed of magic. They are capable of identifying only those patterns that exist in the data provided to them. If those patterns do not exist in the counterpart data in the real world, then those models will perform extremely poorly in practice. The good performance of a machine learning model is therefore predicated on a fundamental assumption holding true: that the data used to teach an algorithm will be independent of, and identically distributed to, data in the real world. This is known more formally as the independent and identically distributed (i.i.d) assumption. When we say identically distributed, we mean similar statistical properties: mean, standard deviation, range, correlations etc. Where this assumption does not hold, one cannot expect a learning algorithm to generalize well beyond the data used to teach it.

    Consider this analogy: suppose a medical student facing an examination on human anatomy is provided with learning materials that only describe the human brain. In this case the data used for learning is not identically distributed to the data required to pass the test, thus generalization will be poor. Alternatively, suppose for the same exam, all exam questions now focus on the brain. In this case the student passes with flying colors. Yet now the data used during learning and testing are not reflective of the knowledge needed in the real world. Thus, we obtain a misleading impression of the student’s anatomy knowledge. Fundamental violations of the i.i.d assumption are incredibly common. For example, Google recently trained a system to recognize diabetic retinopathy in patients based on retinal scans (Beede et al., 2020). It performed incredibly well in the lab. Yet when applied in the real world, performance was poor (Coldewey, 2020), primarily due to violations of the i.i.d assumption. This example is not intended to highlight failures by our colleagues at Google; the poor performance could have been due to many reasons, such as sampling issues and artifacts in the data (photos). Nevertheless, it illustrates that if the tech giants with teams of Ph.D.-educated researchers can build systems susceptible to such issues, then the problem is bewilderingly common. In some cases, there is no option but to violate the assumption, as no real-world data sets exist in sufficient quantities for analysis.
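
    The effect of violating the assumption can be demonstrated directly. In this sketch (assuming scikit-learn and numpy; synthetic data), a classifier trained on one feature distribution is evaluated both on identically distributed data and on shifted data, and its accuracy drops sharply in the shifted setting:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)

    def sample(n, shift=0.0):
        # Two classes separated along one feature; `shift` moves the data.
        X0 = rng.normal(0.0 + shift, 1.0, size=(n, 1))
        X1 = rng.normal(3.0 + shift, 1.0, size=(n, 1))
        return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

    X_train, y_train = sample(500)
    clf = LogisticRegression().fit(X_train, y_train)

    X_iid, y_iid = sample(500)               # identically distributed test set
    X_shift, y_shift = sample(500, shift=2)  # distribution shift at deployment

    print("i.i.d test accuracy:  ", clf.score(X_iid, y_iid))
    print("shifted test accuracy:", clf.score(X_shift, y_shift))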

    Nature of automated learning

    The data shown in Fig. 1.4 represent a collection of patient records described using two features. The stars represent patients with a disease and circles represent those without disease. The data are stored in an experience set that includes ground truth labels. From now on we refer to this as a training set where,

    T = {(x1, yj), …, (xn, yj)}

    Fig. 1.4  Patient examples represented using two features (2-dimensional data). Feature 1 is represented on the horizontal x-axis and feature 2 on the vertical y-axis. The stars represent patients with some disease, while circles are those patients who do not have the disease. (b) shows the data separated sub-optimally using a single vertical line. (c) shows perfect separation using an equation of the line of the form y = mx + c.

    Given this data, we wish to teach a classification system to recognize disease. Once trained, we aim to use it to make predictions upon never-before-seen patients. For now we desire a straightforward system that will output predictions in the form of discrete numerical values: either 0, corresponding to non-disease, or 1, corresponding to disease, so that Y = {0, 1}. A cursory glance at the data in Fig. 1.4 reveals that, at least visually, the data is separable in 2-dimensional space. It is entirely possible to separate the data perfectly using a single line. One sub-optimal line is shown in Fig. 1.4b. This is the simplest possible line we can create. It is defined by a single x-axis value, or in other words, a vertical line corresponding to a specific value of feature x1. This is also shown in Fig. 1.5. The vertical line splits examples into two groups according to their value of feature x1. If an example has an x1 value less than or equal to φ, it is assigned a 0 label, otherwise 1 is assigned. A line able to separate data so that examples residing on one side of the line are labeled 1s and those on the other side 0s is known as a decision boundary. When the boundary is comprised of a single line, we have a linear decision boundary.

    Training error

    The boundary shown in Fig. 1.5 is clearly sub-optimal. We can work this out manually, by counting the number of mistakes made when classifying using this line. We can also quantify the total number of mistakes made for any given line mathematically, using the notion of training error. This is usually expressed as the sum of total errors normalized by the number of training samples. Training error represents important feedback for machine learning algorithms. Using such feedback, we can search through all possible threshold values for feature x1 and find the best separating value using training error to guide us. What we are describing here is a computational optimization problem: find a value for feature x1 that minimizes the training error. The hypothetical optimization search space is depicted in Fig. 1.6. Finding the optimum value is difficult, as it is possible to find solutions in the search space that correspond to local instead of global minima. In order to find a global minimum (corresponding to the optimum threshold that minimises training error) we often need to exhaustively search all potential threshold values, which can be time consuming.

    Fig. 1.5  Patient examples split using a simple vertical line defined by a chosen value for feature x1. The line corresponds to a threshold, such that if an example has an x1 value less than or equal to φ, it is assigned a 0 label, otherwise 1 is assigned.

    Fig. 1.6  The optimisation search space as a function of training error over the number of trials.

    Using training error as a guide, the process of finding an optimum threshold value we can use to split the data represents our first example of machine learning. Indeed, we have just described how a model called the decision stump works at a fundamental level. Whilst the model is simple, the underlying learning process is similar throughout machine learning. In other words, learning is reduced to a mathematical optimisation procedure that aims to find optimal parameter values for an equation that minimises training error. With that in mind, we try to improve upon the simple decision stump, to introduce more complex aspects of machine learning.
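
    To ground this first example, here is a minimal decision stump learned by exhaustive search (plain Python with numpy; hypothetical one-feature data): every candidate threshold φ over feature x1 is tried, and the value that minimizes training error is kept:

    import numpy as np

    x1 = np.array([1.0, 1.4, 2.1, 2.5, 3.8, 4.2, 4.9, 5.3])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # ground truth labels

    best_phi, best_error = None, 1.0
    for phi in x1:  # candidate thresholds drawn from the training values
        predictions = (x1 > phi).astype(int)  # <= phi -> 0, otherwise -> 1
        error = np.mean(predictions != y)     # mistakes / number of samples
        if error < best_error:
            best_phi, best_error = phi, error

    print(f"best threshold phi = {best_phi}, training error = {best_error}")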

    Looking back at Fig. 1.4c, we note that the data can be separated using a line with a gradient. You may recall that such lines can be described using the equation of a line,

    y = mx + c

    where m is a numerical value representing the gradient of the line, and c the intercept, i.e., the position on the y-axis crossed by the line when x = 0. In this case, learning is reduced to finding optimal values for m and c that create a line perfectly separating the data. As there are now two parameters (m and c), we have a slightly more complex optimisation problem than before. We must search through the possible values for both these parameters one at a time in small steps (e.g., m = 0, then m = 0.1, m = 0.2 etc.), measuring the
