Machine Learning with Noisy Labels: Definitions, Theory, Techniques and Solutions
Ebook · 642 pages · 6 hours

About this ebook

Most modern machine learning models, based on deep learning techniques, depend on carefully curated and cleanly labelled training sets to be reliably trained and deployed. However, the expensive labelling process involved in acquiring such training sets limits the number and size of datasets available to build new models, slowing down progress in the field. Alternatively, many poorly curated training sets containing noisy labels are readily available to be used to build new models. However, the successful exploration of such noisy-label training sets depends on the development of algorithms and models that are robust to these noisy labels.

Machine Learning with Noisy Labels: Definitions, Theory, Techniques and Solutions defines different types of label noise, introduces the theory behind the problem, presents the main techniques that enable the effective use of noisy-label training sets, and explains the most accurate methods developed in the field.

This book is an ideal introduction to machine learning with noisy labels, suitable for senior undergraduates, postgraduate students, researchers, and practitioners using, and researching into, machine learning methods.
  • Shows how to design and reproduce regression, classification and segmentation models using large-scale noisy-label training sets
  • Gives an understanding of the theory of, and motivation for, noisy-label learning
  • Shows how to classify noisy-label learning methods into a set of core techniques
Language: English
Release date: Feb 23, 2024
ISBN: 9780443154423
Author

Gustavo Carneiro

Professor Gustavo Carneiro, Artificial Intelligence and Machine Learning, University of Surrey, UK.


    Book preview

    Machine Learning with Noisy Labels - Gustavo Carneiro

    Chapter 1: Problem definition

    Motivation, introduction, and challenges

    Abstract

    This chapter provides an informal definition of the label noise learning problem. We start by explaining how the development of robust machine learning models would be facilitated and accelerated by the successful exploration of large-scale training sets that have not been carefully annotated and consequently contain label noise. Then, we introduce the sources and models of label noise found in large-scale training sets, where we explain why label noise represents an inevitable problem in the training of machine learning models, leading to interesting challenges that are briefly discussed.

    Keywords

    Label noise learning; Bias-variance decomposition; Label transition methods; Label distribution

    1.1 Motivation

    The last couple of decades have witnessed an unprecedented development of machine learning (ML) (Bishop, 2006) and deep learning (DL) (LeCun et al., 2015) methods that are now an integral part of many image classification (Druzhkov and Kustikova, 2016), speech recognition (Nassif et al., 2019), text classification (Minaee et al., 2021), and medical image analysis (Litjens et al., 2017) techniques. In turn, these techniques are being used for the development of several systems, such as self-driving cars (Daily et al., 2017), e-commerce (Laudon and Traver, 2013), chatbots (Adamopoulou and Moussiades, 2020), recommendation systems (Karimi et al., 2018), and spam filters (Bhowmick and Hazarika, 2018), which are shaping many aspects of our society.

    Arguably, the successful development of ML and DL methods critically depends on the existence of well-curated large-scale labeled datasets. Such datasets are typically formed by carefully collecting and labeling each data sample that is guaranteed to belong to a pre-defined set of classes, with a label that reliably represents the sample contents. However, we are starting to witness the availability of an increasing number of minimally-curated large-scale datasets. In such datasets, each sample may have been annotated with a noisy label that does not reliably represent the sample contents. Hence, the development of ML and DL methods that are robust to label noise is attracting much research activity.

    Dataset curation can be defined as the processes of data collection and labeling. The first step in the data collection process is to define the data source and the criteria to select the samples to be included in the dataset. For example, if the goal is to build a natural image dataset, then we can collect data based on the results returned by image search engines. As another example, if we want to build a dataset of chest X-ray (CXR) images, then it is necessary to collect the CXR images available from hospitals' picture archiving and communication systems (PACS). After collecting data, the next step is the data labeling, which consists of identifying relevant classes in each data sample. For example, Fig. 1.1 shows examples of images and different types of labeling, namely: 1) multi-class (top row), where each image contains a single visual class, e.g., the image with a piggy bank is labeled with the class ‘piggy bank’, the image of a teapot is labeled with the class ‘teapot’, etc.; 2) multi-label (second row, top to bottom), with each image annotated with a set of labels, e.g., an image of a park with trees, grassy field, and a river is labeled with the classes ‘tree’, ‘river’, and ‘grass’; 3) segmentation (third row, top to bottom), where each pixel is labeled with a single visual class, so an image of a kangaroo sitting on the ground has each pixel labeled as ‘kangaroo’ or ‘background’, depending on whether the pixel belongs to the former or the latter visual class; and 4) detection (last row), with the label consisting of the bounding box coordinates and a single visual class, so an image of a bat flying in the sky is labeled with a bounding box that covers the whole bat and is annotated with the class ‘bat’. When the curation process is carefully done, with well-designed and well-executed data collection and labeling processes, it is rare, but not impossible, to find label noise in the dataset. Fig. 1.2 shows a carefully collected dataset of handwritten digits that contains images that represent the distribution of handwritten digits well, and the dataset does not contain outliers (i.e., images that do not contain handwritten digits). Fig. 1.2 also displays a few ways that the labeling process can be implemented (e.g., crowdsourcing, expert annotation, or semi-automated annotation), where the amount of label noise can be negligible if the labeling is done by experts. However, when the labeling is executed by crowdsourcing or semi-automated tools, the amount of label noise can reach relatively high rates.
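    The following short sketch (not from the book; the shapes, class indices, and names are illustrative assumptions) shows how the four label types described above are commonly represented in code.

import numpy as np

num_classes = 10

# 1) Multi-class: a single integer class index per image.
multi_class_label = 3                      # e.g., the index of 'piggy bank'

# 2) Multi-label: a binary indicator vector with one entry per class.
multi_label = np.zeros(num_classes, dtype=np.int64)
multi_label[[1, 4, 7]] = 1                 # e.g., 'tree', 'river', 'grass' present

# 3) Segmentation: one class index per pixel (here a 64x64 label map).
segmentation_label = np.zeros((64, 64), dtype=np.int64)
segmentation_label[20:50, 10:40] = 5       # e.g., pixels belonging to 'kangaroo'

# 4) Detection: bounding-box coordinates plus a class index for the object.
detection_label = {"box_xyxy": (12, 30, 80, 110), "class": 8}   # e.g., 'bat'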

    Figure 1.1 Labeling examples. Top row shows how multi-class images are labeled, with each image belonging to a single class. The second row displays examples of multi-label images, where each image can be annotated with a set of labels. While the segmentation label (third row) consists of a single label per image pixel, the detection label (fourth row) has not only the label of the object inside the region of interest (ROI), represented by a rectangle, but also the ROI coordinates.

    Figure 1.2 Labeling strategies. The main strategies to label a dataset are: crowdsourcing (i.e., employing crowdworkers to label data samples), hiring experts who can label special types of data samples (e.g., medical data), and semi-automated labeling based on systems that can provide labels with minimal human intervention. While the expert-labeling strategy enables a more careful labeling process, where labels can be assumed to be clean, it tends to be expensive and slow. On the other hand, crowdsourcing and semi-automated strategies tend to produce datasets with a non-negligible amount of label noise, but they also enable the quick labeling of substantially larger datasets than the expert-labeling strategy.

    An example of a carefully curated dataset is ImageNet (Deng et al., 2009), a popular computer vision dataset with 15 million images annotated with more than 20,000 labels by 49,000 labelers from 167 countries, in a labeling process that took 2.5 years to complete. Another example is PadChest (Bustos et al., 2020), a large-scale chest X-ray dataset containing 160,000 images from 67,000 patients, collected from 2009 to 2017. This dataset has 27% of its images manually labeled by trained physicians (i.e., experts), with the remaining images being semi-automatically labeled and manually verified. Although datasets like ImageNet (Deng et al., 2009) and PadChest (Bustos et al., 2020) have been extremely important for the development of DL and ML methods, the time and cost involved in preparing similar datasets represent a major roadblock in the development of new ML and DL applications. As a result, the field is now working on alternative ways to build large-scale datasets.

    There are currently numerous examples of large-scale datasets that have been prepared with considerably less manual curation than ImageNet (Deng et al., 2009) and PadChest (Bustos et al., 2020). For instance, Google¹ has built the private dataset JFT-300M (Sun et al., 2017), which has 300M images that have been labeled by semi-automated tools using 18,000 labels. JFT-300M has recently been extended to form JFT-3B (Zhai et al., 2022), which contains 3 billion images annotated with 30k labels. Both JFT-300M and JFT-3B have noisy labels as a consequence of the mistakes made by the semi-automated tools. Another example is the dataset YFCC100M (Thomee et al., 2016), a large-scale natural image dataset with 100 million media objects collected from Flickr,² with annotations extracted from inherently noisy metadata, such as title, tags, and automatically generated labels. Similarly prepared large-scale datasets have been collected for speaker recognition (VoxCeleb2 with 1M utterances from over 6k speakers) (Chung et al., 2018), video classification (Sports-1M with more than 1M videos from 487 sports-related classes) (Karpathy et al., 2014), activity recognition (HowTo100M with 136M video clips from 1.2M YouTube³ videos covering 23k activities) (Miech et al., 2019), and medical image analysis (Chest X-Ray14 (Wang et al., 2017b) with more than 100k Chest X-Ray images from 32k patients, and CheXpert (Irvin et al., 2019) with more than 200k Chest X-Ray images from 65k patients).

    The datasets above have been minimally curated, allowing them to be larger and to become available at a faster rate than previous well-curated datasets (Bustos et al., 2020; Deng et al., 2009). These two advantages have the potential to enable a quicker development of ML and DL models that can be trained more robustly because of the larger size of the training sets. However, these advantages are counter-balanced by many issues that can affect these datasets, such as: label noise (Frénay and Verleysen, 2014; Han et al., 2020b; Song et al., 2022), data noise (Frénay and Verleysen, 2014), imbalanced distribution of samples per class (Johnson and Khoshgoftaar, 2019), missing labels (Yu et al., 2014), multiple labels per sample (Liu et al., 2021), out-of-distribution (OoD) data (Bengio et al., 2011), domain shift (Wang and Deng, 2018), etc. Hence, one of the main challenges that the machine learning community currently faces is the following: how can we use these large-scale minimally-curated datasets to robustly train ML and DL models?

    Although all the issues raised above are important and need to be addressed, in this book we focus only on the label noise issue. The relevance of focusing on the label noise issue resides in the evidence shown by Zhang et al. (2021a), who demonstrate that DL models can easily overfit label noise. Overfitting happens when the model perfectly fits the training samples, but performs poorly on testing samples. Fig. 1.3 shows the modeling of a binary classifier using a noisy-label training set, where the model overfits the training data, producing a boundary (purple solid curve) that does not represent the true boundary of the problem (black dashed curve) well. This poor representation of the true boundary forces the overfit model to produce inaccurate classifications for testing data, particularly for samples lying in regions where the true classification boundary and the overfit classification boundary do not match. In fact, Zhang et al. (2021a) show an extreme case where a DL classifier can overfit the entire training data, even when all training samples have their original labels randomly flipped to other labels. However, these models will have low prediction accuracy on previously unseen correctly labeled testing data. Therefore, the successful handling of label noise will facilitate the exploration of many of the existing minimally curated datasets that are currently available in the field.
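    To make the random-label experiment concrete, the following sketch (an illustration assuming integer labels in {0, ..., K-1}, not code from Zhang et al., 2021a) replaces every training label with one drawn uniformly at random, independently of the data.

import numpy as np

def randomize_labels(labels, num_classes, seed=0):
    # Replace every label by one drawn uniformly at random, ignoring the data.
    rng = np.random.default_rng(seed)
    return rng.integers(0, num_classes, size=len(labels))

# Example: training a classifier on (x_train, randomize_labels(y_train, 10)).
# A sufficiently large DL model can still reach near-perfect training accuracy,
# while its accuracy on correctly labeled test data drops to chance level.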

    Figure 1.3 Overfitting the label noise. Modeling of a binary classifier from training images of dogs and cats that contain a few noisy-label samples (i.e., samples that have been mislabeled, as indicated in the legend), where the model overfits the training data, producing a boundary (purple solid curve) that poorly represents the true (latent) boundary of the problem (black dashed curve). Such an overfit model will provide low-accuracy classification for unseen test samples that lie close to the true boundary.

    1.2 Introduction

    Let us informally introduce the label noise learning problem using multi-class image classification as an example. In this task, images are annotated with one label that is selected from a set of training labels. A common example of a multi-class image classification dataset is MNIST (LeCun, 1998), which contains black and white images of handwritten digits that are labeled using the label set {‘0’, ‘1’, ..., ‘9’}. Fig. 1.4, top frame, shows examples of MNIST images. For most MNIST images, the image is clear enough to be easily labeled – such images are defined to have clean labels (see top frame of Fig. 1.4). According to Frénay and Verleysen (2014), there are four sources of label noise (Fig. 1.5, frame on the left): 1) the information in the image is not sufficiently reliable for a precise labeling; 2) the labeler is not reliable; 3) there is intrinsic variability among labelers; and 4) data encoding or communication errors. The middle frame of Fig. 1.4 (titled closed-set noise) contains noisy-label samples, where the first two images show examples of the first source of label noise (i.e., insufficiently reliable information), the third and fourth images display unreliable labeling, the next two examples (fifth and sixth) show intrinsically ambiguous images that can lead to large variability among labelers, and the last two images show encoding or communication errors. The main difference between label noise from unreliable labeling and label noise from encoding or communication errors is that in the former case, even though the labeler clearly made a mistake, the wrong label can be justified by a relatively ambiguous image. In contrast, label noise from encoding or communication errors happens randomly and, in general, cannot be explained by image ambiguities.

    Figure 1.4 Types of label noise. Top frame shows samples from the MNIST handwritten digit dataset (LeCun, 1998). The middle frame displays different types of closed-set label noise, where images belong to one of the classes in the training set. In this frame, the first two images have insufficient information to enable a reliable annotation, the next two images show challenging images that were mislabeled, the next two images display intrinsically ambiguous cases, and the final two images show cases of data encoding or communication error. The bottom frame shows open-set label noise, where the images do not belong to any of the MNIST classes.

    Figure 1.5 Label noise sources and models. The left-hand frame shows the sources of label noise (Frénay and Verleysen, 2014), while the right-hand frame displays the label noise models being studied in the field (Frénay and Verleysen, 2014; Han et al., 2020b; Song et al., 2022).

    Having explained the sources of label noise above, we now discuss the characterization of label noise, which was originally proposed by Frénay and Verleysen (2014) but has undergone changes in naming and scope over the last few years. In (Frénay and Verleysen, 2014), label noise was characterized as (Fig. 1.5, frame on the right): 1) noise completely at random (NCAR), 2) noise at random (NAR), and 3) noise not at random (NNAR). In NCAR, the noise consists of flipping the class label from its original label to any of the other labels completely at random, where the new noisy label can switch to any of the wrong labels with equal probability. This NCAR model is related to the fourth source of label noise in Fig. 1.5, which arises from encoding or communication errors. NCAR is currently referred to as symmetric or uniform label noise (Han et al., 2020b; Song et al., 2022). NAR models the relationship between the class labels without taking into account any information available from the data; using MNIST as an example, NAR estimates the probability that any image of digit ‘1’ is mislabeled as ‘7’ (and vice-versa). In other words, NAR models a transition matrix (applicable to any training sample) with the probability of switching between correct and noisy labels. NAR is currently referred to as asymmetric, pair-flipping, label-dependent noise, or instance-independent noise (Han et al., 2020b; Song et al., 2022). NNAR extends NAR to also depend on the data, resulting in a unique transition matrix per sample. This takes into account that some samples can be harder to label than others, as shown in Fig. 1.4. Currently, NNAR is referred to as instance-dependent noise or semantic label noise (Han et al., 2020b; Song et al., 2022). Both NAR and NNAR happen when either the labeler is not reliable or when there is intrinsic variability among labelers, accounting for error sources (2) and (3) above. These three label noise models are collectively referred to as closed-set label noise models because the dataset only contains in-distribution (ID) images that belong to classes that are in the set of training labels.
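    The following sketch (a minimal illustration of these definitions, not code from the book) expresses the closed-set noise models above as label transition matrices T, where entry T[i, j] is the probability that a sample with true label i receives noisy label j; the functions and parameter names are our own.

import numpy as np

def symmetric_transition(num_classes, noise_rate):
    # NCAR / symmetric noise: keep the true label with probability 1 - noise_rate
    # and flip to each of the other classes with equal probability.
    T = np.full((num_classes, num_classes), noise_rate / (num_classes - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def pair_flip_transition(num_classes, noise_rate):
    # NAR / asymmetric (pair-flip) noise: class i flips only to class i + 1.
    T = np.eye(num_classes) * (1.0 - noise_rate)
    for i in range(num_classes):
        T[i, (i + 1) % num_classes] = noise_rate
    return T

def corrupt_labels(labels, T, seed=0):
    # Sample a noisy label for each clean label from the corresponding row of T.
    rng = np.random.default_rng(seed)
    return np.array([rng.choice(len(T), p=T[y]) for y in labels])

# NNAR / instance-dependent noise would replace the single matrix T by a
# per-sample matrix T(x) that also depends on the input x, so harder samples
# get rows that put more probability mass on the wrong labels.

# Example: corrupt MNIST-style labels with 40% symmetric noise.
clean_labels = np.array([0, 1, 7, 1, 7, 3])
noisy_labels = corrupt_labels(clean_labels, symmetric_transition(10, 0.4))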

    When the dataset contains OoD samples, which do not belong to any of the classes that are in the set of training labels, we have the open-set label noise model (Wang et al., 2018). For example, considering the MNIST dataset in Fig. 1.4, this problem happens when images from a different dataset, such as OMNIGLOT (Lake et al., 2015) (containing 1623 different handwritten characters from 50 different alphabets), are placed in the dataset – see Fig. 1.4 (bottom frame titled open-set noise). Similarly to closed-set label noise, open-set label noise can be symmetric, asymmetric, or instance-dependent. In symmetric open-set label noise, the OoD samples are randomly labeled with any of the training labels – this type of noise would assign any of the 10 MNIST classes to each image of the bottom frame of Fig. 1.4 with probability 10%. The asymmetric open-set label noise models the relationship from the OoD labels to the ID labels (i.e., the training labels) without considering the input data information. An example of such a noise model would label some of the OMNIGLOT characters in the bottom frame of Fig. 1.4 with MNIST labels ‘2’ or ‘5’, each with probability 50%. Furthermore, the instance-dependent open-set label noise works similarly to the asymmetric case above, but takes into account the input data information; so, taking the bottom frame of Fig. 1.4, a character that resembles a ‘2’ is arguably more likely to be labeled as ‘2’ (and less likely as ‘5’), while a character that resembles a ‘5’ is more likely to be labeled as ‘5’ (and less likely as ‘2’). Note that open-set models are related to the error source (1) above because there is insufficient information to reliably label the OoD samples with any of the training labels.
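    As a final illustration (a minimal sketch with assumed array shapes and names, not code from the book), symmetric open-set noise can be simulated by mixing OoD images into the training set and assigning them uniformly random in-distribution labels.

import numpy as np

def add_open_set_noise(id_images, id_labels, ood_images, num_classes, seed=0):
    # Symmetric open-set noise: every OoD image receives a uniformly random
    # in-distribution label (e.g., one of the 10 MNIST classes).
    rng = np.random.default_rng(seed)
    ood_labels = rng.integers(0, num_classes, size=len(ood_images))
    images = np.concatenate([id_images, ood_images], axis=0)
    labels = np.concatenate([id_labels, ood_labels], axis=0)
    return images, labels

# Asymmetric and instance-dependent open-set noise would instead assign the
# OoD labels from a restricted subset of classes, or conditioned on the
# appearance of each OoD image, respectively.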
