Statistical and Neural Classifiers: An Integrated Approach to Design

Ebook506 pages5 hours

Statistical and Neural Classifiers: An Integrated Approach to Design

Name: Statistical and Neural Classifiers: An Integrated Approach to Design
Author: Sarunas Raudys
ISBN: 9781447103592

By Sarunas Raudys

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Automatic (machine) recognition, description, classification, and groupings of patterns are important problems in a variety of engineering and scientific disciplines such as biology, psychology, medicine, marketing, computer vision, artificial intelligence, and remote sensing. Given a pattern, its recognition/classification may consist of one of the following two tasks: (1) supervised classification (also called discriminant analysis); the input pattern is assigned to one of several predefined classes, (2) unsupervised classification (also called clustering); no pattern classes are defined a priori and patterns are grouped into clusters based on their similarity. Interest in the area of pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding (e. g. , bioinformatics, data mining, document classification, and multimedia database retrieval). Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have received increased attention. Neural networks and statistical pattern recognition are two closely related disciplines which share several common research issues. Neural networks have not only provided a variety of novel or supplementary approaches for pattern recognition tasks, but have also offered architectures on which many well-known statistical pattern recognition algorithms can be mapped for efficient (hardware) implementation. On the other hand, neural networks can derive benefit from some well-known results in statistical pattern recognition.

Skip carousel

Intelligence (AI) & Semantics

LanguageEnglish

PublisherSpringer

Release dateDec 6, 2012

ISBN9781447103592

Author

Sarunas Raudys

Related authors

Skip carousel

Related to Statistical and Neural Classifiers

Related ebooks

Skip carousel

Data Mining for the Social Sciences: An Introduction
Ebook
Data Mining for the Social Sciences: An Introduction
byPaul Attewell
Rating: 0 out of 5 stars
0 ratings
Recent Advances in Ensembles for Feature Selection
Ebook
Recent Advances in Ensembles for Feature Selection
byVerónica Bolón-Canedo
Rating: 0 out of 5 stars
0 ratings
An Investigation into the Use of a Neural Tree Classifier for Knowledge Discovery in OLAP Databases
Ebook
An Investigation into the Use of a Neural Tree Classifier for Knowledge Discovery in OLAP Databases
byDavid R Swinburne
Rating: 0 out of 5 stars
0 ratings
Neural Networks and Fuzzy Logic
Ebook
Neural Networks and Fuzzy Logic
byC. Naga Bhaskar
Rating: 0 out of 5 stars
0 ratings
Psychophysics: A Practical Introduction
Ebook
Psychophysics: A Practical Introduction
byFrederick A.A. Kingdom
Rating: 0 out of 5 stars
0 ratings
Schaum's Outline of Elements of Statistics I: Descriptive Statistics and Probability
Ebook
Schaum's Outline of Elements of Statistics I: Descriptive Statistics and Probability
byStephen Bernstein
Rating: 0 out of 5 stars
0 ratings
Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications
Ebook
Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications
byTimothy Masters
Rating: 0 out of 5 stars
0 ratings
Neural Data Science: A Primer with MATLAB® and Python™
Ebook
Neural Data Science: A Primer with MATLAB® and Python™
byErik Lee Nylen
Rating: 5 out of 5 stars
5/5
A Python Data Analyst’s Toolkit: Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics
Ebook
A Python Data Analyst’s Toolkit: Learn Python and Python-based Libraries with Applications in Data Analysis and Statistics
byGayathri Rajagopalan
Rating: 0 out of 5 stars
0 ratings
The Matrixial Brain: Experiments in Reality
Ebook
The Matrixial Brain: Experiments in Reality
byPaul Chaplin
Rating: 0 out of 5 stars
0 ratings
Joint Models of Neural and Behavioral Data
Ebook
Joint Models of Neural and Behavioral Data
byBrandon M. Turner
Rating: 0 out of 5 stars
0 ratings
Human Activity Recognition and Behaviour Analysis: For Cyber-Physical Systems in Smart Environments
Ebook
Human Activity Recognition and Behaviour Analysis: For Cyber-Physical Systems in Smart Environments
byLiming Chen
Rating: 0 out of 5 stars
0 ratings
Statistics in Psychology Using R and SPSS
Ebook
Statistics in Psychology Using R and SPSS
byDieter Rasch
Rating: 0 out of 5 stars
0 ratings
SPSS for Applied Sciences: Basic Statistical Testing
Ebook
SPSS for Applied Sciences: Basic Statistical Testing
byCole Davis
Rating: 3 out of 5 stars
3/5
Cognitive Information Systems in Management Sciences
Ebook
Cognitive Information Systems in Management Sciences
byLidia Dominika Ogiela
Rating: 0 out of 5 stars
0 ratings
SPSS for you
Ebook
SPSS for you
byA Rajathi
Rating: 4 out of 5 stars
4/5
Practical Text Analytics: Maximizing the Value of Text Data
Ebook
Practical Text Analytics: Maximizing the Value of Text Data
byMurugan Anandarajan
Rating: 0 out of 5 stars
0 ratings
Exploring Data Analysis: The Computer Revolution in Statistics
Ebook
Exploring Data Analysis: The Computer Revolution in Statistics
byW. J. Dixon
Rating: 0 out of 5 stars
0 ratings
How to Research Qualitatively: Tips for Scientific Working
Ebook
How to Research Qualitatively: Tips for Scientific Working
byMartin Gertler
Rating: 0 out of 5 stars
0 ratings
Neural Networks for Beginners: Introduction to Machine Learning and Deep Learning
Ebook
Neural Networks for Beginners: Introduction to Machine Learning and Deep Learning
bydaniel Huston
Rating: 0 out of 5 stars
0 ratings
Fundamental Statistical Principles for the Neurobiologist: A Survival Guide
Ebook
Fundamental Statistical Principles for the Neurobiologist: A Survival Guide
byStephen W. Scheff
Rating: 5 out of 5 stars
5/5
Exploratory and Multivariate Data Analysis
Ebook
Exploratory and Multivariate Data Analysis
byMichel Jambu
Rating: 0 out of 5 stars
0 ratings
Exploring the World of Data Science and Machine Learning
Ebook
Exploring the World of Data Science and Machine Learning
byNIBEDITA Sahu
Rating: 0 out of 5 stars
0 ratings
Artificial Neural Networks: Fundamentals and Applications for Decoding the Mysteries of Neural Computation
Ebook
Artificial Neural Networks: Fundamentals and Applications for Decoding the Mysteries of Neural Computation
byFouad Sabry
Rating: 0 out of 5 stars
0 ratings
Ways of Knowing in HCI
Ebook
Ways of Knowing in HCI
byJudith S. Olson
Rating: 5 out of 5 stars
5/5
Analysing Data For Your PhD: An Introduction: PhD Knowledge, #3
Ebook
Analysing Data For Your PhD: An Introduction: PhD Knowledge, #3
byHelen Kara
Rating: 0 out of 5 stars
0 ratings
Pattern Recognition: Fundamentals and Applications
Ebook
Pattern Recognition: Fundamentals and Applications
byFouad Sabry
Rating: 0 out of 5 stars
0 ratings
EEG Brain Signal Classification for Epileptic Seizure Disorder Detection
Ebook
EEG Brain Signal Classification for Epileptic Seizure Disorder Detection
bySandeep Kumar Satapathy
Rating: 0 out of 5 stars
0 ratings
A Bird's Eye view of Data Visualisation
Ebook
A Bird's Eye view of Data Visualisation
byNisarg Patel
Rating: 0 out of 5 stars
0 ratings
Introduction to Statistics in Metrology
Ebook
Introduction to Statistics in Metrology
byStephen Crowder
Rating: 0 out of 5 stars
0 ratings

Intelligence (AI) & Semantics For You

Skip carousel

Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
byCea West
Rating: 4 out of 5 stars
4/5
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
Ebook
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Artificial Intelligence: A Guide for Thinking Humans
Ebook
Artificial Intelligence: A Guide for Thinking Humans
byMelanie Mitchell
Rating: 4 out of 5 stars
4/5
2084: Artificial Intelligence and the Future of Humanity
Ebook
2084: Artificial Intelligence and the Future of Humanity
byJohn C Lennox
Rating: 4 out of 5 stars
4/5
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
Ebook
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
byCea West
Rating: 5 out of 5 stars
5/5
Summary of Building a Second Brain: by Tiago Forte - A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential - A Comprehensive Summary
Ebook
Summary of Building a Second Brain: by Tiago Forte - A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential - A Comprehensive Summary
byAlexander Cooper
Rating: 1 out of 5 stars
1/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Summary of Super-Intelligence From Nick Bostrom
Ebook
Summary of Super-Intelligence From Nick Bostrom
bySummary Station
Rating: 5 out of 5 stars
5/5
ChatGPT for Beginners: How to Make Money Online and 10x Your Productivity Using ChatGPT Even if You’re an Absolute Beginner (The Complete Up-to-Date ChatGPT Guide)
Ebook
ChatGPT for Beginners: How to Make Money Online and 10x Your Productivity Using ChatGPT Even if You’re an Absolute Beginner (The Complete Up-to-Date ChatGPT Guide)
byMatthew Hayes
Rating: 0 out of 5 stars
0 ratings
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
Ebook
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
byDavid Mayer
Rating: 0 out of 5 stars
0 ratings
101 Midjourney Prompt Secrets
Ebook
101 Midjourney Prompt Secrets
byMarcus Byrne
Rating: 3 out of 5 stars
3/5
ChatGPT For Fiction Writing: AI for Authors
Ebook
ChatGPT For Fiction Writing: AI for Authors
byNova Leigh
Rating: 5 out of 5 stars
5/5
The Secrets of ChatGPT Prompt Engineering for Non-Developers
Ebook
The Secrets of ChatGPT Prompt Engineering for Non-Developers
byCea West
Rating: 5 out of 5 stars
5/5
Our Final Invention: Artificial Intelligence and the End of the Human Era
Ebook
Our Final Invention: Artificial Intelligence and the End of the Human Era
byJames Barrat
Rating: 4 out of 5 stars
4/5
Dark Aeon: Transhumanism and the War Against Humanity
Ebook
Dark Aeon: Transhumanism and the War Against Humanity
byJoe Allen
Rating: 5 out of 5 stars
5/5
Chat-GPT Income Ideas: Pioneering Monetization Concepts Utilizing Conversational AI for Profitable Ventures
Ebook
Chat-GPT Income Ideas: Pioneering Monetization Concepts Utilizing Conversational AI for Profitable Ventures
byThe Passive Income Strategist
Rating: 4 out of 5 stars
4/5
Midjourney Mastery - The Ultimate Handbook of Prompts
Ebook
Midjourney Mastery - The Ultimate Handbook of Prompts
byAndreea Todinca
Rating: 5 out of 5 stars
5/5
Discovery Writing with ChatGPT: AI-Powered Storytelling: Three Story Method, #6
Ebook
Discovery Writing with ChatGPT: AI-Powered Storytelling: Three Story Method, #6
byJ. Thorn
Rating: 0 out of 5 stars
0 ratings
Impromptu: Amplifying Our Humanity Through AI
Ebook
Impromptu: Amplifying Our Humanity Through AI
byReid Hoffman
Rating: 5 out of 5 stars
5/5
What Makes Us Human: An Artificial Intelligence Answers Life's Biggest Questions
Ebook
What Makes Us Human: An Artificial Intelligence Answers Life's Biggest Questions
byJasmine Wang
Rating: 5 out of 5 stars
5/5
ChatGPT For Dummies
Ebook
ChatGPT For Dummies
byPam Baker
Rating: 0 out of 5 stars
0 ratings
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
Ebook
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
byHadelin de Ponteves
Rating: 0 out of 5 stars
0 ratings
The Algorithm of the Universe (A New Perspective to Cognitive AI)
Ebook
The Algorithm of the Universe (A New Perspective to Cognitive AI)
byAncient Philosophy
Rating: 5 out of 5 stars
5/5
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
Ebook
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
byMaximus Wilson
Rating: 0 out of 5 stars
0 ratings
AI for Educators: AI for Educators
Ebook
AI for Educators: AI for Educators
byMatt Miller
Rating: 5 out of 5 stars
5/5
Ways of Being: Animals, Plants, Machines: The Search for a Planetary Intelligence
Ebook
Ways of Being: Animals, Plants, Machines: The Search for a Planetary Intelligence
byJames Bridle
Rating: 4 out of 5 stars
4/5
Rise of Generative AI and ChatGPT: Understand how Generative AI and ChatGPT are transforming and reshaping the business world (English Edition)
Ebook
Rise of Generative AI and ChatGPT: Understand how Generative AI and ChatGPT are transforming and reshaping the business world (English Edition)
byUtpal Chakraborty
Rating: 0 out of 5 stars
0 ratings
The Business Case for AI: A Leader's Guide to AI Strategies, Best Practices & Real-World Applications
Ebook
The Business Case for AI: A Leader's Guide to AI Strategies, Best Practices & Real-World Applications
byKavita Ganesan
Rating: 0 out of 5 stars
0 ratings
THE CHATGPT MILLIONAIRE'S HANDBOOK: UNLOCKING WEALTH THROUGH AI AUTOMATION
Ebook
THE CHATGPT MILLIONAIRE'S HANDBOOK: UNLOCKING WEALTH THROUGH AI AUTOMATION
byLogan Rivers
Rating: 5 out of 5 stars
5/5
ChatGPT Money Machine 2024 - The Ultimate Chatbot Cheat Sheet to Go From Clueless Noob to Prompt Prodigy Fast! Complete AI Beginner’s Course to Catch the GPT Gold Rush Before It Leaves You Behind
Ebook
ChatGPT Money Machine 2024 - The Ultimate Chatbot Cheat Sheet to Go From Clueless Noob to Prompt Prodigy Fast! Complete AI Beginner’s Course to Catch the GPT Gold Rush Before It Leaves You Behind
byAlec Rowe
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

Skip carousel

Cerebral Fluid Flow: Modellansatz 134
Podcast episode
Cerebral Fluid Flow: Modellansatz 134
byModellansatz - English episodes only
0 ratings
0% found this document useful
4 + 1 Model of Data Science: Before diving into the complex world of data science it seemed to wise to establish a shared definition of the field. Here at the UVA School of Data Science, we have defined data science with the 4 + 1 Model. This model serves an outline for the first series of UVA Data Points. It also serves as a guiding definition within the School of Data Science, touching everything from research to course planning. In this introduction trailer, host Monica Manney discusses the history, development, and function of the 4 + 1 Model of Data Science with its main author, Raf Alvarado. Below is a brief expect from An Outline of the 4 + 1 Model of Data Science by Raf Alvarado: “The point of the 4 + 1 model, abstract as it is, is to provide a practical template for strategically planning the various elements of a school of data science. To serve as an effective template, a model must be general. But generality if often purchased at the cost of intuitive understanding. The fol
Podcast episode
4 + 1 Model of Data Science: Before diving into the complex world of data science it seemed to wise to establish a shared definition of the field. Here at the UVA School of Data Science, we have defined data science with the 4 + 1 Model. This model serves an outline for the first series of UVA Data Points. It also serves as a guiding definition within the School of Data Science, touching everything from research to course planning. In this introduction trailer, host Monica Manney discusses the history, development, and function of the 4 + 1 Model of Data Science with its main author, Raf Alvarado. Below is a brief expect from An Outline of the 4 + 1 Model of Data Science by Raf Alvarado: “The point of the 4 + 1 model, abstract as it is, is to provide a practical template for strategically planning the various elements of a school of data science. To serve as an effective template, a model must be general. But generality if often purchased at the cost of intuitive understanding. The fol
byUVA Data Points
0 ratings
0% found this document useful
Dr. Paul Lessard - Categorical/Structured Deep Learning
Podcast episode
Dr. Paul Lessard - Categorical/Structured Deep Learning
byMachine Learning Street Talk (MLST)
0 ratings
0% found this document useful
100x Improvements in Deep Learning Performance with Sparsity with Subutai Ahmad - #562
Podcast episode
100x Improvements in Deep Learning Performance with Sparsity with Subutai Ahmad - #562
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
Dynamical Sampling: Modellansatz 173
Podcast episode
Dynamical Sampling: Modellansatz 173
byModellansatz - English episodes only
0 ratings
0% found this document useful
Dynamical Sampling
Podcast episode
Dynamical Sampling
byModellansatz
0 ratings
0% found this document useful
BI 141 Carina Curto: From Structure to Dynamics: Check out my free video series about whats missing in AI and Neuroscience Support the show to get full episodes and join the Discord community. Carina Curto is a professor in the Department of Mathematics at The Pennsylvania State Universit
Podcast episode
BI 141 Carina Curto: From Structure to Dynamics: Check out my free video series about whats missing in AI and Neuroscience Support the show to get full episodes and join the Discord community. Carina Curto is a professor in the Department of Mathematics at The Pennsylvania State Universit
byBrain Inspired
0 ratings
0% found this document useful
Probe Data: The Good, The Bad, and The Ugly
Podcast episode
Probe Data: The Good, The Bad, and The Ugly
bySLP Nerdcast
0 ratings
0% found this document useful
BI 158 Paul Rosenbloom: Cognitive Architectures: Support the show to get full episodes and join the Discord community. Paul Rosenbloom is Professor Emeritus of Computer Science at the University of Southern California. In the early 1980s, Paul , along with John Laird and the early AI pioneer Al
Podcast episode
BI 158 Paul Rosenbloom: Cognitive Architectures: Support the show to get full episodes and join the Discord community. Paul Rosenbloom is Professor Emeritus of Computer Science at the University of Southern California. In the early 1980s, Paul , along with John Laird and the early AI pioneer Al
byBrain Inspired
0 ratings
0% found this document useful
Bringing it All Together: Chaining Procedures in AAC
Podcast episode
Bringing it All Together: Chaining Procedures in AAC
bySLP Nerdcast
0 ratings
0% found this document useful
100: The New Micro-Neurosurgery — A Remarkable Interview with Dr. Mark Noble: The famed neuroscientist, Dr. Mark Noble, from the University of Rochester, has developed a strong interest in TEAM-CBT and has visited our Tuesday group and Sunday hikes on three occasions this year. I (David) feel very fortunate to have his...
Podcast episode
100: The New Micro-Neurosurgery — A Remarkable Interview with Dr. Mark Noble: The famed neuroscientist, Dr. Mark Noble, from the University of Rochester, has developed a strong interest in TEAM-CBT and has visited our Tuesday group and Sunday hikes on three occasions this year. I (David) feel very fortunate to have his...
byFeeling Good Podcast | TEAM-CBT - The New Mood Therapy
0 ratings
0% found this document useful
BI 105 Sanjeev Arora: Off the Convex Path: Sanjeev and I discuss some of the progress toward understanding how deep learning works, specially under previous assumptions it wouldnt or shouldnt work as well as it does. Deep learning theory poses a challenge for mathematics, because its methods aren
Podcast episode
BI 105 Sanjeev Arora: Off the Convex Path: Sanjeev and I discuss some of the progress toward understanding how deep learning works, specially under previous assumptions it wouldnt or shouldnt work as well as it does. Deep learning theory poses a challenge for mathematics, because its methods aren
byBrain Inspired
0 ratings
0% found this document useful
BI 183 Dan Goodman: Neural Reckoning: Support the show to get full episodes and join the Discord community. You may know my guest as the co-founder of Neuromatch, the excellent online computational neuroscience academy, or as the creator of the Brian spiking neural network simulat
Podcast episode
BI 183 Dan Goodman: Neural Reckoning: Support the show to get full episodes and join the Discord community. You may know my guest as the co-founder of Neuromatch, the excellent online computational neuroscience academy, or as the creator of the Brian spiking neural network simulat
byBrain Inspired
0 ratings
0% found this document useful
Mirjam Nielen - Introduction to Statistics: Always wondered why research papers often present rather complicated statistical analyses? Or wondered how to properly analyse the results of a pragmatic trial from your own practice? This talk will give an overview of basic statistical principles an...
Podcast episode
Mirjam Nielen - Introduction to Statistics: Always wondered why research papers often present rather complicated statistical analyses? Or wondered how to properly analyse the results of a pragmatic trial from your own practice? This talk will give an overview of basic statistical principles an...
byRCVS Knowledge - Evidence-based Veterinary Medicine
0 ratings
0% found this document useful
#17 — ”Global Workspace Theory: Exploring Evidence for Widespread Integration & Broadcasting of Conscious Signals - Part Two”: "I think in terms of consciousness, it seems to me that these Feelings of Knowing are perhaps the conscious tip of the iceberg for this huge amount of unconscious processing that's going on of all this information in our environment, where maybe I c...
Podcast episode
#17 — ”Global Workspace Theory: Exploring Evidence for Widespread Integration & Broadcasting of Conscious Signals - Part Two”: "I think in terms of consciousness, it seems to me that these Feelings of Knowing are perhaps the conscious tip of the iceberg for this huge amount of unconscious processing that's going on of all this information in our environment, where maybe I c...
byOn Consciousness & the Brain with Bernard Baars
0 ratings
0% found this document useful
BI 178 Eric Shea-Brown: Neural Dynamics and Dimensions: Support the show to get full episodes and join the Discord community. Check out my free video series about whats missing in AI and Neuroscience Eric Shea-Brown is a theoretical neuroscientist and principle investigator of the working gr
Podcast episode
BI 178 Eric Shea-Brown: Neural Dynamics and Dimensions: Support the show to get full episodes and join the Discord community. Check out my free video series about whats missing in AI and Neuroscience Eric Shea-Brown is a theoretical neuroscientist and principle investigator of the working gr
byBrain Inspired
0 ratings
0% found this document useful
Neuroscience from a Data Scientist's Perspective: ... or should this have been called data science from a neuroscientist's perspective? Either way, I'm sure you'll enjoy this discussion with Laurie Skelly. Laurie earned a PhD in Integrative Neuroscience from the Department of Psychology at the...
Podcast episode
Neuroscience from a Data Scientist's Perspective: ... or should this have been called data science from a neuroscientist's perspective? Either way, I'm sure you'll enjoy this discussion with Laurie Skelly. Laurie earned a PhD in Integrative Neuroscience from the Department of Psychology at the...
byData Skeptic
0 ratings
0% found this document useful
Special Episode: Visual Processing in Mice: I discuss the methods and results of my Master's Thesis, in which I analysed electrophysiological data of the mouse visual system using a variety of statistical and computational techniques. I consider some of the major research questions addressed in my...
Podcast episode
Special Episode: Visual Processing in Mice: I discuss the methods and results of my Master's Thesis, in which I analysed electrophysiological data of the mouse visual system using a variety of statistical and computational techniques. I consider some of the major research questions addressed in my...
byThe Science of Everything Podcast
0 ratings
0% found this document useful
75 - Russ Poldrack: What can neuroimaging research tell us about the brain and why is reproducible neuroscience important?
Podcast episode
75 - Russ Poldrack: What can neuroimaging research tell us about the brain and why is reproducible neuroscience important?
byStanford Psychology Podcast
0 ratings
0% found this document useful
Survey Raking: It's quite common for survey respondents not to b…
Podcast episode
Survey Raking: It's quite common for survey respondents not to b…
byLinear Digressions
0 ratings
0% found this document useful
Improving the precision of oxytocin research | A talk at the University of Copenhagen
Podcast episode
Improving the precision of oxytocin research | A talk at the University of Copenhagen
byPhysiology and Behavior
0 ratings
0% found this document useful
Episode 104 - Scraping Facts Online: If You Can’t Beat ’Em, Datum: At the time of this taping, Paul was in the middle of the Metis “bootcamp” program learning the capabilities, tools, and insights of data science. This conversation ranged widely in the realm of data analysis and management, examining its relevance to Pa...
Podcast episode
Episode 104 - Scraping Facts Online: If You Can’t Beat ’Em, Datum: At the time of this taping, Paul was in the middle of the Metis “bootcamp” program learning the capabilities, tools, and insights of data science. This conversation ranged widely in the realm of data analysis and management, examining its relevance to Pa...
byThat's So Second Millennium
0 ratings
0% found this document useful
Neural Nets (Part 1): There is no known learning algorithm that is more…
Podcast episode
Neural Nets (Part 1): There is no known learning algorithm that is more…
byLinear Digressions
0 ratings
0% found this document useful
When sociologists meet computer scientists: Information systems is a discipline nestled between the organizational sciences and the computer sciences. How do these different fields view each other? We ask Brian Pentland (an organizational sociologist) and Wil van der Aalst (a computer...
Podcast episode
When sociologists meet computer scientists: Information systems is a discipline nestled between the organizational sciences and the computer sciences. How do these different fields view each other? We ask Brian Pentland (an organizational sociologist) and Wil van der Aalst (a computer...
bythis IS research
0 ratings
0% found this document useful
Does Not Compute: Scientific journal articles have a lot of numbers. Scientists are smart people with even smarter computers, so an outsider might think that, if nothing else, you can count on the math checking out. But modern data analysis is complicated, and computation...
Podcast episode
Does Not Compute: Scientific journal articles have a lot of numbers. Scientists are smart people with even smarter computers, so an outsider might think that, if nothing else, you can count on the math checking out. But modern data analysis is complicated, and computation...
byThe Black Goat
0 ratings
0% found this document useful
Keeping ourselves honest when we work with observational healthcare data: The abundance of data in healthcare, and the valu…
Podcast episode
Keeping ourselves honest when we work with observational healthcare data: The abundance of data in healthcare, and the valu…
byLinear Digressions
0 ratings
0% found this document useful
Eric Daza | N-of-1 Science & Causal Inference | Philosophy of Data Science: Interesting in Data Science? Learn Data Science and Statistics from experts as they cover key topics in the field. The Data & Science podcast focusses on teaching data scientists how to think critically in order to solve data analysis problems across var...
Podcast episode
Eric Daza | N-of-1 Science & Causal Inference | Philosophy of Data Science: Interesting in Data Science? Learn Data Science and Statistics from experts as they cover key topics in the field. The Data & Science podcast focusses on teaching data scientists how to think critically in order to solve data analysis problems across var...
byData & Science with Glen Wright Colopy
0 ratings
0% found this document useful
A Relational Frame Theory Primer - Session 59 with Nick Berens: Well I meant to have this episode out before my interview with Steve Hayes, but the timing was such that it made more sense to have Steve's episode published ASAP in case people were interested in participating in the ACT Bootcamp for Behavior...
Podcast episode
A Relational Frame Theory Primer - Session 59 with Nick Berens: Well I meant to have this episode out before my interview with Steve Hayes, but the timing was such that it made more sense to have Steve's episode published ASAP in case people were interested in participating in the ACT Bootcamp for Behavior...
byThe Behavioral Observations Podcast with Matt Cicoria
0 ratings
0% found this document useful
BI 148 Gaute Einevoll: Brain Simulations: Check out my free video series about whats missing in AI and Neuroscience Support the show to get full episodes and join the Discord community. Gaute Einevoll is a professor at the University of Oslo and Norwegian University of Life Scien
Podcast episode
BI 148 Gaute Einevoll: Brain Simulations: Check out my free video series about whats missing in AI and Neuroscience Support the show to get full episodes and join the Discord community. Gaute Einevoll is a professor at the University of Oslo and Norwegian University of Life Scien
byBrain Inspired
0 ratings
0% found this document useful
Understanding Deep Learning - Prof. SIMON PRINCE [STAFF FAVOURITE]
Podcast episode
Understanding Deep Learning - Prof. SIMON PRINCE [STAFF FAVOURITE]
byMachine Learning Street Talk (MLST)
0 ratings
0% found this document useful

Skip carousel

AI Is Helping Scientists Explain the Brain
Nautilus
Article
AI Is Helping Scientists Explain the Brain
Feb 16, 2022
8 min read
‘Deep Learning’ Goes Faster With Organized Data
Futurity
Article
‘Deep Learning’ Goes Faster With Organized Data
Jun 5, 2017
Researchers have found that a technique for speedy data lookup, called hashing, can dramatically reduce the amount of computation required for deep learning, a demanding form of machine learning. “This applies to any deep-learning architecture, and t
2 min read
What Tech Can Learn from the Fruit Fly’s Search Algorithm
Nautilus
Article
What Tech Can Learn from the Fruit Fly’s Search Algorithm
Nov 13, 2017
5 min read
Is Artificial Intelligence Permanently Inscrutable?: Despite new biology-like tools, some insist interpretation is impossible.
Nautilus
Article
Is Artificial Intelligence Permanently Inscrutable?: Despite new biology-like tools, some insist interpretation is impossible.
Sep 1, 2016
Dmitry Malioutov can’t say much about what he built. As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a
13 min read
Is Artificial Intelligence Permanently Inscrutable?
Nautilus
Article
Is Artificial Intelligence Permanently Inscrutable?
Sep 1, 2016
Dmitry Malioutov can’t say much about what he built. As a research scientist at IBM, Malioutov spends part of his time building machine learning systems that solve difficult problems faced by IBM’s corporate clients. One such program was meant for a
13 min read
Why the Brain Is So Noisy
Nautilus
Article
Why the Brain Is So Noisy
Jan 17, 2019
One of the core challenges of modern AI can be demonstrated with a rotating yellow school bus. When viewed head-on on a country road, a deep-learning neural network confidently and correctly identifies the bus. When it is laid on its side across the
8 min read
Here's Why Most Neuroscientists Are Wrong About the Brain
Nautilus
Article
Here's Why Most Neuroscientists Are Wrong About the Brain
Oct 25, 2015
4 min read
Stop Saying the Brain Learns By Rewiring Itself
Nautilus
Article
Stop Saying the Brain Learns By Rewiring Itself
May 5, 2017
4 min read
Algorithm Uses Brain ‘Fingerprints’ To Detect Autism
Futurity
Article
Algorithm Uses Brain ‘Fingerprints’ To Detect Autism
Apr 5, 2022
3 min read
Virtual Town Sheds Light On How Our Brains Perceive Places
Futurity
Article
Virtual Town Sheds Light On How Our Brains Perceive Places
Oct 9, 2019
3 min read
Light Decodes What A Person Sees From Brain Signals
Futurity
Article
Light Decodes What A Person Sees From Brain Signals
Feb 9, 2021
3 min read
Neuroscience Weighs in on Physics’ Biggest Questions
Nautilus
Article
Neuroscience Weighs in on Physics’ Biggest Questions
Oct 13, 2021
For an empirical science, physics can be remarkably dismissive of some of our most basic observations. We see objects existing in definite locations, but the wave nature of matter washes that away. We perceive time to flow, but how could it, really?
15 min read
A Theory of Consciousness Can Help Build a Theory of Everything
Nautilus
Article
A Theory of Consciousness Can Help Build a Theory of Everything
May 4, 2017
For an empirical science, physics can be remarkably dismissive of some of our most basic observations. We see objects existing in definite locations, but the wave nature of matter washes that away. We perceive time to flow, but how could it, really?
15 min read
How And Where You Use Machine-learning
APC
Article
How And Where You Use Machine-learning
Oct 7, 2019
4 min read
Mice Have More Synapses Connecting Neurons Than Primates
Futurity
Article
Mice Have More Synapses Connecting Neurons Than Primates
Sep 23, 2021
4 min read
The Human Brain Project Hasn’t Lived Up to Its Promise
The Atlantic
Article
The Human Brain Project Hasn’t Lived Up to Its Promise
Jul 22, 2019
6 min read
Scientists Make A Maze-running Artificial Intelligence Program That Learns To Take Shortcuts
Los Angeles Times
Article
Scientists Make A Maze-running Artificial Intelligence Program That Learns To Take Shortcuts
May 9, 2018
Call it an a-MAZE-ing development: A U.K.-based team of researchers has developed an artificial intelligence program that can learn to take shortcuts through a labyrinth to reach its goal. In the process, the program developed structures akin to thos
4 min read
Researchers Gain New Understanding From Simple AI
Nautilus
Article
Researchers Gain New Understanding From Simple AI
Apr 15, 2022
In the last two years, artificial intelligence programs have reached a surprising level of linguistic fluency. The biggest and best of these are all based on an architecture invented in 2017 called the transformer. It serves as a kind of blueprint fo
5 min read
The Implant That Can Control Your Brain
Nautilus
Article
The Implant That Can Control Your Brain
Oct 17, 2019
Shaun Patel has such a tranquil voice that it’s easy to see how he convinces patients to let him experiment in the depth of their brains. On the phone, in his office at Massachusetts General Hospital (he is also on faculty at Harvard Medical School),
9 min read
To See Unique Brains, Scientists Begin With Their Own
Futurity
Article
To See Unique Brains, Scientists Begin With Their Own
Jul 28, 2017
Scientists with little time and almost no funding created a “Midnight Scan Club” in their quest to analyze the unique features of individual human brains. Most efforts to analyze connections involve scanning many brains and averaging the data across
4 min read
AI Gets A Sense Of Number
Business Today
Article
AI Gets A Sense Of Number
May 27, 2019
3 min read
The Usefulness of a Memory Guides Where the Brain Saves It
Nautilus
Article
The Usefulness of a Memory Guides Where the Brain Saves It
Oct 6, 2023
5 min read
To Make Sense Of A.I. Decisions, ‘Peek Under The Hood’
Futurity
Article
To Make Sense Of A.I. Decisions, ‘Peek Under The Hood’
Oct 8, 2018
Now that humans have programmed computers to learn, we want to know exactly what they’ve learned and how they make decisions after their learning process is complete. The answers to such questions could shed light on our own decision-making processes
6 min read
NEUROMAR KETING IN THE DIGITAL AGE How we become consumers
Frontiers of Science
Article
NEUROMAR KETING IN THE DIGITAL AGE How we become consumers
Apr 21, 2020
10 min read
How Brain Scientists Forgot That Brains Have Owners
The Atlantic
Article
How Brain Scientists Forgot That Brains Have Owners
Feb 27, 2017
7 min read
1 Neuron May Be Enough For Recording Brain Data
Futurity
Article
1 Neuron May Be Enough For Recording Brain Data
Jun 27, 2019
2 min read
What Your Brain Is Doing When You’re Not Doing Anything
Nautilus
Article
What Your Brain Is Doing When You’re Not Doing Anything
Feb 22, 2024
5 min read
Quick Learners Have Speedier Neurons
Futurity
Article
Quick Learners Have Speedier Neurons
Apr 18, 2019
3 min read
Algorithms Decode Complex Thoughts From Brain Scans
Futurity
Article
Algorithms Decode Complex Thoughts From Brain Scans
Jun 27, 2017
Scientists can now use brain activation patterns to identify complex thoughts like “The witness shouted during the trial.” The research uses machine-learning algorithms and brain-imaging technology to “mind read.” The findings indicate that the mind’
2 min read
Will This “Neural Lace” Brain Implant Help Us Compete with AI?
Nautilus
Article
Will This “Neural Lace” Brain Implant Help Us Compete with AI?
Aug 29, 2016
7 min read

Related categories

Skip carousel

Reviews for Statistical and Neural Classifiers

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Statistical and Neural Classifiers - Sarunas Raudys

1 Quick Overview

Šarūnas Raudys¹

(1)

Data Analysis Department, Institute of Mathematics and Informatics, Akademijos 4, Vilnius, Lithuania, 2600

The main objective of this chapter is to define the terminology to be used and the primary issues to be considered in depth in the chapters that follow. In the first section, I present three of the simplest and most popular classification algorithms, and in the second section, I describe single and multilayer perceptrons and the training rules that are used to obtain classification algorithms. I give special attention to the most popular perceptron-training rule, back-propagation (BP). In the third section, I show that the three statistical-based classifiers described in Section 1.1 can be obtained via single-layer perceptron (SLP) training. The fourth section deals with performance, complexity and training-set size relationships, while the fifth section explains how to utilise these relationships while determining the optimal complexity of the decision-making algorithm. In the final section, I explain the overtraining effect that can arise while training artificial neural networks. Some bibliographical and historical remarks are included at the end of the chapter.

1.1 The Classifier Design Problem

The simplest formulation of a pattern classification task is to allocate an object characterised by a number of measurements into one of several distinct pattern classes. Let an unknown object (process, image) be characterised by the D-dimensional vector

and let ω1, ω2, ..., ωL be labels representing each of the potential pattern classes.

Then in allocatory pattern recognition, the objective is to construct an algorithm that will use the information contained in the components s1 s2, ..., sD of the vector SD to allocate this vector to one of the L distinct categories (pattern classes) listed. Typically, the pattern classifier calculates L functions (similarity measures) o1 o2, ..., oL and allocates the unlabeled vector SD to the class with the highest similarity measure.

In order to design such a classification algorithm one needs to fix the list of potential pattern classes. The measurements are chosen by their degree of relevance to the pattern classification problem at hand. The D measurements are obtained to formalise the large amount of information containing various a priori, unknown relationships among the pattern classes, among components of the vector SD, etc.

Many useful ideas and techniques have been proposed during 50 years of research into the field of pattern recognition. In order to simplify the design process, researchers have proposed that one split the pattern recognition problem into two stages: the feature extraction stage and the pattern classifier design stage. In the feature extraction stage, a comparatively small number of informative features (n) are obtained from a large number of initial measurements by means of linear or non-linear transformations (typically, n≪D). Then one develops an algorithm to classify an n-variate vector

into one of L pattern classes, where x1, x2, ..., xn are components of the unclassified vector X and Tdenotes the transpose operation.

Example 1. Let s1, s2, ..., sD contain D values of an acoustic signal. Suppose s1, s2, ..., sD are realisations of a stationary random process described by the autoregression equation st = x1st−1+ x2st−2 + ... +xnst−n where n≪D. Then we can estimate the unknown coefficients x1, x2, ..., xn from each segment of the signal and use x1, x2, ..., xn as the components of the feature vector X.

We discuss feature extraction methods in Chapter 6. In this book, we restrict ourselves to specific classifier design tasks and we assume that the n features are already extracted so that we can perform the classification of the n-dimensional vectors X. For simplicity, we mostly consider only the two category case (L = 2) throughout the book.

Typically, in designing the classifier, one selects a number of n-dimensional vectors with known class membership and uses this data set as the main information source to develop a pattern recognition algorithm. This data set is called a design set. Often, while designing the pattern recognition algorithms, one develops a number of classifier variants differing in the feature system, in the methods used to design the classifier, in the parameter estimation methods, etc. A problem arises in determining which classifier variant to use. In order to obtain a reliable unbiased estimate of the performance of different variants for the recognition algorithm, one should use part of the design set for estimation of the algorithm’s parameters (training) and another part for performance evaluation. The first part of the design set is called a training (learning) set, while the second part is called a validation set.

There exist different modifications for the data-splitting strategies. One of them is to perform the second test where the design and the validation sets are interchanged. We consider other strategies in Chapter 6. In order to evaluate the performance (generalisation error) of the final variant, we must use a new data set that have not been used for either training or for validation. This test set should be used only once.

An important source of information in developing a pattern classification algorithm involves assumptions and hypotheses concerning the structure and configuration of the pattern classes and the type of classifier decision boundary desired. One of the most popular hypotheses is that the n-dimensional vectors of distinct pattern classes are situated in different areas of a multivariate n-dimensional pattern space. One validation of this assumption is that these areas are compact. Going further, one can assume that the arithmetic mean represents the pattern class ωr sufficiently well. Then one can allocate an unknown vector X according to its Euclidean distance to the means of the L distinct categories. The Euclidean distance between two vectors is defined as

(1.1)

where aTb denotes a scalar product of vectors a and b.

This classifier is called the Euclidean distance classifier (EDC) or the nearest means classifier (Sebestyen, 1962). To design the EDC, one assumes that the pattern vectors of each class are similar among themselves and close in proximity to a typical member of this class (the centre represents the sample mean vector of this class). In Figure 1.1, we have depicted 300 bivariate vectors belonging to each of two pattern classes and the linear decision boundary of the EDC. The sample means of the classes, and , are denoted by circles.

Fig. 1.1.

Bivariate vectors sampled from two pattern classes, the sample means of the classes (circles), and the linear decision boundary of the Euclidean distance classifier.

In the two pattern-classes case, we perform allocation of the vector X according to the difference

and the classification algorithm can be written as a linear discriminant function (DF) h(X) = XTVE+vo with

(1.2)

The allocation of the unclassified vector X to one of the two pattern classes is performed according to the sign of the discriminant function. This classification rule can also be obtained from the statistical decision theory if one assumes that the vector X is random and its distribution is the spherical Gaussian distribution NX(Mr, σ²I). Here, Mr denotes the n-dimensional mean vector of the r-th pattern class and σ²I is the covariance matrix (CM). For this pattern-class data model the CM is equal to the identity matrix I multiplied by the scalar σ² (the variance of each feature). For this theoretical data model, the EDC is an asymptotically (when the number of training vectors N→∞) optimal classifier, according to the criterion of classification error minimisation.

In the general case, the n×n covariance matrix Σ of a multivariate pattern-class distribution is a function of n variances and n(n−1) correlations. If one assumes that both pattern classes share the same CM, we denote this assumption by . Otherwise, Σr represents the associated CM for the pattern-class ωr. Generally, , and the EDC is no longer an asymptotically optimal classification rule in terms of minimising the classification error.

Example 2. In Figure 1.2, we have a graph of 300 bivariate correlated vectors (×-marks and small pluses) of two Gaussian pattern classes and the linear decision boundary of the EDC, depicted as line E. The sample means of the classes are denoted by circles, parameters of the data model are:

Fig. 1.2.

A plot of 250+250 bi-variate vectors (×-marks and small pluses) sampled from two Gaussian pattern classes, sample means of the classes (circles), and the linear decision boundaries: the Euclidean distance classifier E, and the Fisher linear DF. Above is a plot of the two pattern-class densities f(h) in the direction h of their best separability.

Obviously, for this data model the EDC is not the best classification rule. To design an appropriate classifier, one needs to take into account the fact that the feature variances are unequal and that the features are correlated. For such a data model one can use a linear discriminant function of the form h(X) = XTVF+ v0 with

(1.3)

where

(1.4)

is the pooled sample covariance matrix, and allocation of the unlabeled vector X is performed according to the sign of the discriminant function (1.3). This is the first classification rule proposed in the statistical classification literature (Fisher, 1936) and it is now called the standard Fisher DF. The decision boundary corresponding to the Fisher DF is depicted by line F in Figure 1.2.

To obtain the Fisher rule, we project the training vectors onto a straight line h perpendicular to the decision boundary F. In the upper part of Figure 1.2, we have depicted the distribution densities f1(h) and f2(h) of the discriminant function values h(X) = XT VF + v0 on a projection line h. In the new univariate space, both pattern classes are almost nonintersecting. This projection is the best linear projection possible onto a line.

In the classification literature, a large number of other parametric classification rules have been proposed. In the statistical approach, we postulate that X, being the vector to be classified, is random. To obtain the classification rule (1.3), we can assume that the multivariate density of the vector X is Gaussian. Then we can estimate the unknown parameters of the Gaussian densities from the data and apply statistical decisions theory.

There is another approach to designing classification rules. Advocates of this alternative approach believe that one is unwise to make assumptions about the type of multivariate pattern-class density model and its parameters. Also, advocates of this approach insist that one should not estimate the classification-rule parameters in an indirect manner. Instead of constructing a parametric statistical classifier, these classifier designers suggest that a much better classifier-design method is to make assumptions about the classification rule structure (for example a linear discriminant function) and then estimate the unknown coefficients (weights) of the classification rule directly from the training data. As a criterion of goodness, they propose that one minimise the number of classification errors (empirical error) while classifying the training vectors. This is a typical formulation of the classifier design problem utilised in artificial neural network theory. One of several possible approaches to determining the unknown weight vector V is to adapt its components in an iterative manner. This weight-determination method is explained in the following section. Another entirely hypothetical way to estimate the weights is to generate M (a large number) random vectors V1, V2 ..., VM, and to estimate the empirical errors for all M variants and then to choose the best set of weights. However, this random search procedure is utilised only in theoretical analyses.

The EDC is an asymptotically optimal classification rule for spherically Gaussian pattern classes with unequal mean vectors. Here, the word asymptotically means that the training-set size N tends to infinity. The standard linear Fisher DF is asymptotically optimal for Gaussian pattern classes that share a common covariance matrix but have unequal mean vectors. However, this rule is no longer asymptotically optimal when the CMs of different classes are unequal or when the data is not Gaussian. For such types of data, the minimum empirical error (MEE) classifier is a good alternative to the EDC and Fisher’s DF, which are parametric statistical classifiers.

Example 3. In order to see the differences among the classifiers in Figure 1.3, we present their decision boundaries and the data from two bivariate Gaussian classes contaminated with 10 % additional Gaussian noise NX(0, N).

Fig. 1.3.

Decision boundaries of the Euclidean distance classifier (1), the Fisher linear DF (2), and the MEE classifier (3). The noise patterns are denoted by * and +.

The pattern class models have different means M1 = − M2 = M and share a common covariance matrix . The Gaussian components are represented by two small ellipses. Parameters of the data model are

In this book, we refer to this data configuration as data model A.

1.2 Single Layer and Multilayer Perceptrons

The non-linear single layer perceptron (SLP) is a simplified mathematical model of an elementary information processing cell in living organisms — a neurone. Animal and human brains are composed from billions of such cells, which are interconnected. The standard belief is that the neurones are organised into layers, and neurones in one layer are interconnected with neurones in the next one. The complexity of connections between the neurones resembles nets. In an analogous model, the information processing units, i.e. artificial neurones, are also structured into layers. The units in one layer are interconnected with units in the adjacent layer. Thus the term artificial neural nets appeared.

According to Rumelhart et al. (1986) the SLP has several inputs (say n), x1, x2, ..., xn, and one output, o, that is calculated according to the equation

(1.5)

where f(net) is a non-linear activation function, e.g. a sigmoid function:

(1.6)

and v0, VT = (v1, v2, ..., vp) are the weights of the discriminant function (DF).

The output of the SLP can be used to perform classification of the unclassified vector X. When the sigmoid function (1.6) is used, we compare the output o with the threshold value of 0.5. The decision boundary is linear as are the three linear boundaries depicted in Figure 1.3. To find the weights for the SLP, we minimise a certain training set’s cost function. The most popular cost function is the sum of squares cost function

(1.7)

where is the desired output (a target) for , the j-th training observation from class ωi, and Ni is the number of training vectors from class ωi.

unimportant. As a result we obtain the Euclidean distance classifier with the weights defined by the Equation (1.2).

In this pattern recognition algorithm, the desired outputs are associated with the class label. If the sigmoid function (1.6) is used, then typically one uses either or Outputs of the activation function (1.6) can vary in the interval of (0, 1). Therefore, we refer to the target values of 1 and 0 as boundary values, and we refer to the values 0.1 and 0.9 as proximal values (the values recommended by Rumelhart et al., 1986). Some authors use another activation function:

(1.8)

For this function (the boundary values), or (proximal values). Both activation functions (1.6) and (1.8) are smooth and can be differentiated. Such functions are commonly referred to as soft-limiting functions. In extreme cases, there are times when we analyse the hard-limiting activation function

(1.9)

In this case, and Typically, to determine the weights v0, (v1, v2, ..., vn) = VT, we minimise the cost function (1.7). In the gradient descent optimisation algorithm (delta rule), we calculate derivatives of the cost function and find the weights in an iterative manner. At step t, we update the weight vector V(t) according to equations

(1.10)

where η is the learning-step and is the gradient of the cost function.

When the hard-limiting cost function is used, we cannot calculate the gradient. In such cases, we must use other optimisation methods. To realise the training procedure (1.10), we must first choose the initial weights, V(t=0) and v0(t=0). For this purpose, the neural net literature recommends one use small random numbers. In this book we advocate that one use non-random weight selection methods that can often lead to classifiers with a smaller generalisation error.

If expertly used, the training rule (1.10) can produce different algorithms to design the statistical classifiers. In fact, all three boundaries depicted in Figure 1.3 can be obtained while training the non-linear single layer perceptron. In Section 1.3 and Chapter 4, we demonstrate how this is accomplished.

The multi-layer perceptron (MLP) is the oldest and most frequently used type of the artificial neural network (ANN). A typical MLP classifier consists of several layers of artificial neurones containing single layer perceptrons (see Figure 1.4). Input nodes of the MLP classifier correspond to the n components of the feature vector. The output-layer can have one or several (say L) output neurones. The input signals propagate from the input layer to the output layer. Such networks are called the feedforward ANN. Typically, each neurone in the output layer corresponds to a pattern class label. Inputs to L output neurones are outputs f1, f2, ..., fh of the h neurones that constitute a lower (hidden) layer. Complex MLP classifiers can have many hidden layers.

Fig. 1.4.

MLP with one hidden layer.

In order to find the classifier weights one must minimise a cost function similar to function (1.7). In the gradient descent optimisation algorithm, we need to calculate derivatives of the cost function and update the weights for all neurones in an iterative manner similar to the training of the SLP. While calculating the gradients of the hidden layer neurones, we propagate the error signal back to the lower layers. This technique is called Back Propagation (BP). The reader can find equations for the gradients of the output and hidden layer neurones in practically all artificial neural network monographs and textbooks.

When the hard limiting function is used, we obtain a piecewise-linear decision boundary. Smooth activation functions assist in obtaining smooth decision boundaries (compare boundaries 1 and 2 in Figure 1.5). The degree of non-linearity of the smooth activation function can be changed by multiplication of all components of the hidden layer weights by a certain positive constant α (Section 4.6.9). In fact, both boundaries in Figure 1.5 have been obtained by multiplying the weights of all neurones in the hidden layer of a well trained perceptron by the factor α, where α = 1000 for the hard limiting activation function and α = 1/3 for the soft limiting function. This bivariate data model is denoted by SF2 and is discussed in more depth in Chapter 4. We see that the non-linearity of the activation function is an essential factor that affects the shape of the decision boundary. In general, the perceptron with one hidden layer, a soft-limiting activation function, and a sufficiently large number of hidden units can realise arbitrarily complex decision surfaces.

Fig. 1.5.

30 bivariate training vectors and decision boundaries of MLP with 10 hidden units and hard-limiting (1) and smooth (2) activation functions.

1.3 The SLP as the Euclidean Distance and the Fisher Linear Classifiers

The objective of this section is to demonstrate that in the adaptive BP training for the two-category case (L = 2), the SLP can realise three known statistical classifiers. In this section, we assume with centred data, i.e., and analyse the cost function of the sum of squares (1.7) with activation function (1.8) and targets . Let the prior weights be zero, i.e., v0(0) = 0, V(0) = 0. We train the SLP with the standard delta rule (1.10) in a batch-mode (the total gradient). Note that in the total gradient, we change the weights after calculating the gradients for all training vectors. For the stochastic gradient, we change the weights after calculating the gradient for each single training vector. During the first iteration, the weights and a scalar product, , are close to zero. Therefore, for the tanh(net) activation function (1.8) we have

Thus,

(1.11)

where

After the first learning iteration in the training process, we get

(1.12)

This is the weight vector of the Euclidean distance (nearest means) classifier (EDC) scaled by η. In the first section, we mentioned that the EDC is the asymptotically optimal classification rule for the data model with two spherically Gaussian pattern classes. The fact that a single-layer perceptron becomes a comparatively good statistical classifier after only one iteration is a very nice property of the SLP! To demonstrate this point, we need the following conditions:

E1) the centre of the data is moved to the zero point;

E2) the training process starts from zero weights;

E3) if , we use symmetrical targets, i.e. for activation function (1.8)t2 = −t1;

E4) the total gradient training (batch mode) method is used.

Second order optimisation methods allow us to obtain the minimum of the quadratic cost function in one iteration. Equating the derivative of the cost (1.7) to zero for non-singular matrix K, we obtain v0 = 0, and . For (the centred data) we can use the representation

(1.13)

where is the pooled sample covariance matrix defined in (1.4).

Using Bartlett’s formula for the inversion of a symmetric positive-definite matrix, where A is an n × n symmetric positive-definite matrix, we can express (1.13) in the form

(1.14)

(1.15)

where , and

If and the classification task is performed according to the sign of the discriminant function, then the positive constant kn plays no role and expressions (1.15) and (1.3) are identical. Thus, in the linear SLP training process, we can obtain the standard Fisher linear discriminant function. This classifier can also be obtained using back propagation training when one uses the linear SLP (with a linear activation function, e.g. f(net) = net). We can approximate the Fisher classifier while training the non-linear perceptron. However, to do this we need to utilise the proximal targets.

Let us now consider the hard-limiting activation function (1.9). Then the cost (1.7) expresses the empirical classification error — the frequency of errors when the training-set is classified. Obviously, successful minimisation of this cost function leads to the minimum empirical error classifier. Unfortunately, in such a case the cost function is no longer differentiable and has many local minima. To tackle the local minima problems special discrete optimisation methods must be applied. In Chapter 4, we show how to approach the minimum empirical error classifier by the BP technique when the smooth activation function is utilised. Moreover, in zero empirical error situations, the expert training of the SLP can lead to a maximum margin classifier. For this classifier, the decision boundary is placed exactly in the middle between the closest training vectors of opposite categories. In Chapter 4, we also demonstrate that in BP training, more statistical classifiers can be obtained.

1.4 The Generalisation Error of the EDC and the Fisher DF

Asymptotic and Bayes error rates. In machine learning and classifier design theory, one of the main performance characteristics is the probability of misclassification. There are several types of classification errors or probabilities of misclassification. In order to differentiate among them, we consider the problem of classifying observations from the two Gaussian with a common covariance matrix (GCCM) pattern classes, and .

Assume that the vectors from the first and second categories are equiprobable, i.e. the prior probabilities, P1 and P2, of the pattern classes are equal. Thus, P2 = 1− P1 = ½. When all parameters are known, statistical discriminant analysis provides an optimal discriminant function that is linear, namely h(X) = v0 + with weights

(1. 16)

The probability of misclassification (PMC) of this rule is

where ε1 is the probability of error of the first sort, i.e. the probability that a vector from the first class will be allocated to the second pattern class: ε1 = Probability{h(X) < 0 ∣ X ∊ ω1}, and ε2 is the probability of error of the second sort, i.e. the probability that a vector from the second class will be allocated to the first pattern class: ε2 = Probability{h(X) ≥ 0 ∣ X∊ ω2}.

For the Gaussian data model described above, h(X) = XTV+ v0 is a Gaussian random variable. Thus,

(1.17)

where

dt is the standard Gaussian cumulative distribution function (a quick reminder: Φ {a}→ 0 as a →−∞, Φ {a} = 0.5 when

E h(X ∣X ∊ ωi) is the conditional mean:

and V h(X ∣X ∊ ωi) is the conditional variance:

where

(1.18)

is the squared generalised distance between the two pattern classes.

When (the n×n identity matrix, here all n features are uncorrelated and have variances equal to one), δ² denotes the conventional squared Euclidean distance between the two vectors, M1 and M2. The generalised squared distance (1.18) was introduced by the Indian statistician P.C. Mahalanobis and, therefore, it carries his name. We use the Mahalanobis distance as a measure of the separability of two GCCM pattern classes.

Inserting the conditional means and the conditional variance into (1.17) yields

(1.19)

Equation (1.19) shows that, for the pattern-class models and , the Mahalanobis distance uniquely determines the classification error of the optimal rule. For example, if δ = 2.56, then ∊ = 0.1, if δ = 3.76, then ∊ = 0.03, and if δ = 4.65, then ∊ = 0.01.

Statistical decision-theory based classifiers are formulated using Bayes formula for conditional probabilities. This classifier-design method allows optimal classification rules to be obtained when all parameters are known. Thus, for the GCCM data model, the discriminant function (1.16) is a Bayes rule, and Φ{−½δ} is its associated error rate. The minimal probability of misclassification is called the Bayes error and is denoted by ∊B. For the data model and the Bayes error coincides with the asymptotic error for the Fisher linear DF. As a reminder, the asymptotic error determines the performance of the classification rule when an arbitrarily large training-set is used to determine the classification rule’s coefficients (weights). In this case, we assume we know the parameters M1, M2, and exactly. Therefore, for the GCCM data model the asymptotic error of the Fisher classifier is

Consider the Euclidean distance classifier designed for pattern classes with known parameters. The weights of the discriminant function are:

(1.20)

The asymptotic error rate of this classifier is

where δ*, the effective distance between two pattern classes, is defined to be

(1.21)

Calculations show that δ*≤ δ. Hence, for the GCCM data model we have that . Note that the Bayes error does not depend on the particular classifier design method. Rather, it depends on the characteristics of the data model.

Example 4. Consider the data model with GCCM pattern classes with n = 20, ΔM = M1 − M2 = (−1.7040, 0.5244, 0.49700, ..., 0.0872, 0.0599, 0.0326)T. All correlations among the variables are equal: ρ = 0.213, and all n variables have unit variances. For this data model the effective and Mahalanobis distances are δ* = 3.7616 and δ = 4.65, respectively, and the asymptotic classification errors are and . In this book, we refer to this data configuration as data model C.

The expressions for the Mahalanobis and the effective distances, (1.18) and (1.21), respectively, as well as numerical calculation show that the asymptotic errors depend on the type of classifier design method. Both the EDC and the Fisher DF are linear classifiers. However, they use different methods to determine their respective weights. Therefore, in the general case, the asymptotic error values differ as well. We note that theoretically, for certain data models, both classifiers can have the same asymptotic errors.

Generalisation errors. Consider the sample-based EDC with the discriminant function (1.2). The sample mean vectors and are functions of the training vectors

. Therefore, and are random vectors. Thus, due to sampling error, , and differ from the true population mean vectors, M1, M2, and the vector VE and the threshold v0 in (1.2) differ from the ideal values determined by Equation (1.20). Hence, the decision boundary of the sample-based rule (1.2) differs from the ideal boundary of the Euclidean distance classifier (1.20). The probability of misclassification of the sample-based rule (the generalisation error) is larger than that of the ideal rule.

Example

Enjoying the preview?

Page 1 of 1

Statistical and Neural Classifiers: An Integrated Approach to Design

About this ebook

Sarunas Raudys

Related authors

Related to Statistical and Neural Classifiers

Related ebooks

Intelligence (AI) & Semantics For You

Related podcast episodes

Related articles

Related categories

Reviews for Statistical and Neural Classifiers

What did you think?

Book preview

Statistical and Neural Classifiers - Sarunas Raudys

1

Quick Overview

1.1 The Classifier Design Problem

1.2 Single Layer and Multilayer Perceptrons

1.3 The SLP as the Euclidean Distance and the Fisher Linear Classifiers

1.4 The Generalisation Error of the EDC and the Fisher DF