Recent Advances and Trends in Nonparametric Statistics
Ebook, 1,044 pages, 9 hours

About this ebook

The advent of high-speed, affordable computers in the last two decades has given a new boost to the nonparametric way of thinking. Classical nonparametric procedures, such as function smoothing, suddenly lost their abstract flavour as they became practically implementable. In addition, many previously unthinkable possibilities became mainstream; prime examples include the bootstrap and resampling methods, wavelets and nonlinear smoothers, graphical methods, data mining, bioinformatics, as well as the more recent algorithmic approaches such as bagging and boosting. This volume is a collection of short articles, most of which have a review component, describing the state of the art of Nonparametric Statistics at the beginning of a new millennium.



Key features:



• algorithmic approaches
• wavelets and nonlinear smoothers
• graphical methods and data mining
• biostatistics and bioinformatics
• bagging and boosting
• support vector machines
• resampling methods

Language: English
Release date: Oct 31, 2003
ISBN: 9780080540375



    Recent Advances and Trends in Nonparametric Statistics - M.G. Akritas


    1  Algorithmic Approaches to Statistics

    An Introduction to Support Vector Machines

    Bernhard Schölkopf, Max-Planck-Institut für biologische Kybernetik, Spemannstr. 38, Tübingen, Germany. Email: bernhard.schoelkopf@tuebingen.mpg.de

    This article gives a short introduction to the main ideas of statistical learning theory, support vector machines, and kernel feature spaces.¹

    1 An Introductory Example

    Suppose we are given empirical data

    (x1, y1), …, (xm, ym) ∈ X × {±1}.    (1)

    Here, the domain X is some nonempty set that the patterns xi are taken from; the yi are called labels or targets. Unless stated otherwise, indices i and j will always be understood to run over the training set, i.e., i, j = 1, …, m.

    Note that we have made no assumptions on the domain X other than it being a set. In order to study the problem of learning, we need additional structure. In learning, we want to be able to generalize to unseen data points: given some new pattern x ∈ X, we want to predict the corresponding y ∈ {±1}. By this we mean, loosely speaking, that we choose y such that (x, y) is in some sense similar to the training examples. To this end, we need similarity measures in X and in {±1}. The latter is easier, as two target values can only be identical or different.² For the former, we require a similarity measure

    k : X × X → ℝ,  (x, x′) ↦ k(x, x′),    (2)

    i.e., a function that, given two examples x and x′, returns a real number characterizing their similarity. For reasons that will become clear later, the function k is called a kernel [12,1,6].

    A simple type of similarity measure that is of particular mathematical appeal is the dot product. For instance, given two vectors x, x′ ∈ ℝ^N, the canonical dot product is defined as

    (x · x′) := Σ{n = 1, …, N} [x]n [x′]n.    (3)

    Here, [x]n denotes the n-th entry of x.

    The geometrical interpretation of this dot product is that it computes the cosine of the angle between the vectors x and x′, provided they are normalized to length 1. Moreover, it allows computation of the length of a vector x as √(x · x), and of the distance between two vectors as the length of the difference vector. Therefore, being able to compute dot products amounts to being able to carry out all geometrical constructions that can be formulated in terms of angles, lengths and distances.
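
    As a small illustration (a Python/NumPy sketch added here, not part of the original text), the geometrical quantities just mentioned can all be computed from the canonical dot product (3); the example vectors are arbitrary:

import numpy as np

x = np.array([1.0, 2.0, 2.0])         # arbitrary example vectors
xp = np.array([2.0, 0.0, 1.0])

dot = x @ xp                          # canonical dot product, cf. (3)
length_x = np.sqrt(x @ x)             # length of x
dist = np.sqrt((x - xp) @ (x - xp))   # distance as length of the difference vector
cosine = dot / (np.sqrt(x @ x) * np.sqrt(xp @ xp))  # cosine of the enclosed angle

print(dot, length_x, dist, cosine)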

    In order to be able to use a dot product as a similarity measure, we therefore first need to represent the patterns as vectors in some dot product space H (which need not coincide with ℝ^N). To this end, we use a map

    Φ : X → H,  x ↦ x := Φ(x).    (4)

    The space H is called a feature space. To summarize, embedding the data into H via the map Φ has three benefits.

    1. It lets us define a similarity measure from the dot product in H,

    k(x, x′) := (x · x′) = (Φ(x) · Φ(x′)).    (5)

    2. It allows us to deal with the patterns geometrically, and thus lets us study learning algorithms using linear algebra and analytic geometry.

    3. The freedom to choose the mapping Φ will enable us to design a large variety of learning algorithms. For instance, consider a situation where the inputs already live in a dot product space. In that case, we could directly define a similarity measure as the dot product. However, we might still choose to first apply a nonlinear map Φ to change the representation into one that is more suitable for a given problem and learning algorithm.

    We are now in the position to describe a simple pattern recognition algorithm. The idea is to compute the means of the two classes in feature space,

    c1 = (1/m1) Σ{i: yi = +1} xi,    (6)

    c2 = (1/m2) Σ{i: yi = −1} xi,    (7)

    where m1 and m2 are the number of examples with positive and negative labels, respectively. We then assign a new point x to the class whose mean is closer to it. This geometrical construction can be formulated in terms of dot products. Half-way in between c1 and c2 lies the point c := (c1 + c2) /2. We compute the class of x by checking whether the vector connecting c and x encloses an angle smaller than π/2 with the vector w := c1 − c2 connecting the class means, in other words

    y = sgn((x − c) · w) = sgn((x − (c1 + c2)/2) · (c1 − c2)) = sgn((x · c1) − (x · c2) + b).    (8)

    Here, we have defined the offset

    b := ½ (||c2||² − ||c1||²).    (9)

    It will prove instructive to rewrite this expression in terms of the patterns xi in the input domain X. Note that we do not have a dot product in X; all we have is the similarity measure k (cf. (5)). Therefore, we need to rewrite everything in terms of the kernel k evaluated on input patterns. To this end, substitute (6) and (7) into (8) to get the decision function

    y = sgn( (1/m1) Σ{i: yi = +1} k(x, xi) − (1/m2) Σ{i: yi = −1} k(x, xi) + b ).    (10)

    Similarly, the offset becomes

    b = ½ ( (1/m2²) Σ{i, j: yi = yj = −1} k(xi, xj) − (1/m1²) Σ{i, j: yi = yj = +1} k(xi, xj) ).    (11)

    Let us consider one well-known special case of this type of classifier. Assume that the class means have the same distance to the origin (hence b = 0), and that k can be viewed as a density, i.e., it is positive and has integral 1,

    ∫X k(x, x′) dx = 1  for all x′ ∈ X.    (12)

    If the above holds true, then (10) corresponds to the so-called Bayes decision boundary separating the two classes, subject to the assumption that the two classes were generated from two probability distributions that are correctly estimated by the Parzen windows estimators of the two classes,

    p1(x) := (1/m1) Σ{i: yi = +1} k(x, xi),    (13)

    p2(x) := (1/m2) Σ{i: yi = −1} k(x, xi).    (14)

    Given some point x, the label is then simply computed by checking which of the two, p1(x) or p2(x), is larger, leading to (10). Note that this decision is the best we can do if we have no prior information about the probabilities of the two classes, or a uniform prior distribution. For further details, see [15].
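
    The following Python sketch (an illustration added here, with an arbitrary Gaussian kernel and synthetic data; it is not code from the article) implements the mean-based decision function (10) with the offset (11):

import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2), an arbitrary choice of k
    return np.exp(-gamma * np.sum((a - b) ** 2))

def simple_kernel_classifier(X, y, x_new, gamma=1.0):
    # Decision function (10) with offset (11): compare kernelized
    # similarities of x_new to the two class means in feature space.
    pos, neg = X[y == +1], X[y == -1]
    k_pos = np.mean([rbf_kernel(x_new, xi, gamma) for xi in pos])
    k_neg = np.mean([rbf_kernel(x_new, xi, gamma) for xi in neg])
    b = 0.5 * (np.mean([rbf_kernel(xi, xj, gamma) for xi in neg for xj in neg])
               - np.mean([rbf_kernel(xi, xj, gamma) for xi in pos for xj in pos]))
    return np.sign(k_pos - k_neg + b)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1, 1, size=(20, 2)), rng.normal(-1, 1, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)
print(simple_kernel_classifier(X, y, np.array([0.8, 1.2])))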

    The classifier (10) is quite close to the types of learning machines that we will be interested in. It is linear in the feature space (Equation (8)), while in the input domain, it is represented by a kernel expansion (Equation (10)). It is example-based in the sense that the kernels are centered on the training examples, i.e., one of the two arguments of the kernels is always a training example. This is a general property of kernel methods, due to the Representer Theorem [11,15]. The main point where the more sophisticated techniques to be discussed later will deviate from (10) is in the selection of the examples that the kernels are centered on, and in the weight that is put on the individual kernels in the decision function. Namely, it will no longer be the case that all training examples appear in the kernel expansion, and the weights of the kernels in the expansion will no longer be uniform. In the feature space representation, this statement corresponds to saying that we will study all normal vectors w of decision hyperplanes that can be represented as linear combinations of the training examples. For instance, we might want to remove the influence of patterns that are very far away from the decision boundary, either since we expect that they will not improve the generalization error of the decision function, or since we would like to reduce the computational cost of evaluating the decision function (cf. (10)). The hyperplane will then only depend on a subset of training examples, called support vectors.

    2 Learning Pattern Recognition from Examples

    With the above example in mind, let us now consider the problem of pattern recognition in a more formal setting, highlighting some ideas developed in statistical learning theory [18]. In two-class pattern recognition, we seek to estimate a function

    f : X → {±1},    (15)

    based on input-output training data (1). We assume that the data were generated independently from some unknown (but fixed) probability distribution P(x, y). Our goal is to learn a function that will correctly classify unseen examples (x,y), i.e., we want f(x) = y for examples (x,y) that were also generated from P(x,y).

    If we put no restriction on the class of functions that we choose our estimate f from, however, even a function which does well on the training data, e.g. by satisfying f(xi) = yi for all i = 1,…,m, need not generalize well to unseen examples. To see this, note that for each function f there exists another function f* such that f*(xi) = f(xi) for all i = 1,…,m, yet f* disagrees with f on any given set of unseen test points. As we are only given the training data, we have no means of selecting which of the two functions (and hence which of the completely different sets of test label predictions) is preferable. Hence, only minimizing the training error (or empirical risk),

    Remp[f] = (1/m) Σi ½ |f(xi) − yi|,    (16)

    does not imply a small test error (called risk), averaged over test examples drawn from the underlying distribution P(x, y),

    R[f] = ∫ ½ |f(x) − y| dP(x, y).    (17)

    Statistical learning theory ([16], [18], [19]), or VC (Vapnik-Chervonenkis) theory, shows that it is imperative to restrict the class of functions that f is chosen from to one which has a capacity that is suitable for the amount of available training data. VC theory provides bounds on the test error. The minimization of these bounds, which depend on both the empirical risk and the capacity of the function class, leads to the principle of structural risk minimization. The best-known capacity concept of VC theory is the VC dimension, defined as the largest number h of points that can be separated in all possible ways using functions of the given class. An example of a VC bound is the following: if h < m is the VC dimension of the class of functions that the learning machine can implement, then for all functions of that class, with a probability of at least 1 − η, the bound

    R[f] ≤ Remp[f] + ψ(h/m, log(η)/m)    (18)

    holds, where m is the number of training examples and the confidence term ψ is defined as

    ψ(h/m, log(η)/m) = √( ( h (log(2m/h) + 1) − log(η/4) ) / m ).    (19)

    Tighter bounds can be formulated in terms of other concepts, such as the annealed VC entropy or the Growth function. These are usually considered to be harder to evaluate, but they play a fundamental role in the conceptual part of VC theory [18]. Alternative capacity concepts that can be used to formulate bounds include the fat shattering dimension [2].

    The bound (18) deserves some further explanatory remarks. Suppose we wanted to learn a dependency where P(x, y) = P(x) · P(y), i.e., where the pattern x contains no information about the label y, with uniform P(y). Given a training sample of fixed size, we can then surely come up with a learning machine which achieves zero training error (provided we have no examples contradicting each other). However, in order to reproduce the random labellings, this machine will necessarily require a large VC dimension h. Thus, the confidence term (19), increasing monotonically with h, will be large, and the bound (18) will not support possible hopes that due to the small training error, we should expect a small test error. This makes it understandable how (18) can hold independently of assumptions about the underlying distribution P(x, y): it always holds (provided that h < m), but it does not always make a nontrivial prediction — a bound on an error rate becomes void if it is larger than the maximum error rate. In order to get nontrivial predictions from (18), the function space must be restricted such that the capacity (e.g. VC dimension) is small enough (in relation to the available amount of data).
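
    For illustration, the confidence term (19), as reconstructed above, can be evaluated numerically; the following Python sketch (added here, not part of the original text) shows how it shrinks as the sample size m grows relative to the VC dimension h:

import math

def vc_confidence(h, m, eta):
    # Confidence term psi of the VC bound, cf. (18)-(19) as written above:
    # psi = sqrt((h * (log(2m/h) + 1) - log(eta/4)) / m)
    return math.sqrt((h * (math.log(2 * m / h) + 1) - math.log(eta / 4)) / m)

# The term decreases as m grows for fixed VC dimension h and confidence level 1 - eta.
for m in (100, 1000, 10000):
    print(m, round(vc_confidence(h=10, m=m, eta=0.05), 3))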

    3 Optimal Margin Hyperplane Classifiers

    In the present section, we shall describe a hyperplane learning algorithm that can be performed in a dot product space (such as the feature space that we introduced previously). As described in the previous section, to design learning algorithms, one needs to come up with a class of functions whose capacity can be computed.

    Vapnik and Lerner [17] considered the class of hyperplanes

    {x ∈ H : (w · x) + b = 0},  where w ∈ H, b ∈ ℝ,    (20)

    corresponding to decision functions

    f(x) = sgn((w · x) + b),    (21)

    and proposed a learning algorithm for separable problems, termed the Generalized Portrait for constructing f from empirical data. It is based on two facts. First, among all hyperplanes separating the data (assuming that the data is separable), there exists a unique one yielding the maximum margin of separation between the classes,

    max over w, b of  min { ||x − xi|| : x ∈ H, (w · x) + b = 0, i = 1, …, m }.    (22)

    Second, the capacity can be shown to decrease with increasing margin.

    To construct this Optimum Margin Hyperplane (cf. Figure 1), one solves the following optimization problem:

    minimize (over w ∈ H, b ∈ ℝ)  τ(w) = ½ ||w||²    (23)

    subject to  yi · ((w · xi) + b) ≥ 1 for all i = 1, …, m.    (24)

    Figure 1  A binary classification toy problem: separate balls from diamonds. The Optimum Margin Hyperplane is orthogonal to the shortest line connecting the convex hulls of the two classes (dotted), and intersects it half-way between the two classes. The problem being separable, there exists a weight vector w and a threshold b such that yi · ((w · xi) + b) > 0 (i = 1,…, m). Rescaling w and b such that the point(s) closest to the hyperplane satisfy |(w · xi) + b| = 1, we obtain a canonical form (w, b) of the hyperplane, satisfying yi · ((w · xi) + b) ≥ 1. Note that in this case, the margin, measured perpendicularly to the hyperplane, equals 2/||w||. This can be seen by considering two points x1, x2 on opposite sides of the margin, i.e., (w · x1) + b = 1, (w · x2) + b = −1, and projecting them onto the hyperplane normal vector w/||w|| (from [13]).

    This constrained optimization problem is dealt with by introducing Lagrange multipliers αi ≥ 0 and a Lagrangian

    L(w, b, α) = ½ ||w||² − Σi αi (yi · ((xi · w) + b) − 1).    (25)

    The Lagrangian L has to be minimized with respect to the primal variables w and b and maximized with respect to the dual variables αi, (i.e., a saddle point has to be found). Let us try to get some intuition for this. If a constraint (24) is violated, then yi · ((w · xi) + b) − 1 < 0, in which case L can be increased by increasing the corresponding αi. At the same time, w and b will have to change such that L decreases. To prevent − αi(yi · ((w · xi) + b) − 1) from becoming arbitrarily large, the change in w and b will ensure that, provided the problem is separable, the constraint will eventually be satisfied. Similarly, one can understand that for all constraints which are not precisely met as equalities, i.e., for which yi · ((w · xi) + b) − 1 > 0, the corresponding αi must be 0, for this is the value of αi that maximizes L. This is the statement of the Karush-Kuhn-Tucker conditions of optimization theory [5].

    The condition that at the saddle point, the derivatives of L with respect to the primal variables must vanish,

    ∂L(w, b, α)/∂b = 0  and  ∂L(w, b, α)/∂w = 0,    (26)

    leads to

    Σi αi yi = 0    (27)

    and

    w = Σi αi yi xi.    (28)

    The solution vector thus has an expansion in terms of a subset of the training patterns, namely those patterns whose αi is non-zero, called Support Vectors. By the Karush-Kuhn-Tucker conditions

    αi · [yi · ((xi · w) + b) − 1] = 0 for all i = 1, …, m,    (29)

    the Support Vectors satisfy yi((xi · w) + b) − 1 = 0, i.e., they lie on the margin (cf. Figure 1). All remaining examples of the training set are irrelevant: their constraint (24) does not play a role in the optimization, and they do not appear in the expansion (28). This nicely captures our intuition of the problem: as the hyperplane (cf. Figure 1) is geometrically completely determined by the patterns closest to it, the solution should not depend on the other examples.

    By substituting (27) and (28) into L, one eliminates the primal variables and arrives at the Wolfe dual of the optimization problem (e.g., [5]): find multipliers αi which

    maximize  W(α) = Σi αi − ½ Σi,j αi αj yi yj (xi · xj)    (30)

    subject to  αi ≥ 0 for all i = 1, …, m, and Σi αi yi = 0.    (31)

    By substituting (28) into (21), the hyperplane decision function can thus be written as

    f(x) = sgn( Σi αi yi (x · xi) + b ),    (32)

    where b is computed using (29).

    The structure of the optimization problem closely resembles those that typically arise in Lagrange’s formulation of mechanics. There, often only a subset of the constraints become active. For instance, if we keep a ball in a box, then it will typically roll into one of the corners. The constraints corresponding to the walls which are not touched by the ball are irrelevant, the walls could just as well be removed.

    Seen in this light, it is not too surprising that it is possible to give a mechanical interpretation of optimal margin hyperplanes [7]: if we assume that each support vector xi exerts a perpendicular force of size αi and sign yi on a solid plane sheet lying along the hyperplane, then the solution satisfies the requirements of mechanical stability; by (27) and (28), the forces and torques exerted by the support vectors on the sheet sum to zero.

    There are theoretical arguments supporting the good generalization performance of the optimal hyperplane ([16], [23], [3], [20]). In addition, it is computationally attractive, since it can be constructed by solving a quadratic programming problem.
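
    As a practical illustration (a sketch added here, assuming scikit-learn is available; it is not part of the original article), a linear maximum margin classifier can be fitted on synthetic separable data, and one can check that its support vectors approximately satisfy the margin condition (29):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+2, 0.5, size=(30, 2)), rng.normal(-2, 0.5, size=(30, 2))])
y = np.array([+1] * 30 + [-1] * 30)

# A hard margin is approximated by a very large C (sklearn always solves a soft-margin dual).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]              # the expansion (28), collapsed into a single weight vector
b = clf.intercept_[0]
margins = y * (X @ w + b)     # y_i * ((w . x_i) + b), cf. (24)

print("number of support vectors:", len(clf.support_))
print("margins of the support vectors (close to 1, cf. (29)):",
      np.round(margins[clf.support_], 3))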

    4 Support Vector Classifiers

    We now have all the tools to describe support vector machines. Everything in the last section was formulated in a dot product space, which we think of as the feature space H (cf. (4)). To express the formulas in terms of the input patterns in X, we thus need to employ (5), which expresses the dot product of bold face feature vectors x, x′ in terms of the kernel k evaluated on input patterns x, x′,

    (x · x′) = k(x, x′).    (33)

    This can be done since all feature vectors only occurred in dot products. The weight vector (cf. (28)) then becomes an expansion in feature space, and will thus typically no longer correspond to the image of a single vector from input space. We thus obtain decision functions of the more general form (cf. (32))

    f(x) = sgn( Σi αi yi k(x, xi) + b )    (34)

    Figure 2  Example of a Support Vector classifier found by using a radial basis function kernel k(x, x′) = exp(−||x − x′||²). Both coordinate axes range from −1 to +1. Circles and disks are two classes of training examples; the middle line is the decision surface; the outer lines precisely meet the constraint of the decision function (34) (from [13]).

    and the following quadratic program (cf. (30)):

    maximize  W(α) = Σi αi − ½ Σi,j αi αj yi yj k(xi, xj),    (35)

    subject to  αi ≥ 0 for all i = 1, …, m, and Σi αi yi = 0.    (36)

    In practice, a separating hyperplane may not exist, e.g. if a high noise level causes a large overlap of the classes. To allow for the possibility of examples violating (24), one introduces slack variables [8]

    ξi ≥ 0 for all i = 1, …, m,    (37)

    in order to relax the constraints to

    yi · ((w · xi) + b) ≥ 1 − ξi for all i = 1, …, m.    (38)

    A classifier which generalizes well is then found by controlling both the classifier capacity (via ||w||) and the sum of the slack variables Σi ξi. The latter is done as it can be shown to provide an upper bound on the number of training errors, while still leading to a convex optimization problem.

    One possible realization of a soft margin classifier is minimizing the objective function

    τ(w, ξ) = ½ ||w||² + C Σi ξi    (39)

    subject to the constraints (37) and (38), for some value of the constant C > 0 determining the trade-off. Here, we use the shorthand ξ = (ξ1,…, ξm). Incorporating kernels, and rewriting it in terms of Lagrange multipliers, this again leads to the problem of maximizing (35), subject to the constraints

    0 ≤ αi ≤ C for all i = 1, …, m, and Σi αi yi = 0.    (40)

    The only difference from the separable case is the upper bound C on the Lagrange multipliers αi. This way, the influence of the individual patterns (which could be outliers) gets limited. As above, the solution takes the form (34). The threshold b can be computed by exploiting the fact that for all SVs xi with αi < C, the slack variable ξi is zero (this again follows from the Karush-Kuhn-Tucker complementarity conditions), and hence

    Σj αj yj k(xi, xj) + b = yi.    (41)
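
    The following Python sketch (an illustration added here, assuming NumPy and SciPy; it is not the authors' code) solves the soft margin dual (35) under the box constraints (40) with a generic solver and recovers the threshold b via (41):

import numpy as np
from scipy.optimize import minimize

def rbf_gram(X, gamma=0.5):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); an arbitrary kernel choice
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(+1, 1, size=(15, 2)), rng.normal(-1, 1, size=(15, 2))])
y = np.array([+1.0] * 15 + [-1.0] * 15)
K, C = rbf_gram(X), 1.0

# Dual (35): maximize sum(alpha) - 1/2 alpha' Q alpha with Q = (y_i y_j k(x_i, x_j)),
# subject to the box and equality constraints (40); we minimize the negative objective.
Q = (y[:, None] * y[None, :]) * K
objective = lambda a: 0.5 * a @ Q @ a - a.sum()
res = minimize(objective, x0=np.zeros(len(y)),
               bounds=[(0.0, C)] * len(y),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}],
               method="SLSQP")
alpha = res.x

# Threshold b from (41); it strictly requires a support vector with 0 < alpha_i < C.
free = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]
sv = free[0] if len(free) else int(np.argmax(alpha))
b = y[sv] - (alpha * y) @ K[:, sv]

decision = np.sign((alpha * y) @ K + b)   # decision function (34) on the training points
print("training accuracy:", np.mean(decision == y))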

    Another possible realization of a soft margin variant of the optimal hyperplane uses the ν-parametrization [15]. In it, the parameter C is replaced by a parameter ν ∈ (0, 1], which can be shown to be an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. In place of the fixed margin 1 in (38), it uses a margin variable ρ and separation constraints

    yi · ((w · xi) + b) ≥ ρ − ξi for all i = 1, …, m.    (42)

    The margin parameter ρ is a variable of the optimization problem. The dual can be shown to consist of maximizing the quadratic part of (35), subject to 0 ≤ αi ≤ 1/(νm) together with further linear constraints (see [15]). The advantage of the ν-SVM is its more intuitive parametrization.

    We conclude this section by noting that the SV algorithm has been generalized to problems such as regression estimation [18] as well as one-class problems and novelty detection [15]. The algorithms involved are similar to the case of pattern recognition described above. Moreover, the kernel method for computing dot products in feature spaces is not restricted to SV machines. Indeed, it has been pointed out that it can be used to develop nonlinear generalizations of any algorithm that can be cast in terms of dot products, such as principal component analysis [15], and a number of developments have followed this example.

    5 Polynomial Kernels

    We now take a closer look at the issue of the similarity measure, or kernel, k.

    In this section, we assume that the patterns live in X = ℝ^N, endowed with the canonical dot product (3).

    5.1 Product Features

    Suppose we are given patterns x ∈ ℝ^N where most information is contained in the d-th order products (monomials) of entries [x]j of x,

    [x]j1 · [x]j2 ⋯ [x]jd,    (43)

    where j1, …, jd ∈{1, …, N}. In that case, we might prefer to extract these product features, and work in the feature space F of all products of d entries. In visual recognition problems, where images are often represented as vectors, this would amount to extracting features which are products of individual pixels.

    For instance, in ℝ², we can collect all monomial feature extractors of degree 2 in the nonlinear map

    Φ : ℝ² → F = ℝ³,    (44)

    ([x]1, [x]2) ↦ ([x]1², [x]2², [x]1 [x]2).    (45)

    This approach works fine for small toy examples, but it fails for realistically sized problems: for N-dimensional input patterns, there exist

    NF = (d + N − 1)! / (d! (N − 1)!)    (46)

    different monomials (43), comprising a feature space F of dimensionality NF. For instance, already 16 × 16 pixel input images and a monomial degree d = 5 yield a dimensionality of 10¹⁰.

    In certain cases described below, there exists, however, a way of computing dot products in these high-dimensional feature spaces without explicitly carrying out the map into them. Thus, if the subsequent processing can be carried out using dot products exclusively, we are able to deal with the high dimensionality.

    The following section describes how dot products in polynomial feature spaces can be computed efficiently.

    5.2 Polynomial Feature Spaces Induced by Kernels

    In order to compute dot products of the form (Φ(x) · Φ(x′)), we employ kernel representations of the form

    k(x, x′) = (Φ(x) · Φ(x′)),    (47)

    which allow us to compute the value of the dot product in F without having to carry out the map Φ. This method was used by Boser, Guyon and Vapnik [6] to extend the Generalized Portrait hyperplane classifier of Vapnik and Chervonenkis [16] to nonlinear Support Vector machines. Aizerman et al. [1] call F the linearization space, and used it in the context of the potential function classification method to express the dot product between elements of F in terms of elements of the input space.

    What does k look like for the case of polynomial features? We start by giving an example [18] for N = d = 2. For the map

    C2 : ([x]1, [x]2) ↦ ([x]1 [x]1, [x]1 [x]2, [x]2 [x]1, [x]2 [x]2),    (48)

    dot products in F take the form

    (C2(x) · C2(x′)) = [x]1² [x′]1² + [x]2² [x′]2² + 2 [x]1 [x]2 [x′]1 [x′]2 = (x · x′)²,    (49)

    i.e., the desired kernel k is simply the square of the dot product in input space. The same works for arbitrary N and d [6]:

    Proposition 1

    Define Cd to map x ∈ ℝ^N to the vector Cd(x) whose entries are all possible d-th degree ordered products of the entries of x. Then the corresponding kernel computing the dot product of vectors mapped by Cd is

    k(x, x′) = (Cd(x) · Cd(x′)) = (x · x′)^d.    (50)

    Proof. We directly compute

    (Cd(x) · Cd(x′)) = Σ{j1, …, jd = 1, …, N} [x]j1 ⋯ [x]jd [x′]j1 ⋯ [x′]jd    (51)

    = ( Σ{j = 1, …, N} [x]j [x′]j )^d = (x · x′)^d.    (52)

    Instead of ordered products, we can use unordered ones to obtain a map Φd which yields the same value of the dot product. To this end, we have to compensate for the multiple occurrence of certain monomials in Cd by scaling the respective entries of Φd with the square roots of their numbers of occurrence. Then, by this definition of Φd, and (50),

    (Φd(x) · Φd(x′)) = (Cd(x) · Cd(x′)) = (x · x′)^d.    (53)

    For instance, if n of the ji in (43) are equal, and the remaining ones are different, then the coefficient in the corresponding component of Φd is √(d!/n!), the square root of the number of orderings of that monomial (for the general case, cf. [15]). For Φ2, this simply means that [18]

    Φ2(x) = ([x]1², [x]2², √2 [x]1 [x]2).    (54)

    If x represents an image with the entries being pixel values, we can use the kernel (x · x′)d to work in the space spanned by products of any d pixels — provided that we are able to do our work solely in terms of dot products, without any explicit usage of a mapped pattern Φd(x). Using kernels of the form (50), we take into account higher-order statistics without the combinatorial explosion (cf. (46)) of time and memory complexity which goes along already with moderately high N and d.
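
    A quick numerical check (a sketch added here, not part of the original text) confirms Proposition 1 for N = d = 2: the explicit map Φ2 of (54) and the polynomial kernel (x · x′)² give the same dot product:

import numpy as np

def phi2(x):
    # Unordered degree-2 monomial map Phi_2 of (54): ([x]1^2, [x]2^2, sqrt(2) [x]1 [x]2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 3.0])        # arbitrary example vectors in R^2
xp = np.array([2.0, -1.0])

lhs = phi2(x) @ phi2(xp)        # explicit dot product in feature space
rhs = (x @ xp) ** 2             # polynomial kernel (x . x')^2, cf. (49)-(50)
print(lhs, rhs)                 # both values coincide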

    To conclude this section, note that it is possible to modify (50) such that it maps into the space of all monomials up to degree d, defining

    k(x, x′) = ((x · x′) + 1)^d.    (55)

    6 Examples of Kernels

    When considering feature maps, it is also possible to look at things the other way around, and start with the kernel. Given a kernel function satisfying a condition termed positive definiteness, it is possible to construct a feature space such that the kernel computes the dot product in that feature space. This has been brought to the attention of the machine learning community by [1], [6], and [18]. In functional analysis, the issue has been studied under the heading of Hilbert space representations of kernels. A good monograph on the theory of kernels is [4]. A treatment which focuses on the aspects relevant to machine learning can be found in [15].

    The condition of positive definiteness is equivalent to saying that no matter what the training patterns xi are, all eigenvalues of the Gram matrix (k(xi, xj))ij are nonnegative. Besides (50), a popular kernel satisfying this condition is the Gaussian radial basis function (RBF) [1]

    k(x, x′) = exp(−α ||x − x′||²),    (56)

    where α > 0. Note that this kernel is translation invariant. It has some additional properties, which may serve as an example of how the choice of kernel can have rather strong implications on the geometry of the domain occupied by the data mapped into the feature space.

    As the Gaussian RBF kernel satisfies k(x, x) = 1 for all x, each mapped example has unit length, ||Φ(x)|| = 1. In addition, as k(x, x′) > 0 for all x, x′, all points lie inside the same orthant in feature space. To see this, recall that, for unit length vectors, the dot product (3) equals the cosine of the enclosed angle. Hence

    cos(∠(Φ(x), Φ(x′))) = (Φ(x) · Φ(x′)) = k(x, x′) > 0,    (57)

    which amounts to saying that the enclosed angle between any two mapped examples is smaller than π/2.
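
    These properties are easy to verify numerically; the following sketch (added here for illustration, using arbitrary random patterns) checks the nonnegativity of the Gram matrix eigenvalues, the unit length of the mapped examples, and the positivity of all pairwise dot products for the Gaussian RBF kernel (56):

import numpy as np

def rbf_gram(X, alpha=1.0):
    # Gram matrix of the Gaussian RBF kernel (56): K[i, j] = exp(-alpha * ||x_i - x_j||^2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-alpha * sq)

X = np.random.default_rng(3).normal(size=(50, 4))   # arbitrary patterns
K = rbf_gram(X)

print("smallest eigenvalue (nonnegative up to round-off):", np.linalg.eigvalsh(K).min())
print("diagonal all ones, i.e. ||Phi(x)|| = 1:", np.allclose(np.diag(K), 1.0))
print("all entries positive, i.e. enclosed angles below pi/2:", bool(np.all(K > 0)))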

    The examples given so far apply to the case of vectorial data. In fact, it is possible to construct kernels that compute similarity scores for data drawn from rather different domains. This generalizes kernel learning algorithms to a large number of situations where a vectorial representation is not readily available, i.e., where the domain is not a vector space.

    Example 1

    Similarity of probabilistic events

    If 𝒜 is a σ-algebra, P a probability measure on 𝒜, and A and B two events in 𝒜, then

    k(A, B) := P(A ∩ B) − P(A) P(B)    (58)

    is a positive definite kernel.

    Further examples include kernels for string matching, as proposed by [21] and [10].

    There is an analogue of the kernel trick for distances rather than dot products, i.e., dissimilarities rather than similarities. This leads to the class of conditionally positive definite kernels. In a nutshell, these kernels can be represented by distances in Hilbert spaces, and they contain positive definite kernels as special cases. Interestingly, it turns out that SVMs and kernel PCA can be applied also with this larger class of kernels, due to their being translation invariant in feature space [15]. An even larger class of dissimilarity functions, namely all semi-metrics, is obtained by not requiring the embedding to be in a Hilbert space, but in the more general class of Banach spaces.

    7 Conclusion

    One of the most appealing features of kernel algorithms is the solid foundation provided by both statistical learning theory and functional analysis. Kernel methods let us interpret (and design) learning algorithms geometrically in feature spaces nonlinearly related to the input space, and combine statistics and geometry in a promising way. Kernels provide an elegant framework for studying three fundamental issues of machine learning:

    • Similarity measures — the kernel can be viewed as a (nonlinear) similarity measure, and should ideally incorporate prior knowledge about the problem at hand

    • Data representation — as described above, kernels induce representations of the data in a linear space

    • Function class — due to the representer theorem, the kernel implicitly also determines the function class which is used for learning.

    Acknowledgements

    Thanks to Assaf Naor for useful remarks about the representation of dissimilarities, and to Isabelle Guyon, Alex Smola, and Jason Weston for help with earlier versions of the manuscript.

    REFERENCES

    1. Aizerman MA, Braverman É.M., Rozonoér LI. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control. 1964;25:821–837.

    2. Alon N, Ben-David S, Cesa-Bianchi N, Haussler D. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM. 1997;44(4):615–631.

    3. Bartlett PL, Shawe-Taylor J. Generalization performance of support vector machines and other pattern classifiers. In: Schölkopf B, Burges CJC, Smola AJ, eds. Advances in Kernel Methods — Support Vector Learning. Cambridge, MA: MIT Press; 1999:43–54.

    4. Berg C, Christensen JPR, Ressel P. Harmonic Analysis on Semigroups. New York: Springer-Verlag; 1984.

    5. Bertsekas DP. Nonlinear Programming. Belmont, MA: Athena Scientific; 1995.

    6. Boser BE, Guyon IM, Vapnik V. A training algorithm for optimal margin classifiers. In: Haussler D, ed. Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory. Pittsburgh, PA: ACM Press; July 1992:144–152.

    7. Burges CJC, Schölkopf B. Improving the accuracy and speed of support vector learning machines. In: Mozer M, Jordan M, Petsche T, eds. Advances in Neural Information Processing Systems 9. Cambridge, MA: MIT Press; 1997:375–381.

    8. Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20:273–297.

    9. Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge, UK: Cambridge University Press; 2000.

    10. Haussler D. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Computer Science Department, University of California at Santa Cruz; 1999.

    11. Kimeldorf GS, Wahba G. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications. 1971;33:82–95.

    12. Mercer J. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society, London, A. 1909;209:415–446.

    13. Schölkopf B. Support Vector Learning. München: R. Oldenbourg Verlag; 1997. Doktorarbeit, Technische Universität Berlin. Available from http://www.kyb.tuebingen.mpg.de/~bs.

    14. Schölkopf B, Burges CJC, Smola AJ. Advances in Kernel Methods — Support Vector Learning. Cambridge, MA: MIT Press; 1999.

    15. Schölkopf B, Smola AJ. Learning with Kernels. Cambridge, MA: MIT Press; 2002.

    16. Vapnik V, Chervonenkis A. Theory of Pattern Recognition [in Russian]. Moscow: Nauka; 1974. German translation: Wapnik W, Tscherwonenkis A. Theorie der Zeichenerkennung. Berlin: Akademie-Verlag; 1979.

    17. Vapnik V, Lerner A. Pattern recognition using generalized portrait method. Automation and Remote Control. 1963;24:774–780.

    18. Vapnik VN. The Nature of Statistical Learning Theory. New York: Springer Verlag; 1995.

    19. Vapnik VN. Statistical Learning Theory. New York: Wiley; 1998.

    20. von Luxburg U, Bousquet O, Schölkopf B. A compression approach to support vector model selection. In: Technical Report 101, Max Planck Institute for Biological Cybernetics; 2002.

    21. Watkins C. Dynamic alignment kernels. In: Smola AJ, Bartlett PL, Schölkopf B, Schuurmans D, eds. Advances in Large Margin Classifiers. Cambridge, MA: MIT Press; 2000:39–50.

    22. Weston J, Chapelle O, Elisseeff A, Schölkopf B, Vapnik V. Kernel dependency estimation. In: Becker S, Thrun S, Obermayer K, eds. Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press; 2003.

    23. Williamson RC, Smola AJ, Schölkopf B. Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators. IEEE Transactions on Information Theory. 2001;47(6):2516–2532.


    ¹ The present article is partly based on Microsoft TR-2000-23, Redmond, WA.

    ² When the outputs are taken from a more general set rather than {±1}, the situation is more complex. For algorithms to deal with this case, cf. [22].

    Bagging, Subagging and Bragging for Improving some Prediction Algorithms

    Peter Bühlmann    Seminar für Statistik, ETH Zürich, CH-8092 Zürich, Switzerland

    Bagging (bootstrap aggregating), proposed by Breiman [1], is a method to improve the predictive power of some special estimators or algorithms such as regression or classification trees. First, we review a recently developed theory explaining why bagging decision trees, or also the subagging (subsample aggregating) variant, yields smooth decisions, reducing the variance and mean squared error. We then propose bragging (bootstrap robust aggregating) as a new version of bagging which, in contrast to bagging, is empirically demonstrated to improve also the MARS algorithm which itself already yields continuous function estimates. Finally, bagging is demonstrated as a module in conjunction with boosting for an example about tumor classification using microarray gene expressions.

    1 Introduction

    Bagging [1], a sobriquet for bootstrap aggregating, is a method for improving unstable estimation or classification schemes. It is very useful for large, high dimensional data set problems where finding a good model or classifier is difficult.

    Breiman [1] motivated bagging as a variance reduction technique for a given basis algorithm (i.e. an estimator) such as decision trees or a method that does variable selection and fitting in a linear model. It has attracted much attention and is quite frequently applied, although theoretical insights have been lacking until very recently [4-6]. We present here parts of a theory from Bühlmann and Yu [4], indicating that bagging is a smoothing operation which turns out to be advantageous when aiming to improve the predictive performance of regression or classification trees. In the case of regression trees, this theory confirms Breiman's intuition that bagging is a variance reduction technique, reducing also the mean squared error (MSE). The same also holds for subagging (subsample aggregating), which is a computationally cheaper version of bagging. However, for other complex basis algorithms, the variance and MSE reduction effect of bagging does not necessarily hold; this has also been shown by Buja and Stuetzle [5] in the simpler case where the estimator is a U-statistic.

    Moreover, we propose bragging (bootstrap robust aggregating) as a simple, yet new modification which improves an estimation procedure not only because of its smoothing effect but also by averaging over unstable selection of variables or terms in complex models or algorithms; empirical examples are given for bragging MARS, where bagging is often not a useful device, whereas bragging turns out to be effective.

    Bagging can also be useful as a module in other algorithms: BagBoosting [3] is a boosting algorithm [8] with a bagged learner, often a bagged regression tree. From our theory it will become evident that BagBoosting using bagged regression trees, which have smaller MSEs than trees, is better than boosting with regression trees. We demonstrate improvements of BagBoosting over boosting for a problem about tumor classification using microarray gene expression predictors.

    2 Bagging and Subagging

    Consider the regression or classification setting. The data is given as i.i.d. pairs (Xi, Yi) (i = 1, …, n), where Xi ∈ ℝ^d denotes the d-dimensional predictor variable and the response is Yi ∈ ℝ (regression) or Yi ∈ {0, 1, …, J − 1} (classification with J classes). The goal is function estimation, and the target function is usually E[Y | X = x] for regression or P[Y = j | X = x] (j = 0, …, J − 1) for classification. We denote in the sequel a function estimator, which is the outcome from a given basis algorithm, by

    ĝn(·) = hn((X1, Y1), …, (Xn, Yn))(·),

    defined by the function hn(·). Examples of such estimators include linear regression with variable selection, regression trees such as CART [2], or MARS [9].

    Definition 1

    (Bagging). Theoretically, bagging is defined as follows.

    (I) Construct a bootstrap sample (X*1,Y*1), …, (X*n,Y*n) by random drawing n times with replacement from the data (X1, Y1), …, (Xn, Yn).

    (II) Compute the bootstrapped estimator ĝ*n(·) by the plug-in principle, ĝ*n(·) = hn((X*1, Y*1), …, (X*n, Y*n))(·).
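
    To make Definition 1 concrete, the following Python sketch (an illustration added here, using scikit-learn's regression tree as the basis algorithm; it is not the authors' code) carries out steps (I) and (II) repeatedly and averages the resulting predictions:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X, y, X_test, n_bootstrap=50, seed=0):
    # Bagging, cf. Definition 1: repeat (I) resample the data with replacement and
    # (II) refit the basis estimator on the bootstrap sample, then average the predictions.
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)                     # step (I): bootstrap sample
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])   # step (II): plug-in refit
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)                            # aggregation by averaging

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=200)
X_test = np.linspace(-2, 2, 5).reshape(-1, 1)
print(np.round(bagged_predict(X, y, X_test), 2))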
