Statistical Implications of Turing's Formula
About this ebook

Features a broad introduction to recent research on Turing’s formula and presents modern applications in statistics, probability, information theory, and other areas of modern data science

Turing's formula is, perhaps, the only known method for estimating the underlying distributional characteristics beyond the range of observed data without making any parametric or semiparametric assumptions. This book presents a clear introduction to Turing’s formula and its connections to statistics. Topics with relevance to a variety of different fields of study are included such as information theory; statistics; probability; computer science inclusive of artificial intelligence and machine learning; big data; biology; ecology; and genetics. The author provides examinations of many core statistical issues within modern data science from Turing's perspective. A systematic approach to long-standing problems such as entropy and mutual information estimation, diversity index estimation, domains of attraction on general alphabets, and tail probability estimation is presented in light of the most up-to-date understanding of Turing's formula. Featuring numerous exercises and examples throughout, the author provides a summary of the known properties of Turing's formula and explains how and when it works well; discusses the approach derived from Turing's formula in order to estimate a variety of quantities, all of which mainly come from information theory, but are also important for machine learning and for ecological applications; and uses Turing's formula to estimate certain heavy-tailed distributions.

In summary, this book:

• Features a unified and broad presentation of Turing’s formula, including its connections to statistics, probability, information theory, and other areas of modern data science

• Provides a presentation on the statistical estimation of information theoretic quantities

• Demonstrates, from Turing's perspective, the estimation problems of several statistical functions, such as Simpson's indices, Shannon's entropy, general diversity indices, mutual information, and Kullback–Leibler divergence

• Includes numerous exercises and examples throughout with a fundamental perspective on the key results of Turing’s formula

Statistical Implications of Turing's Formula is an ideal reference for researchers and practitioners who need a review of the many critical statistical issues of modern data science. This book is also an appropriate learning resource for biologists, ecologists, and geneticists who are involved with the concept of diversity and its estimation and can be used as a textbook for graduate courses in mathematics, probability, statistics, computer science, artificial intelligence, machine learning, big data, and information theory.

Zhiyi Zhang, PhD, is Professor of Mathematics and Statistics at The University of North Carolina at Charlotte. He is an active consultant in both industry and government on a wide range of statistical issues, and his current research interests include Turing's formula and its statistical implications; probability and statistics on countable alphabets; nonparametric estimation of entropy and mutual information; tail probability and biodiversity indices; and applications involving extracting statistical information from low-frequency data space. He earned his PhD in Statistics from Rutgers University.

Language: English
Publisher: Wiley
Release date: October 21, 2016
ISBN: 9781119237099

    Book preview

    Statistical Implications of Turing's Formula - Zhiyi Zhang

    To my family and all my teachers

    Preface

    This book introduces readers to Turing's formula and then re-examines several core statistical issues of modern data science from Turing's perspective. Turing's formula was a remarkable invention of Alan Turing during World War II, part of an early attempt to break the German Enigma codes. The formula looks at the world of randomness through a unique and powerful binary perspective – unmistakably Turing's. However, Turing's formula was not well understood for many years. Research amassed during the last decade has brought to light profound statistical implications of the formula that were previously unknown. Recently, and only recently, a relatively clear and systematic description of Turing's formula, with its statistical properties and implications, has become possible. Hence this book.

    Turing's formula is often perceived as having a mystical quality. I was awestruck when I first learned of the formula 10 years ago. Its anti-intuitive implication was simply beyond my immediate grasp. However, I was not alone in this regard. After turning it over in my mind for a while, I mentioned to two of my colleagues, both seasoned mathematicians, that there might be a way to give a nonparametric characterization to the tail probability of a random variable beyond the data range. To that, their immediate reaction was, "Tell us more when you have figured it out." Some years later, a former doctoral student of mine said to me, "I used to refuse to think about anti-intuitive mathematical statements, but after Turing's formula, I would think about a statement at least twice however anti-intuitive it may sound." Still another colleague of mine recently said to me, "I read everything you wrote on the subject, including details of the proofs. But I still cannot see intuitively why the formula works." To that, I responded with the following two points:

    Our intuition is a bounded mental box within which we conduct intellectual exercises with relative ease and comfort, but we must admit that this box also reflects the limitations of our experience, knowledge, and ability to reason.

    If a fact known to be true does not fit into one's current box of intuition, is it not time to expand the boundary of the box to accommodate the true fact?

    My personal journey in learning about Turing's formula has proved to be a rewarding one. The experience of finding Turing's formula totally outside my box of intuition initially, and then watching it gradually settle well within the boundary of my new box of intuition, is one I wish to share.

    Turing's formula itself, while extraordinary in many ways, is not the only reason for this book. Statistical science, since R.A. Fisher, has come a long way and continues to evolve. In fact, the frontier of Statistics has largely moved on to the realm of nonparametrics. The last few decades have witnessed great advances in the theory and practice of nonparametric statistics. However, in this realm a seemingly impenetrable wall exists: how could one possibly make inference about the tail of a distribution beyond the data range? In front of this wall, many, if not most, are discouraged by their intuition from exploring further. Yet it is often said in Statistics that "it is all in the tail." Statistics needs a trail to the other side of the wall. Turing's formula blazes such a trail, and this book attempts to mark it.

    Turing's formula is relevant to many key issues in modern data science, for example, Big Data. Big Data, though not yet a field of study with a clearly defined boundary, unambiguously points to a data space that is a quantum leap away from what is imaginable in the realm of classical statistics in terms of data volume, data structure, and data complexity. Big Data, however defined, poses fundamental challenges to Statistics. To begin, the task of retrieving and analyzing data in a vastly complex data space must be in large part delegated to a machine (or software), hence the term Machine Learning. How does a machine learn and make judgments? At the very core, it all boils down to a general measure of association between two observable random elements (not necessarily random variables). At least two fundamental issues immediately present themselves:

    High Dimensionality. The complexity of the data space suggests that a data observation can only be appropriately registered in a very high-dimensional space, so much so that the dimensionality could be essentially infinite. Quickly, the usual statistical methodologies run into fundamental conceptual problems.

    Discrete and Non-ordinal Nature. The generality of the data space suggests that possible data values may not have a natural order among themselves: different gene types in the human genome, different words in text, and different species in an ecological population are all examples of general data spaces without a natural neighborhood concept.

    Such issues would force a fundamental transition from the platform of random variables (on the real line) to the platform of random elements (on a general set or an alphabet). On such an alphabet, many familiar and fundamental concepts of Statistics and Probability no longer exist, for example, moments, correlation, tail, and so on. It would seem that Statistics is in need of a rebirth to tackle these issues.

    The rebirth has been taking place in Information Theory. Its founding father, Claude Shannon, defined two conceptual building blocks, entropy (in place of moments) and mutual information (in place of correlation), in his landmark paper (Shannon, 1948). Just as important as estimating moments and the coefficient of correlation for random variables, entropy and mutual information must be estimated for random elements in practice. However, estimation of entropy and estimation of mutual information are technically difficult problems due to the curse of "High Dimensionality" and the "Discrete and Non-ordinal Nature" of the data. For about 50 years after Shannon (1948), advances in this arena were slow to come. In recent years, however, research interest, propelled by the rapidly increasing level of data complexity, has been reinvigorated and, at the same time, has splintered into many different perspectives. One in particular is Turing's perspective, which has brought about significant and qualitative improvement to these difficult problems. This book presents an overview of the key results and updates the frontier in this research space.

    The powerful utility of Turing's perspective can also be seen in many other areas. One increasingly important modern concept is Diversity. The topics of what it is and how to estimate it are rapidly moving into rigorous mathematical treatment. Scientists have passionately argued about them for years but largely without consensus. Turing's perspective gives some very interesting answers to these questions. This book gives a unified discussion of diversity indices, hence making good reading for those who are interested in diversity indices and their estimation. The final two chapters of the book speak to the issues of tail classification and, if classified, how to perform a refined analysis for a parametric tail model via Turing's perspective. These issues are scientifically relevant in many fields of study.

    I intend this book to serve two groups of readers:

    Textbook for graduate students. The material is suitable for a topic course at the graduate level for students in Mathematics, Probability, Statistics, Computer Science (Artificial Intelligence, Machine Learning, Big Data), and Information Theory.

    Reference book for researchers and practitioners. This book offers an informative presentation of many of the critical statistical issues of modern data science, along with new and updated results. Both researchers and practitioners will find this book a good learning resource and will enjoy the many relevant methodologies and formulas given and explained under one cover.

    For a better flow of the presentation, some of the lengthy but instructive proofs are placed at the end of each chapter.

    The seven chapters of this book may be naturally organized into three groups. Group 1 includes Chapters 1 and 2. Chapter 1 gives an introduction to Turing's formula; and Chapter 2 translates Turing's formula into a particular perspective (referred to as Turing's perspective) as embodied in a class of indices (referred to as Generalized Simpson's Indices). Group 1 may be considered as the theoretical foundation of the whole book. Group 2 includes Chapters 3–5. Chapter 3 takes Turing's perspective into entropy estimation, Chapter 4 takes it into diversity estimation, and Chapter 5 takes it into estimation of various information indices. Group 2 may be thought of as consisting of applications of Turing's perspective. Chapters 6 and 7 make up Group 3. Chapter 6 discusses the notion of tail on alphabets and offers a classification of probability distributions. Chapter 7 offers an application of Turing's formula in estimating parametric tails of random variables. Group 3 may be considered as a pathway to further research.

    The material in this book is relatively new. In writing the book, I have made an effort to let the book, as well as its chapters, be self-contained. On the one hand, I wanted the material of the book to flow in a linearly coherent manner for students learning it for the first time. In this regard, readers may experience a certain degree of repetitiveness in notation definitions, lemmas, and even proofs across chapters. On the other hand, I wanted the book to go beyond merely stating established results and referring to proofs published elsewhere. Many of the mathematical results in the book have instructive value, and their proofs indicate the depth of the results. For this reason, I have included many proofs that might be judged overly lengthy and technical in a conventional textbook, mostly in the appendices.

    It is important to note that this book, as the title suggests, is essentially a monograph on Turing's formula. It is not meant to be a comprehensive learning resource on topics such as estimation of entropy, estimation of diversity, or estimation of information. Consequently, many worthy methodologies in these topics have not been included. By no means do I suggest that the methodologies discussed in this book are the only ones with scientific merit. Far from it, there are many wonderful ideas proposed in the existing literature but not mentioned among the pages of this book, and assuredly many more are yet to come.

    I wish to extend my heartfelt gratitude to those who have so kindly allowed me to bend their ears over the years. In particular, I wish to thank Hongwei Huang, Stas Molchanov, and Michael Grabchak for countless discussions on Turing's formula and related topics; my students, Chen Chen, Li Liu, Ann Stewart, and Jialin Zhang for picking out numerous errors in an earlier draft of the book; and The University of North Carolina at Charlotte for granting me a sabbatical leave in Spring 2015, which allowed me to bring this book to a complete draft. Most importantly, I wish to thank my family, wife Carol, daughter Katherine, and son Derek, without whose love and unwavering support this book would not have been possible.

    Zhiyi Zhang

    Charlotte, North Carolina

    May 2016

    Chapter 1

    Turing's Formula

    Consider the population of all birds in the world along with all its different species, indexed by $k = 1, 2, \ldots$, and denote the corresponding proportion distribution by $\{p_k; k \geq 1\}$, where $p_k$ is the proportion of the $k$th bird species in the population. Suppose a random sample of size $n$ is to be taken from the population, and let the bird counts for the different species be denoted by $\{Y_k; k \geq 1\}$. If it is of interest to estimate $p_1$, the proportion of birds of species 1 in the population, then $\hat{p}_1 = Y_1/n$ is an excellent estimator; and similarly so is $\hat{p}_k = Y_k/n$ for $p_k$ for every particular $k$.

    To illustrate, consider a hypothetical sample of size $n$ with bird counts given in Table 1.1, or a version rearranged in decreasing order of the observed frequencies as in Table 1.2.

    Table 1.1 Bird sample

    Table 1.2 Rearranged bird sample

    With this sample, one would likely estimate $p_1$ by $\hat{p}_1 = Y_1/n$ and $p_2$ by $\hat{p}_2 = Y_2/n$, and so on.

    The total number of bird species observed in this sample is 30. Yet it is clear that the bird population must have more than just 30 different species. A natural follow-up question would then be as follows:

    What is the total population proportion of birds belonging to species other than those observed in the sample?

    The follow-up question implies a statistical problem of estimation with a target, or estimand, being the collective proportion of birds of the species not represented in the sample. For convenience, let this target be denoted by $\pi_0$. It is important to note that $\pi_0$ is a random quantity depending on the sample and therefore is not an estimand in the usual statistical sense. In the statistics literature, $1 - \pi_0$ is often referred to as the sample coverage of the population, or in short, sample coverage, or just coverage. Naturally, $\pi_0$ may be referred to as the noncoverage.

    The noncoverage $\pi_0$, defined with a random sample of size $n$, is an interesting quantity. It is sometimes interpreted as the probability of discovering a new species because, in a loose sense, the chance that the next bird is of a new, or previously unobserved, species is $\pi_0$. This interpretation is, however, somewhat misleading. The main issue with such an interpretation is the lack of clarification of the underlying experiment (and its sample space). Words such as probability and next can only have meaning in a well-specified experiment. While it is quite remarkable that $\pi_0$ can be reasonably and nonparametrically estimated by Turing's formula, $\pi_0$ is not a probability associated with the sample space of the experiment in which the sample of size $n$ is drawn. Further discussion of this point is given in Section 1.5.

    Turing's formula, also sometimes known as the Good–Turing formula, is an estimator of $\pi_0$ introduced by Good (1953) but largely credited to Alan Turing. Let $N_1$ denote the number of species each of which is represented by exactly one observation in a random sample of size $n$. Turing's formula is given by $T = N_1/n$. For the bird example given in Table 1.2, $N_1$ is the number of species observed exactly once, and $T = N_1/n$ estimates the total proportion of the population belonging to species not observed in the sample.
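    As a concrete illustration (a minimal sketch with hypothetical counts, since the data of Table 1.1 are not reproduced here), the following Python snippet computes the plug-in estimates $\hat{p}_k = Y_k/n$ and Turing's estimate $T = N_1/n$ from a list of species counts.

        # Hypothetical bird counts by species; the actual values in Table 1.1 differ.
        counts = [12, 9, 9, 7, 5, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1, 1]

        n = sum(counts)                      # sample size
        p_hat = [y / n for y in counts]      # plug-in estimates of the observed species' proportions

        # N_1: the number of species observed exactly once (singletons)
        N1 = sum(1 for y in counts if y == 1)

        # Turing's formula: estimated total proportion of the population
        # belonging to species not represented in the sample
        T = N1 / n
        print(f"n = {n}, N1 = {N1}, Turing's estimate T = {T:.3f}")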

    1.1 Turing's Formula

    Let $\mathcal{X} = \{\ell_k; k \geq 1\}$ be a countable alphabet with letters $\ell_k$, and let $\mathbf{p} = \{p_k; k \geq 1\}$ be a probability distribution on $\mathcal{X}$. Let $\{Y_k; k \geq 1\}$ be the observed frequencies of the letters in an identically and independently distributed (iid) random sample of size $n$. For any integer $r$, $1 \leq r \leq n$, let the number of letters in the alphabet that are each represented exactly $r$ times in the sample be denoted by

    $$N_r = \sum_{k \geq 1} 1[Y_k = r] \qquad (1.1)$$

    where $1[\cdot]$ is the indicator function. Let the total probability associated with letters that are represented exactly $r$ times in the sample be denoted by

    $$\pi_r = \sum_{k \geq 1} p_k \, 1[Y_k = r] \qquad (1.2)$$

    Of special interest are the cases of $r = 1$ and $r = 0$, for which

    $$N_1 = \sum_{k \geq 1} 1[Y_k = 1]$$

    and

    $$\pi_0 = \sum_{k \geq 1} p_k \, 1[Y_k = 0] \qquad (1.3)$$

    representing, respectively,

    $N_1$:  the number of letters of $\mathcal{X}$ that each appears exactly once; and

    $\pi_0$:  the total probability associated with the unobserved letters of $\mathcal{X}$ in an iid sample of size $n$.

    The following expression is known as Turing's formula:

    $$T = \frac{N_1}{n} \qquad (1.4)$$

    Turing's formula has been extensively studied during the many years following its introduction by Good (1953) and has been demonstrated, mostly through numerical simulations, to provide a satisfactory estimate of $\pi_0$ for a wide range of distributions. These studies put forth a remarkable yet puzzling implication: the total probability on the unobserved subset of $\mathcal{X}$ may be well estimated nonparametrically. No satisfactory interpretation was given in the literature until Robbins (1968).
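    The following Python sketch (an illustration under assumed settings, not taken from the book) simulates iid samples from a geometric-type distribution on a countable alphabet, computes the true noncoverage $\pi_0$ of each sample, and compares it with Turing's formula $T = N_1/n$.

        import random
        from collections import Counter

        random.seed(7)

        # A geometric-type distribution truncated far out in the tail so the
        # probabilities can be listed explicitly: p_k proportional to q**k, k = 1, ..., K.
        q, K = 0.95, 2000
        weights = [q ** k for k in range(1, K + 1)]
        total = sum(weights)
        p = [w / total for w in weights]          # true letter probabilities

        n = 500
        for trial in range(5):
            sample = random.choices(range(1, K + 1), weights=p, k=n)   # iid sample of size n
            counts = Counter(sample)

            # True noncoverage pi_0: total probability of the letters not observed in the sample.
            pi0 = sum(p[k - 1] for k in range(1, K + 1) if k not in counts)

            # Turing's formula: N_1 / n, where N_1 is the number of letters observed exactly once.
            N1 = sum(1 for c in counts.values() if c == 1)
            print(f"trial {trial}: true pi_0 = {pi0:.4f}, Turing's estimate = {N1 / n:.4f}")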

    Robbins' Claim

    Let $\pi_0$ be defined with an iid random sample of size $n$ as in (1.3). Let Turing's formula be defined with an augmented iid sample of size $n + 1$, obtained by adding one new iid observation to the original sample of size $n$, and let the resulting estimator be denoted by $T'$. Then $T'$ is an unbiased estimator of $\pi_0$, in the sense that $E(T') = E(\pi_0)$.

    Robbins' claim is easily verified. Let $N_1'$ be the number of letters represented exactly once in the augmented sample of size $n + 1$, with observed frequencies $\{Y_k'; k \geq 1\}$. Then

    $$E(T') = E\left(\frac{N_1'}{n + 1}\right) = \frac{1}{n + 1} \sum_{k \geq 1} P(Y_k' = 1) = \frac{1}{n + 1} \sum_{k \geq 1} (n + 1)\, p_k (1 - p_k)^{n} = \sum_{k \geq 1} p_k (1 - p_k)^{n}.$$

    On the other hand, with the sample of size $n$,

    $$E(\pi_0) = E\left(\sum_{k \geq 1} p_k \, 1[Y_k = 0]\right) = \sum_{k \geq 1} p_k \, P(Y_k = 0) = \sum_{k \geq 1} p_k (1 - p_k)^{n}.$$

    Hence, $E(T') = E(\pi_0)$.
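    As a quick numerical check of Robbins' claim (again only an illustrative sketch with an assumed distribution), the Monte Carlo average of $T'$ over many augmented samples of size $n + 1$ can be compared with the exact value $E(\pi_0) = \sum_{k} p_k (1 - p_k)^n$.

        import random
        from collections import Counter

        random.seed(11)

        # A small explicit distribution, chosen only for illustration.
        p = [0.4, 0.2, 0.1, 0.08, 0.07, 0.05, 0.04, 0.03, 0.02, 0.01]
        n = 20

        # Exact value of E(pi_0) for a sample of size n.
        E_pi0 = sum(pk * (1 - pk) ** n for pk in p)

        # Monte Carlo estimate of E(T'), where T' = N_1' / (n + 1) is Turing's formula
        # computed from an augmented iid sample of size n + 1.
        reps = 100_000
        acc = 0.0
        for _ in range(reps):
            sample = random.choices(range(len(p)), weights=p, k=n + 1)
            N1_prime = sum(1 for c in Counter(sample).values() if c == 1)
            acc += N1_prime / (n + 1)

        print(f"exact E(pi_0)     = {E_pi0:.5f}")
        print(f"Monte Carlo E(T') = {acc / reps:.5f}")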

    Robbins' claim provides an intuitive interpretation in the sense that

    $T'$ is an unbiased and, therefore, a good estimator of $\pi_0$;

    the difference between $T'$ and $T$ should be small; and therefore

    $T$ should be a good estimator of $\pi_0$.

    However, Robbins' claim still leaves much to be desired. Suppose for a moment that
