Information Geometry and Its Applications
Part IGeometry of Divergence Functions: Dually Flat Riemannian Structure
© Springer Japan 2016
S.-i. Amari, Information Geometry and Its Applications, Applied Mathematical Sciences 194, https://doi.org/10.1007/978-4-431-55978-8_1
1. Manifold, Divergence and Dually Flat Structure
Shun-ichi Amari
Brain Science Institute, RIKEN, Wako, Saitama, Japan
Email: amari@brain.riken.jp
The original version of this chapter was revised: The incomplete texts have been updated. The correction to this chapter is available at https://doi.org/10.1007/978-4-431-55978-8_14
The present chapter begins with a manifold and a coordinate system within it. Then, a divergence between two points is defined. We use an intuitive style of explanation for manifolds, followed by typical examples. A divergence represents a degree of separation of two points, but it is not a distance since it is not symmetric with respect to the two points. Here is the origin of dually coupled asymmetry, leading us to a dual world. When a divergence is derived from a convex function in the form of the Bregman divergence, two affine structures are induced in the manifold. They are dually coupled via the Legendre transformation. Thus, a convex function provides a manifold with a dually flat affine structure in addition to a Riemannian metric derived from it. The dually flat structure plays a pivotal role in information geometry, as is shown in the generalized Pythagorean theorem. The dually flat structure is a special case of Riemannian geometry equipped with non-flat dual affine connections, which will be studied in Part II.
1.1 Manifolds
1.1.1 Manifold and Coordinate Systems
An n-dimensional manifold M is a set of points such that each point has n-dimensional extensions in its neighborhood. That is, such a neighborhood is topologically equivalent to an n-dimensional Euclidean space. Intuitively speaking, a manifold is a deformed Euclidean space, like a curved surface in the two-dimensional case. But it may have a different global topology. A sphere is an example which is locally equivalent to a two-dimensional Euclidean space, but is curved and has a different global topology because it is compact (bounded and closed).
Since a manifold M is locally equivalent to an n-dimensional Euclidean space $$E_n$$ , we can introduce a local coordinate system
$$\begin{aligned} {\varvec{\xi }} = \left( \xi _1, \ldots , \xi _n \right) \end{aligned}$$(1.1)
composed of n components $$\xi _1, \ldots , \xi _n$$ such that each point is uniquely specified by its coordinates $${\varvec{\xi }}$$ in a neighborhood. See Fig. 1.1 for the two-dimensional case. Since a manifold may have a topology different from a Euclidean space, in general we need more than one coordinate neighborhood and coordinate system to cover all the points of a manifold.
Fig. 1.1
Manifold M and coordinate system $$\xi $$ . $$E_2$$ is a two-dimensional Euclidean space
The coordinate system is not unique even in a coordinate neighborhood, and there are many coordinate systems. Let
$${\varvec{\zeta }}= \left( \zeta _1, \ldots , \zeta _n \right) $$be another coordinate system. When a point $$P \in M$$ is represented in two coordinate systems $${\varvec{\xi }}$$ and $${\varvec{\zeta }}$$ , there is a one-to-one correspondence between them and we have relations
$$\begin{aligned} {\varvec{\xi }}= & {} {\textit{\textbf{f}}} \left( \zeta _1, \ldots , \zeta _n \right) , \end{aligned}$$(1.2)
$$\begin{aligned} {\varvec{\zeta }}= & {} {\textit{\textbf{f}}}^{-1} \left( \xi _1, \ldots , \xi _n \right) , \end{aligned}$$(1.3)
where $${\textit{\textbf{f}}}$$ and $${\textit{\textbf{f}}}^{-1}$$ are mutually inverse vector-valued functions. They are a coordinate transformation and its inverse transformation. We usually assume that (1.2) and (1.3) are differentiable functions of n coordinate variables.¹
Fig. 1.2
Cartesian coordinate system
$${\varvec{\xi }}= \left( \xi _1, \xi _2 \right) $$and polar coordinate system $$(r, \theta )$$ in $$E_2$$
1.1.2 Examples of Manifolds
A. Euclidean Space
Consider a two-dimensional Euclidean space, which is a flat plane. It is convenient to use an orthonormal Cartesian coordinate system
$${\varvec{\xi }}= \left( \xi _1, \xi _2 \right) $$. A polar coordinate system $${\varvec{\zeta }}=(r, \theta )$$ is sometimes used, where r is the radius and $$\theta $$ is the angle of a point from one axis (see Fig. 1.2). The coordinate transformation between them is given by
$$\begin{aligned}&r = \sqrt{\xi ^2_1 + \xi ^2_2}, \quad \theta = \tan ^{-1} \left( \frac{\xi _2}{\xi _1}\right) , \end{aligned}$$(1.4)
$$\begin{aligned}&\xi _1 = r \cos \theta , \quad \xi _2 = r \sin \theta . \end{aligned}$$(1.5)
The transformation is analytic except at the origin.
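The transformations (1.4) and (1.5) can be checked numerically by a round trip; a minimal Python sketch (the point is an arbitrary example, and atan2 stands in for $$\tan ^{-1}(\xi _2/\xi _1)$$ so that the angle is quadrant-correct):

```python
import math

# Round trip through the coordinate transformations (1.4) and (1.5).
# atan2 is used for tan^-1(xi2/xi1) so the angle is quadrant-correct.
def to_polar(xi1, xi2):
    r = math.hypot(xi1, xi2)
    theta = math.atan2(xi2, xi1)
    return r, theta

def to_cartesian(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(3.0, 4.0)      # arbitrary point away from the origin
x1, x2 = to_cartesian(r, theta)    # recovers (3, 4)
```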
B. Sphere
A sphere is the surface of a three-dimensional ball. The surface of the earth is regarded as a sphere, where each point has a two-dimensional neighborhood, so that we can draw a local geographic map on a flat sheet. The pair of latitude and longitude gives a local coordinate system. However, a sphere is topologically different from a Euclidean space and it cannot be covered by one coordinate system. At least two coordinate systems are required to cover it. If we delete one point, say the north pole of the earth, it is topologically equivalent to a Euclidean space. Hence, at least two overlapping coordinate neighborhoods, one including the north pole and the other including the south pole, for example, are necessary and they are sufficient to cover the entire sphere.
C. Manifold of Probability Distributions
C1. Gaussian Distributions
The probability density function of a Gaussian random variable x is given by
$$\begin{aligned} p \left( x; \mu , \sigma ^2 \right) = \frac{1}{\sqrt{2 \pi }\sigma } \exp \left\{ -\frac{(x-\mu )^2}{2 \sigma ^2}\right\} , \end{aligned}$$(1.6)
where $$\mu $$ is the mean and $$\sigma ^2$$ is the variance. Hence, the set of all the Gaussian distributions is a two-dimensional manifold, where a point denotes a probability density function and
$$\begin{aligned} {\varvec{\xi }} = (\mu , \sigma ), \quad \sigma >0 \end{aligned}$$(1.7)
is a coordinate system. This is topologically equivalent to the upper half of a two-dimensional Euclidean space. The manifold of Gaussian distributions is covered by one coordinate system
$${\varvec{\xi }}= (\mu , \sigma )$$.
There are other coordinate systems. For example, let $$m_1$$ and $$m_2$$ be the first and second moments of x, given by
$$\begin{aligned} m_1 = {\text {E}}[x] = \mu , \quad m_2 = {\text {E}} \left[ x^2\right] = \mu ^2+ \sigma ^2, \end{aligned}$$(1.8)
where $$\text {E}$$ denotes the expectation of a random variable. Then,
$$\begin{aligned} {\varvec{\zeta }} = \left( m_1, m_2 \right) \end{aligned}$$(1.9)
is a coordinate system (the moment coordinate system).
It will be shown later that the coordinate system $${\varvec{\theta }}$$ defined by
$$\begin{aligned} \theta _1 = \frac{\mu }{\sigma ^2}, \quad \theta _2 = -\frac{1}{2 \sigma ^2}, \end{aligned}$$(1.10)
called the natural parameters, is convenient for studying properties of Gaussian distributions.
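The three coordinate systems $$(\mu , \sigma )$$ , $$\left( m_1, m_2 \right) $$ and $$\left( \theta _1, \theta _2 \right) $$ label the same Gaussian distribution. A small Python sketch with arbitrary sample values maps one point through (1.8) and (1.10) and inverts the natural parameters to show the maps are one-to-one:

```python
# One Gaussian point expressed in three coordinate systems: (mu, sigma),
# the moments (1.8), and the natural parameters (1.10).  Sample values are
# arbitrary; the natural-parameter map is inverted to show it is one-to-one.
mu, sigma = 1.5, 2.0

m1, m2 = mu, mu**2 + sigma**2                              # (1.8)
theta1, theta2 = mu / sigma**2, -1.0 / (2.0 * sigma**2)    # (1.10)

# Inverting the natural parameters:
sigma2_back = -1.0 / (2.0 * theta2)   # = sigma^2
mu_back = theta1 * sigma2_back        # = mu
```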
C2. Discrete Distributions
Let x be a discrete random variable taking values on
$$X= \left\{ 0, 1, \ldots , n \right\} $$. A probability distribution p(x) is specified by $$n+1$$ probabilities
$$\begin{aligned} p_i={\text{ Prob }}\{x=i\}, \quad i=0, 1, \ldots , n, \end{aligned}$$(1.11)
so that p(x) is represented by a probability vector
$$\begin{aligned} {\textit{\textbf{p}}}= \left( p_0, p_1, \ldots , p_n \right) . \end{aligned}$$(1.12)
Because of the restriction
$$\begin{aligned} \sum ^n_{i=0} p_i=1, \quad p_i>0, \end{aligned}$$(1.13)
the set of all probability distributions $${\textit{\textbf{p}}}$$ forms an n-dimensional manifold. Its coordinate system is given, for example, by
$$\begin{aligned} {\varvec{\xi }} = \left( p_1, \ldots , p_n \right) \end{aligned}$$(1.14)
and $$p_0$$ is not free but is a function of the coordinates,
$$\begin{aligned} p_0 = 1-\sum \xi _i. \end{aligned}$$(1.15)
The manifold is an n-dimensional simplex, called the probability simplex, and is denoted by $$S_n$$ . When $$n=2$$ , $$S_2$$ is the interior of a triangle and when $$n=3$$ , it is the interior of a 3-simplex, as is shown in Fig. 1.3.
Fig. 1.3
Probability simplex: $$S_2$$ and $$S_3$$
Let us introduce $$n+1$$ random variables
$$\delta _i(x), i=0, 1, \ldots , n$$, such that
$$\begin{aligned} \delta _i(x) = \left\{ \begin{array}{ll} 1, &{} x=i, \\ 0, &{} x \ne i. \end{array} \right. \end{aligned}$$(1.16)
Then, a probability distribution of x is denoted by
$$\begin{aligned} p(x, {\varvec{\xi }}) = \sum ^n_{i=1} \xi _i \delta _i(x)+ p_0 ({\varvec{\xi }}) \delta _0 (x) \end{aligned}$$(1.17)
in terms of coordinates $${\varvec{\xi }}$$ .
We shall use another coordinate system $${\varvec{\theta }}$$ later, given by
$$\begin{aligned} \theta _i = \log \frac{p_i}{p_0}, \quad i=1, \ldots , n, \end{aligned}$$(1.18)
which is also very useful.
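The transformation (1.18) is invertible: exponentiating and normalizing recovers $${\textit{\textbf{p}}}$$ . A short Python sketch with a hypothetical distribution on $$X=\{0, 1, 2\}$$ :

```python
import math

# Coordinates (1.18): theta_i = log(p_i / p_0).  The inverse normalizes
# exp(theta).  Hypothetical distribution on X = {0, 1, 2}.
p = [0.5, 0.3, 0.2]   # (p0, p1, p2), summing to 1

theta = [math.log(pi / p[0]) for pi in p[1:]]

# Inverse: p_i = exp(theta_i) / (1 + sum_j exp(theta_j)), p_0 the remainder.
z = 1.0 + sum(math.exp(t) for t in theta)
p_back = [1.0 / z] + [math.exp(t) / z for t in theta]
```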
C3. Regular Statistical Model
Let x be a random variable which may take discrete, scalar or vector continuous values. A statistical model is a family of probability distributions
$$M=\left\{ p(x, {\varvec{\xi }})\right\} $$specified by a vector parameter $${\varvec{\xi }}$$ . When it satisfies certain regularity conditions, it is called a regular statistical model. Such an M is a manifold, where $${\varvec{\xi }}$$ plays the role of a coordinate system. The family of Gaussian distributions and the family of discrete probability distributions are examples of the regular statistical model. Information geometry has emerged from a study of invariant geometrical structures of regular statistical models.
D. Manifold of Positive Measures
Let x be a variable taking values in set
$$N=\left\{ 1, 2, \ldots , n \right\} $$. We assign a positive measure (or a weight) $$m_i$$ to element
$$i, i=1, \ldots , n$$. Then
$$\begin{aligned} {\varvec{\xi }} = \left( m_1, \ldots , m_n \right) , \quad m_i>0 \end{aligned}$$(1.19)
defines a distribution of measures over N. The set of all such measures is the positive orthant $${\textit{\textbf{R}}}^{n}_+$$ of an n-dimensional Euclidean space. The sum
$$\begin{aligned} m = \sum ^n_{i=1} m_i \end{aligned}$$(1.20)
is called the total mass of
$${\textit{\textbf{m}}} = \left( m_1, \ldots , m_n \right) $$.
When $${\textit{\textbf{m}}}$$ satisfies the constraint that the total mass is equal to 1,
$$\begin{aligned} \sum m_i = 1, \end{aligned}$$(1.21)
it is a probability distribution belonging to $$S_{n-1}$$ . Hence, $$S_{n-1}$$ is included in $${\textit{\textbf{R}}}^n_+$$ as its submanifold.
A positive measure (unnormalized probability distribution) appears in many engineering problems. For example, an image s(x, y) drawn on the x–y plane defines a positive measure when its brightness is positive,
$$\begin{aligned} s(x, y)>0. \end{aligned}$$(1.22)
When we discretize the x–y plane into $$n^2$$ pixels (i, j), the discretized picture $$\left\{ s(i, j)\right\} $$ is a positive measure belonging to $${\textit{\textbf{R}}}^{n^2}_{+}$$ . Similarly, a discretized power spectrum of a sound is a positive measure. The histogram of observed data defines a positive measure, too.
E. Positive-Definite Matrices
Let A be an $$n \times n$$ matrix. All such matrices form an $$n^2$$ -dimensional manifold. The symmetric positive-definite matrices among them form a $$\frac{n(n+1)}{2}$$ -dimensional manifold. This is a submanifold embedded in the manifold of all the matrices. We may use the upper-triangular elements of A, including the diagonal, as a coordinate system. Positive-definite matrices appear in statistics, physics, operations research, control theory, etc.
F. Neural Manifold
A neural network is composed of a large number of neurons connected with each other, where the dynamics of information processing takes place. A network is specified by connection weights $$w_{ji}$$ connecting neuron i with neuron j. The set of all such networks forms a manifold, where matrix
$$ \mathbf{W} =\left( w_{ji} \right) $$is a coordinate system. We will later analyze behaviors of such networks from the information geometry point of view.
1.2 Divergence Between Two Points
1.2.1 Divergence
Let us consider two points P and Q in a manifold M, whose coordinates are $${\varvec{\xi }}_{P}$$ and $${\varvec{\xi }}_Q$$ . A divergence D[P : Q] is a function of $${\varvec{\xi }}_P$$ and $${\varvec{\xi }}_Q$$ which satisfies certain criteria. See Basseville (2013) for a detailed bibliography. We may write it as
$$\begin{aligned} D[P:Q] = D \left[ {\varvec{\xi }}_P : {\varvec{\xi }}_Q \right] . \end{aligned}$$(1.23)
We assume that it is a differentiable function of $${\varvec{\xi }}_P$$ and $${\varvec{\xi }}_Q$$ .
Definition 1.1
D[P : Q] is called a divergence when it satisfies the following criteria:
(1)
$$D[P:Q] \ge 0$$.
(2)
$$D[P:Q]=0$$, when and only when $$P=Q$$ .
(3)
When P and Q are sufficiently close, by denoting their coordinates by $${\varvec{\xi }}_P$$ and
$${\varvec{\xi }}_Q = {\varvec{\xi }}_P + d{\varvec{\xi }}$$, the Taylor expansion of D is written as
$$\begin{aligned} D[\varvec{\xi }_P : \varvec{\xi }_P+d \varvec{\xi }]= \frac{1}{2} \sum g_{ij} ({\varvec{\xi }}_P)d \xi _i d \xi _j + O (|d {\varvec{\xi }}|^3), \end{aligned}$$(1.24)
and matrix $${\mathbf{G }}=\left( g_{ij}\right) $$ is positive-definite, depending on $${\varvec{\xi }}_P$$ .
A divergence represents a degree of separation of two points P and Q, but neither it nor its square root is a distance. It does not necessarily satisfy the symmetry condition, so that in general
$$\begin{aligned} D[P:Q] \ne D[Q:P]. \end{aligned}$$(1.25)
We may call D[P : Q] the divergence from P to Q. Moreover, the triangle inequality does not hold. A divergence has the dimension of the square of a distance, as is suggested by (1.24). It is possible to symmetrize a divergence by
$$\begin{aligned} D_S[P:Q] = \frac{1}{2} \left( D[P:Q]+D[Q:P]\right) . \end{aligned}$$(1.26)
However, the asymmetry of divergence plays an important role in information geometry, as will be seen later.
When P and Q are sufficiently close, we define the square of an infinitesimal distance ds between them by using (1.24) as
$$\begin{aligned} ds^2 = 2D \left[ {\varvec{\xi }}:{\varvec{\xi }}+ d{\varvec{\xi }}\right] = \sum g_{ij} d \xi _i d \xi _j. \end{aligned}$$(1.27)
A manifold M is said to be Riemannian when a positive-definite matrix $$\mathbf{G }({\varvec{\xi }})$$ is defined on M and the square of the local distance between two nearby points $${\varvec{\xi }}$$ and $${\varvec{\xi }}+ d{\varvec{\xi }}$$ is given by (1.27). A divergence D provides M with a Riemannian structure.
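The expansion (1.24) can be checked numerically by comparing $$2D \left[ {\varvec{\xi }}:{\varvec{\xi }}+ d{\varvec{\xi }}\right] $$ with $$\sum g_{ij} d \xi _i d \xi _j$$ for a small displacement. The Python sketch below uses the divergence $$D[{\varvec{\xi }}:{\varvec{\xi }}'] = \sum \left( \log (\xi '_i/\xi _i) + \xi _i/\xi '_i -1 \right) $$ on positive coordinates (it reappears later as the logarithmic divergence (1.48)); a second-order expansion gives $$g_{ii} = 1/\xi ^2_i$$ for this divergence:

```python
import math

# Numerical check of (1.24)/(1.27): to leading order a divergence is quadratic
# in the displacement, 2*D[xi : xi + d_xi] ~= sum_i g_ii * d_xi_i**2.
# Sketch divergence: D[xi : xi'] = sum(log(xi'_i/xi_i) + xi_i/xi'_i - 1) on
# positive coordinates; expanding to second order gives g_ii = 1/xi_i**2.
def div(xi, xi2):
    return sum(math.log(b / a) + a / b - 1.0 for a, b in zip(xi, xi2))

xi = [0.5, 2.0]      # arbitrary base point
d = [1e-4, -2e-4]    # small displacement

lhs = 2.0 * div(xi, [a + da for a, da in zip(xi, d)])
rhs = sum((da / a) ** 2 for a, da in zip(xi, d))   # sum g_ii * d_xi_i^2
```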
1.2.2 Examples of Divergence
A. Euclidean Divergence
When we use an orthonormal Cartesian coordinate system in a Euclidean space, we can define a divergence as half of the squared Euclidean distance,
$$\begin{aligned} D[P:Q]= \frac{1}{2} \sum \left( \xi _{Pi}- \xi _{Qi} \right) ^2. \end{aligned}$$(1.28)
The matrix $$\mathbf{G} $$ is the identity matrix in this case, so that
$$\begin{aligned} ds^2 = \sum \left( d \xi _i \right) ^2. \end{aligned}$$(1.29)
B. Kullback–Leibler Divergence
Let p(x) and q(x) be two probability distributions of random variable x in a manifold of probability distributions. The following is called the Kullback–Leibler (KL) divergence:
$$\begin{aligned} D_{KL} [p(x):q(x)] = \int p(x)\log \frac{p(x)}{q(x)}dx. \end{aligned}$$(1.30)
When x is discrete, integration is replaced by summation. We can easily check that it satisfies the criteria of divergence. It is asymmetric in general and is useful in statistics, information theory, physics, etc. Many other divergences will be introduced later in a manifold of probability distributions.
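A direct computation of (1.30) for discrete distributions illustrates the defining criteria and the asymmetry (the two distributions below are arbitrary examples):

```python
import math

# KL divergence (1.30) for discrete distributions (sum in place of the
# integral): nonnegative, zero only for equal arguments, asymmetric.
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]   # arbitrary example distributions
q = [0.2, 0.5, 0.3]

forward, backward = kl(p, q), kl(q, p)   # differ: D[p:q] != D[q:p]
```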
C. KL-Divergence for Positive Measures
A manifold of positive measures $${\textit{\textbf{R}}}^{n}_+$$ is a subset of a Euclidean space. Hence, we can introduce the Euclidean divergence (1.28) in it. However, we can extend the KL-divergence to give
$$\begin{aligned} D_{KL} \left[ {\textit{\textbf{m}}}_1: {\textit{\textbf{m}}}_2 \right] = \sum m_{1i} \log \frac{m_{1i}}{m_{2i}} - \sum m_{1i} + \sum m_{2i}. \end{aligned}$$(1.31)
When the total masses of two measures $${\textit{\textbf{m}}}_1$$ and $${\textit{\textbf{m}}}_2$$ are 1, they are probability distributions and
$$D_{KL} \left[ {\textit{\textbf{m}}}_1: {\textit{\textbf{m}}}_2 \right] $$reduces to the KL-divergence $$D_{KL}$$ in (1.30).
D. Divergences for Positive-Definite Matrices
There is a family of useful divergences introduced in the manifold of positive-definite matrices. Let P and Q be two positive-definite matrices. The following are typical examples of divergence:
$$\begin{aligned} D[\mathbf{P }:\mathbf{Q }] = \text{ tr } \left( \mathbf{P } \log \mathbf{P }-\mathbf{P } \log \mathbf{Q }-\mathbf{P }+\mathbf{Q } \right) , \end{aligned}$$(1.32)
which is related to the von Neumann entropy of quantum mechanics,
$$\begin{aligned} D[{\mathbf{P }}:\mathbf{Q }] = \text{ tr } \left( \mathbf{P }\mathbf{Q }^{-1}\right) -\log \left| \mathbf{P }\mathbf{Q }^{-1}\right| -n, \end{aligned}$$(1.33)
which derives from the KL-divergence between multivariate Gaussian distributions, and
$$\begin{aligned} D[\mathbf{P }:\mathbf{Q }] = \frac{4}{1-\alpha ^2} \text{ tr } \left( -\mathbf{P }^{\frac{1-\alpha }{2}} \mathbf{Q }^{\frac{1+\alpha }{2}} + \frac{1-\alpha }{2} \mathbf{P }+ \frac{1+\alpha }{2} \mathbf{Q } \right) , \end{aligned}$$(1.34)
which is called the $$\alpha $$ -divergence, where $$\alpha $$ is a real parameter. Here, tr $$\mathbf{P }$$ denotes the trace of matrix $$\mathbf{P }$$ and $$|\mathbf{P }|$$ is the determinant of $$\mathbf{P }$$ .
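The divergence (1.33) can be evaluated concretely; the Python sketch below uses hand-rolled $$2 \times 2$$ linear algebra and arbitrary positive-definite matrices, and exhibits the asymmetry $$D[\mathbf{P }:\mathbf{Q }] \ne D[\mathbf{Q }:\mathbf{P }]$$ :

```python
import math

# Divergence (1.33) for positive-definite matrices, on 2x2 examples with
# hand-rolled inverse, product and determinant (no linear-algebra library
# assumed): D[P:Q] = tr(P Q^-1) - log|P Q^-1| - n.
def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def div_pd(p, q):
    pq = mul2(p, inv2(q))
    return pq[0][0] + pq[1][1] - math.log(det2(pq)) - 2

P = [[2.0, 0.5], [0.5, 1.0]]   # arbitrary positive-definite matrices
Q = [[1.0, 0.2], [0.2, 3.0]]

d_pq, d_qp = div_pd(P, Q), div_pd(Q, P)
```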
1.3 Convex Function and Bregman Divergence
1.3.1 Convex Function
A nonlinear function $$\psi ({\varvec{\xi }})$$ of coordinates $${\varvec{\xi }}$$ is said to be convex when the inequality
$$\begin{aligned} \lambda \psi \left( {\varvec{\xi }}_1 \right) + (1-\lambda ) \psi \left( {\varvec{\xi }}_2 \right) \ge \psi \left\{ \lambda {\varvec{\xi }}_1 + (1-\lambda ){\varvec{\xi }}_2 \right\} \end{aligned}$$(1.35)
is satisfied for any $${\varvec{\xi }}_1$$ , $${\varvec{\xi }}_2$$ and scalar $$0 \le \lambda \le 1$$ . We consider differentiable convex functions. A twice-differentiable function is convex if and only if its Hessian
$$\begin{aligned} \mathbf{H }({\varvec{\xi }}) = \left( \frac{\partial ^2}{\partial \xi _i \partial \xi _j} \psi (\varvec{\xi }) \right) \end{aligned}$$(1.36)
is positive semi-definite. Throughout, we assume the stronger condition that the Hessian is positive-definite, that is, $$\psi $$ is strictly convex.
There are many convex functions appearing in physics, optimization and engineering problems. One simple example is
$$\begin{aligned} \psi ({\varvec{\xi }}) = \frac{1}{2} \sum \xi ^2_i \end{aligned}$$(1.37)
which is a half of the square of the Euclidean distance from the origin to point $${\varvec{\xi }}$$ . Let $${\textit{\textbf{p}}}$$ be a probability distribution belonging to $$S_n$$ . Then, its entropy
$$\begin{aligned} H({\textit{\textbf{p}}})= -\sum p_i \log p_i \end{aligned}$$(1.38)
is a concave function, so that its negative,
$$\varphi ({\textit{\textbf{p}}})= -H({\textit{\textbf{p}}})$$, is a convex function.
We give one more example from a probability model. An exponential family of probability distributions is written as
$$\begin{aligned} p({\textit{\textbf{x}}}, {\varvec{\theta }})= \exp \left\{ \sum \theta _i x_i + k({\textit{\textbf{x}}})-\psi ({\varvec{\theta }}) \right\} , \end{aligned}$$(1.39)
where $$p({\textit{\textbf{x}}}, {\varvec{\theta }})$$ is the probability density function of vector random variable $${\textit{\textbf{x}}}$$ specified by vector parameter $${\varvec{\theta }}$$ and $$k({\textit{\textbf{x}}})$$ is a function of $${\textit{\textbf{x}}}$$ . The term
$$\exp \left\{ -\psi ({\varvec{\theta }})\right\} $$is the normalization factor with which
$$\begin{aligned} \int p({\textit{\textbf{x}}}, {\varvec{\theta }})d{\textit{\textbf{x}}} = 1 \end{aligned}$$(1.40)
is satisfied. Therefore, $$\psi ({\varvec{\theta }})$$ is given by
$$\begin{aligned} \psi ({\varvec{\theta }}) = \log \int \exp \left\{ \sum \theta _i x_i + k({\textit{\textbf{x}}}) \right\} d{\textit{\textbf{x}}}. \end{aligned}$$(1.41)
$$M= \left\{ p({\textit{\textbf{x}}}, {\varvec{\theta }})\right\} $$is regarded as a manifold, where $${\varvec{\theta }}$$ is a coordinate system. By differentiating (1.41), we can prove that its Hessian is positive-definite (see the next subsection). Hence, $$\psi ({\varvec{\theta }})$$ is a convex function. It is known as the cumulant generating function in statistics and free energy in statistical physics. The exponential family plays a fundamental role in information geometry.
1.3.2 Bregman Divergence
A graph of a convex function is shown in Fig. 1.4. We draw a tangent hyperplane touching it at point $${\varvec{\xi }}_0$$ (Fig. 1.4). It is given by the equation
$$\begin{aligned} z= \psi \left( {\varvec{\xi }}_0 \right) + \nabla \psi \left( {\varvec{\xi }}_0 \right) \cdot \left( {\varvec{\xi }}-{\varvec{\xi }}_0 \right) , \end{aligned}$$(1.42)
where z is the vertical axis of the graph. Here, $$\nabla $$ is the gradient operator such that $$\nabla \psi $$ is the gradient vector defined by
$$\begin{aligned} \nabla \psi = \left( \frac{\partial }{\partial \xi _i} \psi ({\varvec{\xi }}) \right) , \quad i=1, \ldots , n \end{aligned}$$(1.43)
in the component form. Since $$\psi $$ is convex, the graph of $$\psi $$ is always above the hyperplane, touching it at $${\varvec{\xi }}_0$$ . Hence, it is a supporting hyperplane of $$\psi $$ at $${\varvec{\xi }}_0$$ (Fig. 1.4).
Fig. 1.4
Convex function $$z= \psi (\xi )$$ , its supporting hyperplane with normal vector
$${\textit{\textbf{n}}}= \nabla \psi \left( \xi _0\right) $$and divergence $$D \left[ \xi : \xi _0\right] $$
We evaluate how far above the hyperplane (1.42) the function $$\psi ({\varvec{\xi }})$$ lies at $${\varvec{\xi }}$$ . This depends on the point $${\varvec{\xi }}_0$$ at which the supporting hyperplane is defined. The difference from (1.42) is written as
$$\begin{aligned} D_{\psi } \left[ {\varvec{\xi }} : {\varvec{\xi }}_0 \right] = \psi ({\varvec{\xi }})- \psi \left( {\varvec{\xi }}_0 \right) -\nabla \psi \left( {\varvec{\xi }}_0 \right) \cdot \left( {\varvec{\xi }}-{\varvec{\xi }}_0 \right) . \end{aligned}$$(1.44)
Considering it as a function of the two points $${\varvec{\xi }}$$ and $${\varvec{\xi }}_0$$ , we can easily prove that it satisfies the criteria of a divergence. This is called the Bregman divergence (Bregman 1967) derived from the convex function $$\psi $$ .
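Definition (1.44) translates directly into code once $$\psi $$ and $$\nabla \psi $$ are supplied. A Python sketch with gradients given by hand, applied to the convex functions (1.37) and (1.49) that appear in the surrounding examples:

```python
import math

# Bregman divergence (1.44):
#   D_psi[xi : xi0] = psi(xi) - psi(xi0) - grad_psi(xi0) . (xi - xi0).
# Gradients are supplied by hand (no automatic differentiation assumed).
def bregman(psi, grad_psi, xi, xi0):
    inner = sum(g * (a - b) for g, a, b in zip(grad_psi(xi0), xi, xi0))
    return psi(xi) - psi(xi0) - inner

# psi from (1.37): recovers half the squared Euclidean distance (1.45).
psi = lambda xi: 0.5 * sum(x * x for x in xi)
grad = lambda xi: list(xi)
d_euc = bregman(psi, grad, [1.0, 2.0], [0.0, 0.0])   # = 0.5 * (1 + 4) = 2.5

# phi from (1.49): recovers the extended KL-divergence (1.50).
phi = lambda xi: sum(x * math.log(x) for x in xi)
gphi = lambda xi: [math.log(x) + 1.0 for x in xi]
a, b = [0.5, 1.5], [1.0, 0.5]
d_kl = bregman(phi, gphi, a, b)
direct = sum(x * math.log(x / y) - x + y for x, y in zip(a, b))
```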
We show examples of Bregman divergence.
Example 1.1
(Euclidean divergence) For $$\psi $$ defined by (1.37) in a Euclidean space, we easily see that the divergence is
$$\begin{aligned} D \left[ {\varvec{\xi }}: {\varvec{\xi }}_0 \right] = \frac{1}{2} \left| {\varvec{\xi }}-{\varvec{\xi }}_0 \right| ^2, \end{aligned}$$(1.45)
that is, half of the squared Euclidean distance. It is symmetric.
Example 1.2
(Logarithmic divergence) We consider a convex function
$$\begin{aligned} \psi ({\varvec{\xi }}) = -\sum ^n_{i=1} \log \xi _i \end{aligned}$$(1.46)
in the manifold $${\textit{\textbf{R}}}^n_+$$ of positive measures. Its gradient is
$$\begin{aligned} \nabla \psi ({\varvec{\xi }}) = \left( -\frac{1}{\xi _i}\right) . \end{aligned}$$(1.47)
Hence, the Bregman divergence is
$$\begin{aligned} D_{\psi } \left[ {\varvec{\xi }}:{\varvec{\xi }}^{\prime }\right] = \sum ^n_{i=1} \left( \log \frac{\xi ^{\prime }_i}{\xi _i} + \frac{\xi _i}{\xi ^{\prime }_i} -1 \right) . \end{aligned}$$(1.48)
For another convex function
$$\begin{aligned} \varphi ({\varvec{\xi }}) = \sum \xi _i \log \xi _i, \end{aligned}$$(1.49)
the Bregman divergence is the same as the KL-divergence (1.31), given by
$$\begin{aligned} D_{\varphi } \left[ {\varvec{\xi }}:{\varvec{\xi }}^{\prime }\right] = \sum \left( \xi _i \log \frac{\xi _i}{\xi ^{\prime }_i} - \xi _i + \xi ^{\prime }_i \right) . \end{aligned}$$(1.50)
When
$$\sum \xi _i = \sum \xi ^{\prime }_i = 1$$, this is the KL-divergence from probability vector $${\varvec{\xi }}$$ to another $${\varvec{\xi }}^{\prime }$$ .
Example 1.3
(Free energy of exponential family) We calculate the divergence given by the normalization factor $$\psi ({\varvec{\theta }})$$ (1.41) of an exponential family. To this end, we differentiate the identity
$$\begin{aligned} 1 = \int p({\textit{\textbf{x}}}, {\varvec{\theta }}) d{\textit{\textbf{x}}} = \int \exp \left\{ \sum \theta _i x_i + k({\textit{\textbf{x}}}) -\psi ({\varvec{\theta }})\right\} d{\textit{\textbf{x}}} \end{aligned}$$(1.51)
with respect to $$\theta _i$$ . We then have
$$\begin{aligned} \int \left\{ x_i- \frac{\partial }{\partial \theta _i} \psi ({\varvec{\theta }})\right\} p({\textit{\textbf{x}}}, {\varvec{\theta }})d{\textit{\textbf{x}}} = 0 \end{aligned}$$(1.52)
or
$$\begin{aligned} \frac{\partial }{\partial \theta _i} \psi ({\varvec{\theta }})= & {} \int x_i p({\textit{\textbf{x}}}, {\varvec{\theta }}) d{\textit{\textbf{x}}} = \mathbf{E } \left[ x_i \right] = \bar{x}_i, \end{aligned}$$(1.53)
$$\begin{aligned} \nabla \psi ({\varvec{\theta }})= & {} \mathrm{{E}} \left[ {\textit{\textbf{x}}}\right] , \end{aligned}$$(1.54)
where $$\mathrm{{E}}$$ denotes the expectation with respect to $$p({\textit{\textbf{x}}}, {\varvec{\theta }})$$ and $$\bar{x}_i$$ is the expectation of $$x_i$$ . We then differentiate (1.52) again with respect to $$\theta _j$$ and, after some calculations, obtain
$$\begin{aligned} -\frac{\partial ^2 \psi ({\varvec{\theta }})}{\partial \theta _i \partial \theta _j} + \mathrm{{E}} \left[ \left( x_i-\bar{x}_i \right) \left( x_j-\bar{x}_j \right) \right] =0 \end{aligned}$$(1.55)
or
$$\begin{aligned} \nabla \nabla \psi ({\varvec{\theta }}) = \mathrm{{E}} \left[ \left( {\textit{\textbf{x}}}-\bar{\textit{\textbf{x}}}\right) \left( {\textit{\textbf{x}}}- \bar{\textit{\textbf{x}}} \right) ^T \right] = \text{ Var }[{\textit{\textbf{x}}}], \end{aligned}$$(1.56)
where $${\textit{\textbf{x}}}^T$$ is the transpose of column vector $${\textit{\textbf{x}}}$$ and $$\text {Var}[{\textit{\textbf{x}}}]$$ is the covariance matrix of $${\textit{\textbf{x}}}$$ , which is positive-definite. This shows that $$\psi ({\varvec{\theta }})$$ is a convex function. It is useful to see that the expectation and covariance of $${\textit{\textbf{x}}}$$ are derived from $$\psi ({\varvec{\theta }})$$ by differentiation.
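Relations (1.53) and (1.56) are easy to check on the simplest exponential family, the Bernoulli distributions $$p(x, \theta ) = \exp \left\{ \theta x - \psi (\theta )\right\} $$ , $$x \in \{0, 1\}$$ , for which $$\psi (\theta ) = \log \left( 1+e^{\theta }\right) $$ : finite differences of $$\psi $$ reproduce the mean and variance of x.

```python
import math

# (1.53) and (1.56) for the Bernoulli exponential family
# p(x, theta) = exp(theta*x - psi(theta)), x in {0, 1},
# with psi(theta) = log(1 + e^theta):
# psi' is the mean E[x] and psi'' is the variance Var[x].
psi = lambda t: math.log(1.0 + math.exp(t))

theta = 0.7                                      # arbitrary parameter value
p1 = math.exp(theta) / (1.0 + math.exp(theta))   # Prob(x = 1)
mean, var = p1, p1 * (1.0 - p1)

h = 1e-4   # finite-difference step
dpsi = (psi(theta + h) - psi(theta - h)) / (2.0 * h)
d2psi = (psi(theta + h) - 2.0 * psi(theta) + psi(theta - h)) / h**2
```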
The Bregman divergence from $${\varvec{\theta }}$$ to $${\varvec{\theta }}^{\prime }$$ derived from $$\psi $$ of an exponential family is calculated from
$$\begin{aligned} D_{\psi } \left[ {\varvec{\theta }} : {\varvec{\theta }}^{\prime }\right] = \psi \left( {\varvec{\theta }}\right) -\psi ({\varvec{\theta }}^{\prime }) - \nabla \psi ({\varvec{\theta }}^{\prime }) \cdot \left( {\varvec{\theta }} -{\varvec{\theta }}^{\prime }\right) , \end{aligned}$$(1.57)
and a careful calculation proves that it is equal to the KL-divergence from $${\varvec{\theta }}^{\prime }$$ to $${\varvec{\theta }}$$ ,
$$\begin{aligned} D_{KL} \left[ p \left( {\textit{\textbf{x}}}, {\varvec{\theta }}^{\prime }\right) : p({\textit{\textbf{x}}}, {\varvec{\theta }})\right] = \int p \left( {\textit{\textbf{x}}}, {\varvec{\theta }}^{\prime }\right) \log \frac{p \left( {\textit{\textbf{x}}}, {\varvec{\theta }}^{\prime }\right) }{p({\textit{\textbf{x}}}, {\varvec{\theta }})} d{\textit{\textbf{x}}}. \end{aligned}$$(1.58)
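The identity between (1.57) and (1.58) can likewise be verified on the Bernoulli family, where $$\nabla \psi (\theta )$$ is the sigmoid function (the mean of x) and the KL-divergence has a closed form:

```python
import math

# Check that the Bregman divergence (1.57) of the Bernoulli family equals the
# KL-divergence (1.58) with the arguments reversed.  Here
# psi(theta) = log(1 + e^theta) and grad psi is the sigmoid (mean of x).
psi = lambda t: math.log(1.0 + math.exp(t))
mean = lambda t: math.exp(t) / (1.0 + math.exp(t))

def bregman(t, t2):
    return psi(t) - psi(t2) - mean(t2) * (t - t2)

def kl(t2, t):   # KL[p(x, t2) : p(x, t)] summed over x in {0, 1}
    p2, p = mean(t2), mean(t)
    return p2 * math.log(p2 / p) + (1.0 - p2) * math.log((1.0 - p2) / (1.0 - p))

a, b = 0.3, -1.2   # arbitrary distinct parameter values
gap = abs(bregman(a, b) - kl(b, a))
```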
1.4 Legendre Transformation
The gradient of $$\psi ({\varvec{\xi }})$$
$$\begin{aligned} {\varvec{\xi }}^{*} = \nabla \psi ({\varvec{\xi }}) \end{aligned}$$(1.59)
is equal to the normal vector $${\textit{\textbf{n}}}$$ of the supporting tangent hyperplane at $${\varvec{\xi }}$$ , as is easily seen from Fig. 1.4. Different points have different normal vectors. Hence, it is possible to specify a point of M by its normal vector. In other words, the transformation between $${\varvec{\xi }}$$ and $${\varvec{\xi }}^{*}$$ is one-to-one and differentiable. This shows that $${\varvec{\xi }}^{*}$$ can be used as another coordinate system of M, connected with $${\varvec{\xi }}$$ by (1.59).
The transformation (1.59) is known as the Legendre transformation. The Legendre transformation has a dualistic structure concerning the two coupled coordinate systems $${\varvec{\xi }}$$ and $${\varvec{\xi }}^{*}$$ . To show this, we define a new function of $${\varvec{\xi }}^{*}$$ by
$$\begin{aligned} \psi ^{*} \left( {\varvec{\xi }}^{*}\right) = {\varvec{\xi }} \cdot {\varvec{\xi }}^{*} - \psi ({\varvec{\xi }}), \end{aligned}$$(1.60)
where
$$\begin{aligned} {\varvec{\xi }} \cdot {\varvec{\xi }}^{*} = \sum _i \xi _i \xi ^{*}_i \end{aligned}$$(1.61)
and $${\varvec{\xi }}$$ is not free but is a function of $${\varvec{\xi }}^{*}$$ ,
$$\begin{aligned} {\varvec{\xi }} = {\textit{\textbf{f}}} \left( {\varvec{\xi }}^{*} \right) , \end{aligned}$$(1.62)
which is the inverse function of
$${\varvec{\xi }}^{*}= \nabla \psi ({\varvec{\xi }})$$. By differentiating (1.60) with respect to $${\varvec{\xi }}^{*}$$ , we have
$$\begin{aligned} \nabla \psi ^{*} \left( {\varvec{\xi }}^{*}\right) = {\varvec{\xi }} + \frac{\partial {\varvec{\xi }}}{\partial {\varvec{\xi }}^{*}} {\varvec{\xi }}^{*} - \nabla \psi ({\varvec{\xi }}) \frac{\partial {\varvec{\xi }}}{\partial {\varvec{\xi }}^{*}}. \end{aligned}$$(1.63)
Since the last two terms of (1.63) cancel out because of (1.59), we have a dualistic structure
$$\begin{aligned} {\varvec{\xi }}^{*} = \nabla \psi ({\varvec{\xi }}), \quad {\varvec{\xi }} = \nabla \psi ^{*} \left( {\varvec{\xi }}^{*}\right) . \end{aligned}$$(1.64)
$$\psi ^{*}$$ is called the Legendre dual of $$\psi $$ . The dual function $$\psi ^{*}$$ satisfies
$$\begin{aligned} \psi ^{*}\left( {\varvec{\xi }}^{*}\right) = {\mathop {\max }_{\varvec{\xi }^{\prime }}}\left\{ {\varvec{\xi }^{\prime }}\cdot {\varvec{\xi }}^{*}-\psi ({\varvec{\xi }^{\prime }}) \right\} , \end{aligned}$$(1.65)
which is usually taken as the definition of $$\psi ^{*}$$; our definition (1.60) is more direct. We still need to show that $$\psi ^{*}$$ is a convex function. The Hessian of $$\psi ^{*}\left( {\varvec{\xi }}^{*}\right) $$ is written as
$$\begin{aligned} \mathrm{\mathbf{G}}^{*}\left( {\varvec{\xi }}^{*}\right) = \nabla \nabla \psi ^{*}\left( {\varvec{\xi }}^{*}\right) = \frac{\partial {\varvec{\xi }}}{\partial {\varvec{\xi }}^{*}}, \end{aligned}$$(1.66)
which is the Jacobian matrix of the inverse transformation from $${\varvec{\xi }}^{*}$$ to $${\varvec{\xi }}$$ . This is the inverse of the Hessian
$$\mathrm{\mathbf{G}} = \nabla \nabla \psi ({\varvec{\xi }})$$, since it is the Jacobian matrix of the transformation from $${\varvec{\xi }}$$ to $${\varvec{\xi }}^{*}$$ . Hence, it is a positive-definite matrix. This shows that $$\psi ^{*} \left( {\varvec{\xi }}^{*}\right) $$ is a convex function of $${\varvec{\xi }}^{*}$$ .
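A minimal numerical sketch of this duality (our own illustration, not from the text) uses the scalar convex function $$\psi (\xi ) = e^{\xi }$$, whose Legendre dual is $$\psi ^{*}(\xi ^{*}) = \xi ^{*} \log \xi ^{*} - \xi ^{*}$$; the coupled relations (1.59), (1.60) and (1.64), and the mutual inversion of the Hessians, can all be checked directly.

```python
import math

def psi(x):            # a simple convex function: psi(xi) = exp(xi)
    return math.exp(x)

def psi_star(y):       # its Legendre dual: psi*(xi*) = xi* log xi* - xi*
    return y * math.log(y) - y

xi = 0.8
xi_star = math.exp(xi)                              # xi* = grad psi(xi), (1.59)
assert abs(math.log(xi_star) - xi) < 1e-12          # xi = grad psi*(xi*), (1.64)
assert abs(psi_star(xi_star) - (xi * xi_star - psi(xi))) < 1e-12   # (1.60)
# Hessians are mutually inverse, as in (1.66):
assert abs(math.exp(xi) * (1.0 / xi_star) - 1.0) < 1e-12
```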
A new Bregman divergence is derived from the dual convex function $$\psi ^{*}\left( {\varvec{\xi }}^{*}\right) $$ ,
$$\begin{aligned} D_{\psi ^{*}} \left[ {\varvec{\xi }}^{*}:{\varvec{\xi }}^{*\prime }\right] = \psi ^{*}\left( {\varvec{\xi }}^{*}\right) -\psi ^{*} \left( {\varvec{\xi }}^{*\prime }\right) -\nabla \psi ^{*} \left( {\varvec{\xi }}^{*\prime }\right) \cdot \left( {\varvec{\xi }}^{*}-{\varvec{\xi }}^{*\prime } \right) , \end{aligned}$$(1.67)
which we call the dual divergence. A careful calculation shows that
$$\begin{aligned} D_{\psi ^{*}}\left[ {\varvec{\xi }}^{*}:{\varvec{\xi }}^{*\prime }\right] = D_{\psi } \left[ {\varvec{\xi }}^{\prime }:{\varvec{\xi }} \right] . \end{aligned}$$(1.68)
Hence, the dual divergence is equal to the primal one with the order of the two points exchanged. The divergences derived from the two convex functions are therefore substantially the same, differing only in the order of their arguments.
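The order-reversal identity (1.68) can be sketched numerically with the same illustrative pair $$\psi (\xi ) = e^{\xi }$$, $$\psi ^{*}(\xi ^{*}) = \xi ^{*} \log \xi ^{*} - \xi ^{*}$$ as before (function names are ours):

```python
import math

psi      = math.exp                           # primal convex function
grad_psi = math.exp
def psi_star(y): return y * math.log(y) - y   # its Legendre dual
def grad_psi_star(y): return math.log(y)

def D(f, gf, a, b):                           # Bregman divergence D_f[a : b]
    return f(a) - f(b) - gf(b) * (a - b)

xi, xi_p = 0.3, 1.1
xs, xs_p = grad_psi(xi), grad_psi(xi_p)       # dual coordinates via (1.59)
lhs = D(psi_star, grad_psi_star, xs, xs_p)    # D_{psi*}[xi* : xi*'], (1.67)
rhs = D(psi, grad_psi, xi_p, xi)              # D_psi [xi'  : xi  ], (1.68)
assert abs(lhs - rhs) < 1e-12
```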
It is convenient to use a self-dual expression of divergence by using the two coordinate systems.
Theorem 1.1
The divergence from P to Q derived from a convex $$\psi ({\varvec{\xi }})$$ is written as
$$\begin{aligned} D_{\psi }[P:Q] = \psi \left( {\varvec{\xi }}_P \right) + \psi ^{*} \left( {\varvec{\xi }}^{*}_Q \right) -{\varvec{\xi }}_P \cdot {\varvec{\xi }}^{*}_Q, \end{aligned}$$(1.69)
where $${\varvec{\xi }}_P$$ denotes the coordinates of P in the $${\varvec{\xi }}$$ coordinate system and $${\varvec{\xi }}^{*}_Q$$ the coordinates of Q in the $${\varvec{\xi }}^{*}$$ coordinate system.
Proof
From (1.60), we have
$$\begin{aligned} \psi ^{*} \left( {\varvec{\xi }}^{*}_Q \right) = {\varvec{\xi }}_Q \cdot {\varvec{\xi }}^{*}_Q -\psi ({\varvec{\xi }}_Q). \end{aligned}$$(1.70)
Substituting (1.70) in (1.69) and using
$$\nabla \psi \left( {\varvec{\xi }}_Q \right) = {\varvec{\xi }}^{*}_Q$$, we have the theorem.
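The self-dual expression (1.69) can likewise be checked numerically; the following sketch (an illustration of ours, again with $$\psi (\xi ) = e^{\xi }$$) confirms that the mixed-coordinate form agrees with the Bregman divergence (1.57).

```python
import math

psi = math.exp                                # illustrative convex function
def psi_star(y): return y * math.log(y) - y   # its Legendre dual

xi_P, xi_Q = -0.4, 0.9
xi_star_Q = math.exp(xi_Q)                    # Legendre transform of Q, (1.59)

canonical = psi(xi_P) + psi_star(xi_star_Q) - xi_P * xi_star_Q        # (1.69)
bregman   = psi(xi_P) - psi(xi_Q) - math.exp(xi_Q) * (xi_P - xi_Q)    # (1.57)
assert abs(canonical - bregman) < 1e-12
```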
We give examples of dual convex functions. For convex function (1.37) in Example 1.1, we easily have
$$\begin{aligned} \psi ^{*} \left( {\varvec{\xi }}^{*}\right) = \frac{1}{2} \left| {\varvec{\xi }}^{*}\right| ^2 \end{aligned}$$(1.71)
and
$$\begin{aligned} {\varvec{\xi }}^{*} = {\varvec{\xi }}. \end{aligned}$$(1.72)
Hence, the dual convex function is the same as the primal one, implying that the structure is self-dual. $$\square $$
In the case of Example 1.2, the duals of $$\psi $$ and $$\varphi $$ in (1.46) and (1.49) are
$$\begin{aligned} \psi ^{*} \left( {\varvec{\xi }}^{*}\right) = -\sum \left\{ 1+ \log \left( -\xi ^{*}_i \right) \right\} , \end{aligned}$$(1.73)
$$\begin{aligned} \varphi ^{*} \left( {\varvec{\xi }}^{*}\right) = \sum \exp \left\{ \xi ^{*}_i -1 \right\} , \end{aligned}$$(1.74)
by which
$$\begin{aligned} \nabla \psi ^{*} \left( {\varvec{\xi }}^{*}\right) = {\varvec{\xi }}, \quad \nabla \varphi ^{*} \left( {\varvec{\xi }}^{*}\right) = {\varvec{\xi }} \end{aligned}$$(1.75)
hold, respectively.
In the case of the free energy $$\psi ({\varvec{\theta }})$$ in Example 1.3, its Legendre transformation is
$$\begin{aligned} {\varvec{\theta }}^{*} = \nabla \psi ({\varvec{\theta }}) = \mathrm{{E}}_{\varvec{\theta }}[{\textit{\textbf{x}}}], \end{aligned}$$(1.76)
where $$\mathrm{{E}}_{\varvec{\theta }}$$ is the expectation with respect to $$p({\textit{\textbf{x}}}, {\varvec{\theta }})$$ . Because of this, $${\varvec{\theta }}^{*}$$ is called the expectation parameter in statistics. The dual convex function $$\psi ^{*} \left( {\varvec{\theta }^{*}}\right) $$ derived from (1.65) is calculated from
$$\begin{aligned} \psi ^{*} \left( {\varvec{\theta }}^{*}\right) = {\varvec{\theta }}^{*} \cdot {\varvec{\theta }} -\psi ({\varvec{\theta }}), \end{aligned}$$(1.77)
where $${\varvec{\theta }}$$ is a function of $${\varvec{\theta }}^{*}$$ given by
$${\varvec{\theta }}^{*}= \nabla \psi ({\varvec{\theta }})$$. This proves that $$\psi ^{*}$$ is the negative entropy,
$$\begin{aligned} \psi ^{*} \left( {\varvec{\theta }^{*}}\right) = \int p({\textit{\textbf{x}}}, {\varvec{\theta }}) \log p({\textit{\textbf{x}}}, {\varvec{\theta }})d{\textit{\textbf{x}}}. \end{aligned}$$(1.78)
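For a concrete instance (our illustration), take the Bernoulli family with $$\psi (\theta ) = \log \left( 1+e^{\theta }\right) $$; then $$\theta ^{*} = \mathrm{E}_{\theta }[x]$$ and the Legendre dual (1.77) indeed evaluates to the negative entropy (1.78), where the integral reduces to a sum over $$x \in \{0,1\}$$.

```python
import math

def psi(t):                           # free energy of the Bernoulli family
    return math.log(1.0 + math.exp(t))

theta = 0.6
mu = 1.0 / (1.0 + math.exp(-theta))   # theta* = E_theta[x], (1.76)
psi_star = mu * theta - psi(theta)    # Legendre dual via (1.77)

# Negative entropy (1.78), summed over x in {0, 1}:
neg_entropy = mu * math.log(mu) + (1 - mu) * math.log(1 - mu)
assert abs(psi_star - neg_entropy) < 1e-12
```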
The dual divergence derived from $$\psi ^{*}\left( {\varvec{\theta }}^{*}\right) $$ is the KL-divergence
$$\begin{aligned} D_{\psi ^{*}} \left[ {\varvec{\theta }}^{*} : {\varvec{\theta }}^{*\prime }\right] = D_{KL} \left[ p({\textit{\textbf{x}}}, {\varvec{\theta }}): p \left( {\textit{\textbf{x}}}, {\varvec{\theta }}^{\prime } \right) \right] , \end{aligned}$$(1.79)
where
$${\varvec{\theta }}= \nabla \psi ^{*} ({\varvec{\theta }}^{*})$$and
$${\varvec{\theta }}^{\prime }= \nabla \psi ^{*} \left( {\varvec{\theta }}^{*\prime }\right) $$.
1.5 Dually Flat Riemannian Structure Derived from Convex Function
1.5.1 Affine and Dual Affine Coordinate Systems
When a function $$\psi ({\varvec{\theta }})$$ is convex in a coordinate system $${\varvec{\theta }}$$ , the same function expressed in another coordinate system $${\varvec{\xi }}$$ ,
$$\begin{aligned} \tilde{\psi }({\varvec{\xi }}) = \psi \left\{ {\varvec{\theta }}({\varvec{\xi }})\right\} , \end{aligned}$$(1.80)
is not necessarily convex as a function of $${\varvec{\xi }}$$ . Hence, the convexity of a function depends on the coordinate system of M. But a convex function remains convex under affine transformations
$$\begin{aligned} {\varvec{\theta }^{\prime }} = \mathrm{\mathbf{A}} {\varvec{\theta }} + {\textit{\textbf{b}}}, \end{aligned}$$(1.81)
where $$\mathrm{\mathbf{A}}$$ is a non-singular constant matrix and $${\textit{\textbf{b}}}$$ is a constant vector.
We fix a coordinate system $${\varvec{\theta }}$$ in which $$\psi ({\varvec{\theta }})$$ is convex and introduce geometric structures on M based on it. We consider $${\varvec{\theta }}$$ as an affine coordinate system, which provides M with an affine flat structure: M is a flat manifold and each coordinate axis of $${\varvec{\theta }}$$ is a straight line. Any curve $${\varvec{\theta }}(t)$$ of M that is linear in the parameter t,
$$\begin{aligned} {\varvec{\theta }}(t) = {\textit{\textbf{a}}}t+ {\textit{\textbf{b}}}, \end{aligned}$$(1.82)
is a straight line, where $${\textit{\textbf{a}}}$$ and $${\textit{\textbf{b}}}$$ are constant vectors. We call it a geodesic of the affine manifold. Here, the term geodesic is used to mean a straight line; it does not mean the shortest path connecting two points. A geodesic is invariant under affine transformations (1.81), but not under nonlinear coordinate transformations.
Dually, we can define another coordinate system $${\varvec{\theta }^{*}}$$ by the Legendre transformation,
$$\begin{aligned} {\varvec{\theta }}^{*} = \nabla \psi ({\varvec{\theta }}), \end{aligned}$$(1.83)
and consider it as another type of affine coordinates. This defines another affine structure. Each coordinate axis of $${\varvec{\theta }}^{*}$$ is a dual straight line or dual geodesic. A dual straight line is written as
$$\begin{aligned} {\varvec{\theta }}^{*}(t) = {\textit{\textbf{a}}}t+{\textit{\textbf{b}}}. \end{aligned}$$(1.84)
This is the dual affine structure derived from the convex function $$\psi ^{*}\left( {\varvec{\theta }}^{*}\right) $$ . Since the coordinate transformation between the two affine coordinate systems $${\varvec{\theta }}$$ and $${\varvec{\theta }}^{*}$$ is not linear in general, a geodesic is not a dual geodesic and vice versa. This implies that we have introduced two different criteria of straightness or flatness in M, namely primal and dual flatness. M is dually flat and the two flat coordinates are connected by the Legendre transformation.
1.5.2 Tangent Space, Basis Vectors and Riemannian Metric
When $$d{\varvec{\theta }}$$ is an (infinitesimally) small line element, the square of its length ds is given by
$$\begin{aligned} ds^2 = 2 D_{\psi } \left[ {\varvec{\theta }}:{\varvec{\theta }}+d{\varvec{\theta }}\right] = \sum g_{ij}d\theta ^i d \theta ^j. \end{aligned}$$(1.85)
Here, we use the upper indices i, j to represent components of $${\varvec{\theta }}$$ . It is easy to see that the Riemannian metric $$g_{ij}$$ is given by the Hessian of $$\psi $$
$$\begin{aligned} g_{ij}({\varvec{\theta }}) = \frac{\partial ^2}{\partial \theta ^i \partial \theta ^j} \psi ({\varvec{\theta }}). \end{aligned}$$(1.86)
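As a numerical sketch of (1.85)–(1.86) (our illustration, using the Bernoulli free energy again), twice the Bregman divergence to an infinitesimally close point agrees with the quadratic form of the Hessian up to higher-order terms:

```python
import math

def psi(t):  return math.log(1.0 + math.exp(t))   # Bernoulli free energy
def grad(t): return 1.0 / (1.0 + math.exp(-t))

def bregman(a, b):                                 # D_psi[a : b], (1.57)
    return psi(a) - psi(b) - grad(b) * (a - b)

theta, d = 0.5, 1e-4
p = grad(theta)
g = p * (1 - p)                        # Hessian of psi, the metric (1.86)
ds2 = 2.0 * bregman(theta, theta + d)  # (1.85)

# Agreement up to terms of higher order in d:
assert abs(ds2 - g * d * d) / (g * d * d) < 1e-3
```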
Let
$$\left\{ {\textit{\textbf{e}}_i}, i=1, \ldots , n \right\} $$be the set of tangent vectors along the coordinate curves of $${\varvec{\theta }}$$ (Fig. 1.5). The vector space spanned by $$\left\{ {\textit{\textbf{e}}}_i \right\} $$ is the tangent space of M at each point. Since $${\varvec{\theta }}$$ is an affine coordinate system, $$\left\{ {\textit{\textbf{e}}_i}\right\} $$ looks the same at any point. A tangent vector $${\textit{\textbf{A}}}$$ is represented as
Fig. 1.5
Basis vectors $${\textit{\textbf{e}}}_i$$ and small line element $$d{\varvec{\theta }}$$
$$\begin{aligned} {\textit{\textbf{A}}} = \sum A^i{\textit{\textbf{e}}}_i, \end{aligned}$$(1.87)
where $$A^i$$ are the components of $${\textit{\textbf{A}}}$$ with respect to the basis vectors
$$\left\{ {\textit{\textbf{e}}}_i \right\} , i=1, \ldots , n$$. The small line element $$d{\varvec{\theta }}$$ is a tangent vector expressed as
$$\begin{aligned} d{\varvec{\theta }} = \sum d \theta ^i {\textit{\textbf{e}}}_i. \end{aligned}$$(1.88)
Dually, we introduce a set of basis vectors $$\left\{ {\textit{\textbf{e}}}^{*i}\right\} $$ which are tangent vectors of the dual affine coordinate curves of $${\varvec{\theta }}^{*}$$ (Fig. 1.6). The small line element $$d{\varvec{\theta }}^{*}$$ is expressed as
$$\begin{aligned} d{\varvec{\theta }}^{*} = \sum d \theta _i^{*} {\textit{\textbf{e}}}^{*i} \end{aligned}$$(1.89)
in this basis. A vector $${\textit{\textbf{A}}}$$ is represented in this basis as
$$\begin{aligned} {\textit{\textbf{A}}} = \sum A_i {\textit{\textbf{e}}}^{*i}. \end{aligned}$$(1.90)
In order to distinguish the affine and dual affine bases, we use a lower index as in $${\textit{\textbf{e}}}_i$$ for the affine basis and an upper index as in $${\textit{\textbf{e}}}^{*i}$$ for the dual affine basis. Correspondingly, the components of a vector in the two bases are written as $$A^i$$ and $$A_i$$: the letter A is unchanged and only the position of the index moves. Since they represent the same vector expressed in different bases,
$$\begin{aligned} {\textit{\textbf{A}}} = \sum A^i {\textit{\textbf{e}}}_i = \sum A_i {\textit{\textbf{e}}}^{*i}, \end{aligned}$$(1.91)
and $$A_i \ne A^i$$ in general.
Fig. 1.6
Two dual bases $$\left\{ {\textit{\textbf{e}}}_i \right\} $$ and $$\left\{ {\textit{\textbf{e}}}^{*i}\right\} $$
It is cumbersome to use the summation symbol in Eqs. (1.87)–(1.91) and elsewhere. Yet if the summation symbol were simply discarded, the reader might suspect from the context that it had been omitted by mistake. In most cases, an index i appearing twice in one term, once as an upper index and once as a lower index, is summed over from 1 to n. A. Einstein therefore introduced the following summation convention:
Einstein Summation Convention: When the same index appears twice in one term, once as an upper index and the other time as a lower index, summation is automatically taken over this index even without the summation symbol.
We use this convention throughout the monograph, unless specified otherwise. Then, (1.91) is rewritten as
$$\begin{aligned} {\textit{\textbf{A}}} = A^i {\textit{\textbf{e}}}_i = A_i {\textit{\textbf{e}}}^{*i}. \end{aligned}$$(1.92)
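As an aside, numpy's einsum follows essentially this convention: repeated indices in the subscript string are summed over. A small sketch (the positive-definite matrix G below is randomly generated and merely stands in for a metric):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3))
G = G @ G.T + 3 * np.eye(3)            # a positive-definite "metric" g_ij
d_theta = rng.standard_normal(3)       # components d theta^i

# ds^2 = g_ij d theta^i d theta^j: repeated indices i, j are summed
ds2 = np.einsum('ij,i,j->', G, d_theta, d_theta)
assert abs(ds2 - d_theta @ G @ d_theta) < 1e-10
```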
Since the square of the length ds of a small line element $$d{\varvec{\theta }}$$ is given by the inner product of $$d{\varvec{\theta }}$$ , we have
$$\begin{aligned} ds^2 = \langle d{\varvec{\theta }}, d {\varvec{\theta }} \rangle = g_{ij}d \theta ^i d \theta ^j, \end{aligned}$$(1.93)
which is rewritten as
$$\begin{aligned} ds^2 = \langle d \theta ^i {\textit{\textbf{e}}}_i, d \theta ^j {\textit{\textbf{e}}}_j \rangle = \langle {\textit{\textbf{e}}}_i, {\textit{\textbf{e}}}_j \rangle d \theta ^i d \theta ^j. \end{aligned}$$(1.94)
Therefore, we have
$$\begin{aligned} g_{ij}({\varvec{\theta }}) = \langle {\textit{\textbf{e}}}_i, {\textit{\textbf{e}}}_j \rangle . \end{aligned}$$(1.95)
This is the inner product of basis vectors $${\textit{\textbf{e}}}_i$$ and $${\textit{\textbf{e}}}_j$$ , which depends on position $${\varvec{\theta }}$$ .
A manifold equipped with $$ \mathrm{\mathbf{G}} = \left( g_{ij} \right) $$ , by which the length of a small line element $$d{\varvec{\theta }}$$ is given by (1.93), is a Riemannian manifold. In the case of a Euclidean space with an orthonormal coordinate system, $$g_{ij}$$ is given by
$$\begin{aligned} g_{ij} = \delta _{ij}, \end{aligned}$$(1.96)
where $$\delta _{ij}$$ is the Kronecker delta, which is equal to 1 for $$i=j$$ and 0 otherwise. This is derived from convex function (1.37). A Euclidean space is a special case of the Riemannian manifold in which there is a coordinate system such that $$g_{ij}$$ does not depend on position, in particular, written as (1.96). A manifold induced from a convex function is not Euclidean in general.
The Riemannian metric can also be represented in the dual affine coordinate system $${\varvec{\theta }}^{*}$$ . From the representation of a small line element $$d{\varvec{\theta }}^{*}$$ as
$$\begin{aligned} d{\varvec{\theta }}^{*} = d \theta _i^{*} {\textit{\textbf{e}}}^{*i}, \end{aligned}$$(1.97)
we have
$$\begin{aligned} ds^2 = \langle d{\varvec{\theta }}^{*}, d {\varvec{\theta }}^{*} \rangle = g^{*ij} d \theta _i^{*} d \theta ^{*}_j, \end{aligned}$$(1.98)
where $$g^{*ij}$$ is given by
$$\begin{aligned} g^{*ij} = \langle {\textit{\textbf{e}}}^{*i}, {\textit{\textbf{e}}}^{*j} \rangle . \end{aligned}$$(1.99)
From (1.66), we see that the components of the small line elements $$d{\varvec{\theta }}$$ and $$d{\varvec{\theta }}^{*}$$ are related as
$$\begin{aligned}&d{\varvec{\theta }}^{*} = \mathrm{\mathbf{G}} d{\varvec{\theta }}, \quad d{\varvec{\theta }} = \mathrm{\mathbf{G}}^{-1}d{\varvec{\theta }}^{*}, \end{aligned}$$(1.100)
$$\begin{aligned}&d \theta ^{*}_i = g_{ij} d \theta ^j, \quad d \theta ^j= g^{*ji} d \theta ^{*}_i, \end{aligned}$$(1.101)
where $$\mathrm{\mathbf{G}} = \mathrm{\mathbf{G}}^{*-1}$$ . So the two Riemannian metric tensors are mutually inverse.
This also implies that the two bases are related as
$$\begin{aligned} {\textit{\textbf{e}}}^{*i} = g^{ij}{\textit{\textbf{e}}}_j, \quad {\textit{\textbf{e}}}_i = g_{ij}{\textit{\textbf{e}}}^{*j}. \end{aligned}$$
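The relations (1.100) between the two line elements can be sketched numerically; the two-parameter family below, with $$\psi ({\varvec{\theta }}) = \log \left( 1 + e^{\theta ^1} + e^{\theta ^2}\right) $$, is our own illustrative choice, and the dual coordinates are moved by a small finite step in place of the infinitesimal $$d{\varvec{\theta }}$$.

```python
import numpy as np

def psi(th):                 # log-partition of a 2-parameter discrete family
    return np.log(1.0 + np.exp(th).sum())

def grad_psi(th):            # theta* = grad psi(theta): expectation parameters
    e = np.exp(th)
    return e / (1.0 + e.sum())

def metric(th):              # G = Hessian of psi = diag(p) - p p^T
    p = grad_psi(th)
    return np.diag(p) - np.outer(p, p)

th = np.array([0.3, -0.7])
d_th = 1e-6 * np.array([1.0, 2.0])
d_th_star = grad_psi(th + d_th) - grad_psi(th)   # finite-step stand-in for d theta*

G = metric(th)
# d theta* = G d theta and d theta = G^{-1} d theta*, (1.100):
assert np.allclose(d_th_star, G @ d_th, rtol=1e-4, atol=1e-12)
assert np.allclose(np.linalg.solve(G, d_th_star), d_th, rtol=1e-4, atol=1e-12)
```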