Machine Learning for Future Fiber-Optic Communication Systems

Ebook · 765 pages · 9 hours
About this ebook

Machine Learning for Future Fiber-Optic Communication Systems provides a comprehensive and in-depth treatment of machine learning concepts and techniques applied to key areas within optical communications and networking, reflecting the state-of-the-art research and industrial practices. The book gives knowledge and insights into the role machine learning-based mechanisms will soon play in the future realization of intelligent optical network infrastructures that can manage and monitor themselves, diagnose and resolve problems, and provide intelligent and efficient services to the end users.

With up-to-date coverage and extensive treatment of various important topics related to machine learning for fiber-optic communication systems, this book is an invaluable reference for photonics researchers and engineers. It is also a very suitable text for graduate students interested in ML-based signal processing and networking.

  • Discusses the reasons behind the recent popularity of machine learning (ML) concepts in modern optical communication networks, and why, where, and how ML can play a unique role
  • Presents fundamental ML techniques like artificial neural networks (ANNs), support vector machines (SVMs), K-means clustering, expectation-maximization (EM) algorithm, principal component analysis (PCA), independent component analysis (ICA), reinforcement learning, and more
  • Covers advanced deep learning (DL) methods such as deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs)
  • Individual chapters focus on ML applications in key areas of optical communications and networking
Language: English
Release date: Feb 10, 2022
ISBN: 9780323852289

    Book preview

    Machine Learning for Future Fiber-Optic Communication Systems - Alan Pak Tao Lau

    Chapter One: Introduction to machine learning techniques: An optical communication's perspective

    Faisal Nadeem Khan (a); Qirui Fan (b); Chao Lu (c); Alan Pak Tao Lau (b)

    (a) Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China

    (b) Department of Electrical Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

    (c) Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

    Abstract

    Machine learning (ML) has revolutionized a number of science and engineering disciplines over the past few years. It is also being considered as a new direction of innovation to transform future fiber-optic communication systems. Recently, there has been an increasing amount of research in both industry and academia on embedding and benefiting from ML-based frameworks in various aspects of optical communications and networking, and state-of-the-art results have already been achieved in many cases. However, in order to fathom the real potential of ML in fiber-optic communication systems, it is imperative to have a basic understanding of fundamental ML concepts. In this chapter, we will describe the reasons behind the recent popularity of the ML paradigm in optical networks and why/where/how it can play a decisive role. We will discuss the mathematical foundations of several key conventional ML techniques as well as modern deep learning (DL) methods from communication theory and signal processing perspectives, and identify the kinds of problems in optical communications and networking where they can be particularly helpful. The future role of ML as an enabling technology for next-generation intelligent and autonomous software-defined optical networks will be highlighted. A brief discussion of ML tools, along with some useful links to online resources, will also be provided for the sake of completeness.

    Keywords

    Machine learning; Deep learning; Artificial intelligence; Network intelligence; Autonomous networks

    1.1 Introduction

    Artificial intelligence (AI) makes use of computers/machines to perform cognitive tasks, i.e., the ones requiring knowledge, perception, learning, reasoning, understanding and other similar cognitive abilities. An AI system is expected to do three things: (i) store knowledge, (ii) apply the stored knowledge to solve problems, and (iii) acquire new knowledge via experience. The three key components of an AI system include knowledge representation, machine learning (ML), and automated reasoning. ML is a branch of AI which is based on the idea that patterns and trends in a given data set can be learned automatically through algorithms. The learned patterns and structures can then be used to make decisions or predictions on some other data in the system of interest [1].

    ML is not a new field, as ML-related algorithms have existed since at least the 1970s. However, a tremendous increase in computational power over the last decade, recent groundbreaking developments in theory and algorithms surrounding ML, and easy access to an overabundance of all types of data worldwide (thanks to three decades of Internet growth) have all contributed to the advent of modern deep learning (DL) technology, a class of advanced ML approaches that displays superior performance in an ever-expanding range of domains. In the near future, ML is expected to power numerous aspects of modern society such as web searches, computer translation, content filtering on social media networks, healthcare, finance, and law [2].

    ML is an interdisciplinary field which shares common threads with the fields of statistics, optimization, information theory, and game theory. Most ML algorithms perform one of the following two types of pattern recognition tasks as shown in Fig. 1.1. In the first type, the algorithm tries to find some functional description of given data with the aim of predicting values for new inputs, i.e., the regression problem. The second type attempts to find suitable decision boundaries to distinguish different data classes, i.e., the classification problem [3], whose unsupervised counterpart (where no class labels are given) is referred to as the clustering problem in the ML literature. ML techniques are well known for performing exceptionally well in scenarios in which it is too hard to explicitly describe the problem's underlying physics and mathematics.

    Figure 1.1 Given a data set, ML attempts to solve two main types of problems: (a) functional description of given data and (b) classification of data by deriving appropriate decision boundaries. (c) Laser frequency offset and phase estimation for quadrature phase-shift keying (QPSK) systems by raising the signal phase ϕ to the 4th power and performing regression to estimate the slope and intercept. (d) Decision boundaries for a received QPSK signal distribution.
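    To make the two problem types concrete, the following minimal NumPy sketch (not from the chapter; all signal parameters are illustrative) fits a line to noisy phase samples, in the spirit of the frequency offset and phase estimation of Fig. 1.1(c), and makes minimum-distance symbol decisions on a noisy QPSK constellation as in Fig. 1.1(d).

        import numpy as np

        # Regression: least-squares fit of slope (frequency offset) and
        # intercept (phase offset) to noisy linear phase samples.
        t = np.arange(100)                                     # symbol index
        phase = 0.03 * t + 0.7 + 0.05 * np.random.randn(100)  # noisy phase
        slope, intercept = np.polyfit(t, phase, deg=1)

        # Classification: decide each received QPSK sample by the nearest
        # ideal constellation point, i.e., simple decision boundaries.
        constellation = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
        rx = constellation[np.random.randint(0, 4, 50)] \
            + 0.1 * (np.random.randn(50) + 1j * np.random.randn(50))
        decisions = np.argmin(np.abs(rx[:, None] - constellation[None, :]),
                              axis=1)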

    Optical communication researchers are no strangers to regressions and classifications. Over the last decade, coherent detection and digital signal processing (DSP) techniques have been the cornerstone of optical transceivers in fiber-optic communication systems. Advanced modulation formats such as 16-quadrature amplitude modulation (16-QAM) and above, together with DSP-based estimation and compensation of various transmission impairments such as laser phase noise, have become the key drivers of innovation. In this context, parameter estimation and symbol detection are naturally regression and classification problems, respectively, as demonstrated by the examples in Fig. 1.1(c) and (d). Currently, most of these parameter estimation and decision rules are derived from probability theory and an adequate understanding of the problem's underlying physics. As high-capacity optical transmission links are increasingly being limited by transmission impairments such as fiber nonlinearity, explicit statistical characterizations of inputs/outputs become difficult. An example of 16-QAM multi-span dispersion-free transmission in the presence of fiber nonlinearity and inline amplifier noise is shown in Fig. 1.2(a). The maximum likelihood decision boundaries in this case are curved and virtually impossible to derive analytically. Consequently, there has been an increasing amount of research on the application of ML techniques for fiber nonlinearity compensation (NLC). Another related area where ML flourishes is short-reach direct detection systems, which are affected by chromatic dispersion (CD), laser chirp and other transceiver component imperfections that render the overall communication system hard to analyze.

    Figure 1.2 (a) Probability distribution and corresponding optimal decision boundaries for received 16-QAM symbols in the presence of fiber nonlinearity are hard to characterize analytically. (b) Probability distribution of received 64-QAM signal amplitudes. The distribution can be used to monitor optical signal-to-noise ratio (OSNR) and identify modulation format. However, this task will be extremely difficult if one relies on analytical modeling.

    Optical performance monitoring (OPM) is another area with an increasing amount of ML-related research. OPM is the acquisition of real-time information about different channel impairments ubiquitously across the network to ensure reliable network operation and/or improve network capacity. Often, OPM is cost-limited, so that one can only employ simple hardware components and obtain partial signal features to monitor different channel parameters such as OSNR, optical power, CD, etc. [4,5]. In this case, the mapping between input and output parameters is intractable from the underlying physics/mathematics, which in turn warrants ML. An example of OSNR monitoring using the received signal amplitude distribution is shown in Fig. 1.2(b).

    Besides physical layer-related developments, optical network architectures and operations are also undergoing major paradigm shifts under the software-defined networking (SDN) framework and are increasingly becoming complex, transparent and dynamic in nature [6]. One of the key features of SDNs is that they can assemble large amounts of data and perform so-called big data analysis to estimate the network states as shown in Fig. 1.3. This in turn can enable (i) adaptive provisioning of resources such as wavelength, modulation format, routing path, etc., according to dynamic traffic patterns and (ii) advance discovery of potential component faults so that preventative maintenance can be performed to avoid major network disruptions. The data accumulated in SDNs can span from the physical layer (e.g., OSNR of a certain channel) to the network layer (e.g., client-side speed demand) and obviously have no underlying physics to explain their interrelationships. Extracting patterns from such cross-layer parameters naturally demands the use of data-driven algorithms such as ML.

    Figure 1.3 Dynamic network resources allocation and link capacity maximization via cross-layer optimization in SDNs.

    This chapter is intended for researchers in optical communications with a basic background in probability theory, communication theory and standard DSP techniques used in fiber-optic communications such as matched filters, maximum likelihood/maximum a posteriori (MAP) detection, equalization, adaptive filtering, etc. In this regard, a large class of ML techniques such as Kalman filtering, Bayesian learning, hidden Markov models (HMMs), etc., are actually standard statistical signal processing methods, and hence will not be covered here. We will first introduce supervised ML techniques such as artificial neural networks (ANNs), support vector machines (SVMs) and K-nearest neighbors (KNN) from communication theory and signal processing perspectives. This will be followed by popular unsupervised ML methods like K-means clustering, the expectation-maximization (EM) algorithm, principal component analysis (PCA) and independent component analysis (ICA). Next, we will address the reinforcement learning (RL) approach. Finally, more recent DL techniques such as deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and generative adversarial networks (GANs) will be discussed. The analytical derivations presented in this chapter are slightly different from those in standard introductory ML texts to better align with the fields of communications and signal processing. By discussing ML through the language of communications and DSP, we hope to provide a more intuitive understanding of ML, its relation to optical communications and networking, and why/where/how it can play a unique role in specific areas of optical communications and networking.

    The rest of the chapter is organized as follows. In Section 1.2, we will illustrate the fundamental conditions that warrant the use of a neural network and discuss the technical details of the ANN, SVM and KNN algorithms. Section 1.3 will describe a range of basic unsupervised ML techniques, while Section 1.4 will briefly discuss the RL approach. Section 1.5 will be devoted to more recent DL algorithms. Section 1.6 will describe the future role of ML in optical communications and networking. Links to online resources and codes for standard ML algorithms will be provided in Section 1.7. Section 1.8 will conclude the chapter.

    1.2 Supervised learning

    What are the conditions that need ML for classification? Fig. 1.4 shows three scenarios with 2-dimensional (2D) data and their respective class labels depicted as ‘o’ and ‘×’ in the figure. In the first case, classifying the data is straightforward: the decision rule is to see whether one of the coordinates, $x_1$ or $x_2$, is greater or less than 0, and that coordinate serves as the decision function as shown. The second case is slightly more complicated as the decision boundary is a slanted straight line. However, a simple rotation and shifting of the input, i.e., $\tilde{\mathbf{x}} = \mathbf{W}\mathbf{x} + \mathbf{b}$, will map one class of data to below zero and the other class above. Here, the rotation and shifting are described by the matrix W and vector b, respectively. This is followed by the decision function $f(\tilde{\mathbf{x}})$. The third case is even more complicated. The region for the ‘green’ (mid gray in print version) class depends on the outputs of the ‘red’ (dark gray in print version) and ‘blue’ (light gray in print version) decision boundaries. Therefore, one will need to implement an extra decision step to label the ‘green’ region. The graphical representation of this ‘decision of decisions’ algorithm is the simplest form of an ANN [8]. The intermediate decision output units are known as hidden neurons and they form the hidden layer.

    Figure 1.4 The complexity of classification problems depends on how the different classes of data are distributed across the variable space [7].
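    To make the rotation-and-shift step concrete, here is a minimal NumPy sketch of the second scenario in Fig. 1.4, assuming a hypothetical slanted boundary x2 = x1 + 1; the particular W, b and data point are illustrative only, not taken from the chapter.

        import numpy as np

        # Slanted boundary x2 = x1 + 1, rewritten as -x1 + x2 - 1 = 0.
        W = np.array([[-1.0, 1.0]])   # "rotation" (projection) of the input
        b = np.array([-1.0])          # shift
        x = np.array([0.5, 2.0])      # a 2D data point
        x_tilde = W @ x + b           # mapped input; its sign decides the class
        label = int(x_tilde[0] > 0)   # decision function: threshold at zero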

    1.2.1 Artificial neural networks (ANNs)

    Let $\{(\mathbf{x}^{(l)}, \mathbf{y}^{(l)})\}_{l=1}^{L}$ be a set of L input-output pairs of M and K dimensional column vectors. ANNs are information processing systems comprising an input layer, one or more hidden layers, and an output layer. The structure of a single hidden layer ANN with M input, H hidden and K output neurons is shown in Fig. 1.5. Neurons in two adjacent layers are interconnected, and each connection has a variable weight assigned. Such an ANN architecture is the simplest and most commonly-used one [8]. The number of neurons M in the input layer is determined by the dimension of the input data vectors $\mathbf{x}^{(l)}$. The hidden layer enables the modeling of complex relationships between the input and output parameters of an ANN. There are no fixed rules for choosing the optimum number of neurons for a given hidden layer and the optimum number of hidden layers in an ANN. Typically, the selection is made via experimentation, experience and other prior knowledge of the problem. These are known as the hyperparameters of an ANN. For regression problems, the dimension K of the vectors $\mathbf{y}^{(l)}$ depends on the actual problem nature. For classification problems, K typically equals the number of class labels such that $\mathbf{y}^{(l)} = [0\,\cdots\,0\;1\;0\,\cdots\,0]^T$ if a data point belongs to class k, where the ‘1’ is located at the kth position. This is called one-hot encoding. The ANN output $\mathbf{o}^{(l)}$ will naturally have the same dimension as $\mathbf{y}^{(l)}$, and the mapping between the input $\mathbf{x}^{(l)}$ and $\mathbf{o}^{(l)}$ can be expressed as

    $\mathbf{o} = \sigma_2\left(\mathbf{W}_2\,\sigma_1\left(\mathbf{W}_1\mathbf{x} + \mathbf{b}_1\right) + \mathbf{b}_2\right)$  (1.1)

    where $\sigma_1(\cdot)$ and $\sigma_2(\cdot)$ are the activation functions for the hidden and output layer neurons, respectively. $\mathbf{W}_1$ and $\mathbf{W}_2$ are matrices containing the weights of connections between the input and hidden layer neurons and between the hidden and output layer neurons, respectively, while $\mathbf{b}_1$ and $\mathbf{b}_2$ are the bias vectors for the hidden and output layer neurons, respectively. For a vector $\mathbf{u} = [u_1\ u_2\ \cdots\ u_K]^T$ of length K, $\sigma_1(\mathbf{u})$ is typically an element-wise nonlinear function such as the sigmoid function

    $\sigma_1(\mathbf{u}) = \left[\frac{1}{1+e^{-u_1}}\ \ \frac{1}{1+e^{-u_2}}\ \cdots\ \frac{1}{1+e^{-u_K}}\right]^T$  (1.2)

    As for the output layer neurons, $\sigma_2(\cdot)$ is typically chosen to be a linear function for regression problems. In classification problems, one will normalize the output vector using the softmax function, i.e.,

    $\mathbf{o} = \operatorname{softmax}\left(\mathbf{W}_2\,\sigma_1\left(\mathbf{W}_1\mathbf{x} + \mathbf{b}_1\right) + \mathbf{b}_2\right)$  (1.3)

    where

    $\operatorname{softmax}(\mathbf{u})_k = \frac{e^{u_k}}{\sum_{j=1}^{K} e^{u_j}}, \qquad k = 1, 2, \ldots, K$  (1.4)

    The softmax operation ensures that the ANN outputs conform to a probability distribution for reasons we will discuss below.

    Figure 1.5 Structure of a single hidden layer ANN with input vector x(l), target vector y(l) and actual output vector o(l).
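    The forward mapping of Eqs. (1.1)-(1.4) can be sketched in a few lines of NumPy. The layer sizes and random weights below are placeholders, and subtracting the maximum inside the softmax is a standard numerical-stability trick not discussed in the text.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))       # element-wise, Eq. (1.2)

        def softmax(z):
            e = np.exp(z - z.max())               # stabilized exponentials
            return e / e.sum()                    # Eq. (1.4)

        def forward(x, W1, b1, W2, b2):
            q = sigmoid(W1 @ x + b1)              # hidden-layer output
            return softmax(W2 @ q + b2)           # Eq. (1.1), softmax output

        M, H, K = 4, 8, 3                         # illustrative layer sizes
        rng = np.random.default_rng(0)
        W1, b1 = rng.normal(size=(H, M)), np.zeros(H)
        W2, b2 = rng.normal(size=(K, H)), np.zeros(K)
        o = forward(rng.normal(size=M), W1, b1, W2, b2)   # entries sum to 1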

    To train the ANN is to optimize all the parameters $\boldsymbol{\theta} = \{\mathbf{W}_1, \mathbf{b}_1, \mathbf{W}_2, \mathbf{b}_2\}$ such that the difference between the actual ANN outputs o and the target outputs y is minimized. One commonly-used objective function (also called loss function in the ML literature) to optimize is the mean square error (MSE)

    $E(\boldsymbol{\theta}) = \frac{1}{L}\sum_{l=1}^{L}\left\|\mathbf{y}^{(l)} - \mathbf{o}^{(l)}\right\|^2$  (1.5)

    Like most optimization procedures in practice, gradient descent is used instead of full analytical optimization. In this case, the parameter estimates for the (n+1)th iteration are given by

    $\boldsymbol{\theta}^{(n+1)} = \boldsymbol{\theta}^{(n)} - \alpha\,\nabla_{\boldsymbol{\theta}} E\left(\boldsymbol{\theta}^{(n)}\right)$  (1.6)

    where the step size α is known as the learning rate. Note that for computational efficiency, one can use a single input-output pair instead of all the L pairs for each iteration in Eq. (1.6). This is known as stochastic gradient descent (SGD), which is the standard optimization method used in common adaptive DSP such as the constant modulus algorithm (CMA) and least mean squares (LMS) algorithm. As a trade-off between computational efficiency and accuracy, one can use a mini-batch of data $\{(\mathbf{x}^{(p)}, \mathbf{y}^{(p)})\}_{p=1}^{P}$ of size P for the nth iteration instead. This can reduce the stochastic nature of SGD and improve accuracy. When the whole data set has been used, the update algorithm will have completed one epoch. However, it is often the case that one epoch's worth of updates is not enough for all the parameters to converge to their optimal values. Therefore, one can reuse the data set and let the algorithm go through a 2nd epoch for further parameter updates. There is no fixed rule to determine the number of epochs required for convergence [9].
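    A schematic mini-batch SGD loop corresponding to Eq. (1.6) might be organized as follows; `grad_fn` and the parameter list are placeholders for the model-specific gradient computation.

        import numpy as np

        def sgd(params, grad_fn, X, Y, lr=0.01, batch_size=32, epochs=10):
            """Mini-batch SGD: one pass over the shuffled data is one epoch."""
            L = len(X)
            for _ in range(epochs):
                idx = np.random.permutation(L)       # reshuffle every epoch
                for start in range(0, L, batch_size):
                    batch = idx[start:start + batch_size]  # mini-batch, size P
                    grads = grad_fn(params, X[batch], Y[batch])
                    params = [p - lr * g             # Eq. (1.6), step size lr
                              for p, g in zip(params, grads)]
            return params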

    The update algorithm comprises the following main steps: (i) Model initialization: All the ANN weights and biases are randomly initialized, e.g., by drawing random numbers from a normal distribution with zero mean and unit variance; (ii) Forward propagation: In this step, the inputs x are passed through the network to generate the outputs o using Eq. (1.1). The input can be a single data point, a mini-batch or the complete set of L inputs. This step is so named because the computation flow is in the natural forward direction, i.e., starting from the input, passing through the network, and going to the output; (iii) Backward propagation and weights/biases update: For simplicity, let us assume SGD using a single input-output pair for the (n+1)th iteration, a sigmoid activation function for the hidden layer neurons and a linear activation function for the output layer neurons such that $\mathbf{o} = \mathbf{W}_2\mathbf{q} + \mathbf{b}_2$, where $\mathbf{q} = \sigma_1(\mathbf{W}_1\mathbf{x} + \mathbf{b}_1)$ is the hidden-layer output. The parameters $\mathbf{W}_2, \mathbf{b}_2$ will be updated first, followed by $\mathbf{W}_1, \mathbf{b}_1$. Since $E = \left\|\mathbf{y} - \mathbf{o}\right\|^2$ and $\partial E/\partial \mathbf{o} = 2(\mathbf{o} - \mathbf{y})$, the corresponding update equations are

    $\mathbf{b}_2^{(n+1)} = \mathbf{b}_2^{(n)} - \alpha\left(\frac{\partial \mathbf{o}}{\partial \mathbf{b}_2}\right)^{T} 2\left(\mathbf{o} - \mathbf{y}\right), \qquad \mathbf{W}_2^{(n+1)} = \mathbf{W}_2^{(n)} - \alpha\sum_{k=1}^{K} 2\left(o_k - y_k\right)\frac{\partial o_k}{\partial \mathbf{W}_2}$  (1.7)

    where $o_k$ and $y_k$ denote the kth elements of the vectors $\mathbf{o}$ and $\mathbf{y}$, respectively. In this case, $\partial \mathbf{o}/\partial \mathbf{b}_2$ is the Jacobian matrix whose entry in the mth row and jth column is the derivative of the mth element of $\mathbf{o}$ with respect to the jth element of $\mathbf{b}_2$. Also, the entry in the mth row and jth column of the matrix $\partial o_k/\partial \mathbf{W}_2$ denotes the derivative of $o_k$ with respect to the (m, j)th entry of $\mathbf{W}_2$. Interested readers are referred to [10] for an overview of matrix calculus. Since $\mathbf{o} = \mathbf{W}_2\mathbf{q} + \mathbf{b}_2$, $\partial \mathbf{o}/\partial \mathbf{b}_2$ is simply the identity matrix. For $\partial o_k/\partial \mathbf{W}_2$, its kth row is equal to $\mathbf{q}^T$ (where $(\cdot)^T$ denotes transpose) and it is zero otherwise. Eq. (1.7) can be simplified as

    $\mathbf{b}_2^{(n+1)} = \mathbf{b}_2^{(n)} - 2\alpha\left(\mathbf{o} - \mathbf{y}\right), \qquad \mathbf{W}_2^{(n+1)} = \mathbf{W}_2^{(n)} - 2\alpha\left(\mathbf{o} - \mathbf{y}\right)\mathbf{q}^{T}$  (1.8)

    With the updated $\mathbf{W}_2$ and $\mathbf{b}_2$, one can calculate

    $\frac{\partial E}{\partial \mathbf{q}} = 2\,\mathbf{W}_2^{T}\left(\mathbf{o} - \mathbf{y}\right)$  (1.9)

    Since the derivative of the sigmoid function is given by $\sigma_1'(\mathbf{z}) = \sigma_1(\mathbf{z}) \circ \left(\mathbf{1} - \sigma_1(\mathbf{z})\right)$, where ∘ denotes element-wise multiplication and $\mathbf{1}$ denotes a column vector of 1's with the same length as $\mathbf{z}$,

    $\mathbf{b}_1^{(n+1)} = \mathbf{b}_1^{(n)} - \alpha\,\frac{\partial E}{\partial \mathbf{b}_1} \quad \text{with} \quad \frac{\partial E}{\partial \mathbf{b}_1} = \mathbf{D}\left(\mathbf{q} \circ (\mathbf{1} - \mathbf{q})\right)\, 2\,\mathbf{W}_2^{T}\left(\mathbf{o} - \mathbf{y}\right)$  (1.10)

    where $\mathbf{D}(\mathbf{z})$ denotes a diagonal matrix with diagonal vector $\mathbf{z}$. Next,

    $\frac{\partial E}{\partial \mathbf{W}_1} = \sum_{h=1}^{H} \frac{\partial E}{\partial z_h}\,\frac{\partial z_h}{\partial \mathbf{W}_1}$  (1.11)

    where $z_h$ is the hth element of $\mathbf{z} = \mathbf{W}_1\mathbf{x} + \mathbf{b}_1$. For $\partial z_h/\partial \mathbf{W}_1$, its hth row is $\mathbf{x}^T$ and it is zero otherwise. Eq. (1.11) can be simplified as

    $\mathbf{W}_1^{(n+1)} = \mathbf{W}_1^{(n)} - \alpha\,\frac{\partial E}{\partial \mathbf{b}_1}\,\mathbf{x}^{T}$  (1.12)

    where the hth row of $\partial E/\partial \mathbf{W}_1$ is the hth element of $\partial E/\partial \mathbf{b}_1$ multiplied by $\mathbf{x}^T$. Since the parameters are updated group by group starting from the output layer back to the input layer, this algorithm is called the back-propagation (BP) algorithm (not to be confused with the digital back-propagation (DBP) algorithm for fiber NLC). The weights and biases are continuously updated until convergence.
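    Collecting Eqs. (1.7)-(1.12), one SGD/BP iteration for the ANN of Fig. 1.5 can be written compactly in NumPy. This is a sketch under the same assumptions as above (sigmoid hidden layer, linear output layer, squared-error loss, single input-output pair).

        import numpy as np

        def backprop_step(x, y, W1, b1, W2, b2, lr=0.01):
            # Forward pass
            q = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden output sigma_1(z)
            o = W2 @ q + b2                            # linear output layer
            err = 2.0 * (o - y)                        # dE/do, E = ||y - o||^2
            # Output layer first, Eq. (1.8)
            W2_new = W2 - lr * np.outer(err, q)
            b2_new = b2 - lr * err
            # Hidden layer with the updated W2, Eqs. (1.9)-(1.12)
            delta = (W2_new.T @ err) * q * (1.0 - q)   # equals dE/db1
            W1_new = W1 - lr * np.outer(delta, x)
            b1_new = b1 - lr * delta
            return W1_new, b1_new, W2_new, b2_new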

    For the learning and performance evaluation of an ANN, the data sets are typically divided into three groups: training, validation and testing. The training data set is used to train the ANN. Clearly, a larger training data set is better since the more data an ANN sees, the more likely it is that it has encountered examples of all possible types of input. However, the learning time also increases with the training data size. There is no fixed rule for determining the minimum amount of training data needed since it often depends on the given problem. A rule of thumb typically used is that the size of the training data should be at least 10 times the total number of weights [1]. The purpose of the validation data set is to keep a check on how well the ANN is doing as it learns, since during training there is an inherent danger of over-fitting (or over-training). In this case, instead of finding the underlying general decision boundaries as shown in Fig. 1.6(a), the ANN tends to perfectly fit the training data (including any noise components) as shown in Fig. 1.6(b). This in turn makes the ANN customized to a few data points and reduces its generalization capability, i.e., its ability to make predictions about new inputs which it has never seen before. The over-fitting problem can be avoided by constantly examining the ANN's error performance during the course of training against an independent validation data set and enforcing an early termination of the training process if the validation data set gives large errors. Typically, the size of the validation data set is just a fraction (∼1/3) of that of the training data set. Finally, the testing data set evaluates the performance of the trained ANN. Note that an ANN may also be subject to the under-fitting problem, which occurs when it is under-trained and thus unable to perform at an acceptable level, as shown in Fig. 1.6(c). Under-fitting can again lead to poor ANN generalization. The reasons for under-fitting include insufficient training time or number of iterations, an inappropriate choice of activation functions, and/or an insufficient number of hidden neurons.

    Figure 1.6 Example illustrating ANN learning processes with (a) no over-fitting or under-fitting, (b) over-fitting, and (c) under-fitting.
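    In code, the validation-based early termination described above might be structured as follows; `train_one_epoch` and `evaluate` are hypothetical callables supplied by the user, not routines from the chapter.

        def train_with_early_stopping(params, train_one_epoch, evaluate,
                                      train_set, val_set,
                                      max_epochs=100, patience=5):
            """Stop once validation error fails to improve for `patience` epochs."""
            best_val, best_params, wait = float('inf'), params, 0
            for _ in range(max_epochs):
                params = train_one_epoch(params, train_set)
                val_err = evaluate(params, val_set)
                if val_err < best_val:
                    best_val, best_params, wait = val_err, params, 0
                else:
                    wait += 1
                    if wait >= patience:
                        break    # rising validation error: over-fitting onset
            return best_params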

    It should be noted that given an adequate number of hidden neurons, proper nonlinearities, and appropriate training, an ANN with one hidden layer has great expressive power and can approximate any continuous function in principle. This is called the universal approximation theorem [11]. One can intuitively appreciate this characteristic by considering the classification problem in Fig. 1.7. Since each hidden neuron can be represented as a straight-line decision boundary, any arbitrary curved boundary can be approximated by a collection of hidden neurons in a single hidden layer ANN. This important property of an ANN enables it to be applied in many diverse applications.

    Figure 1.7 Decision boundaries for appropriate data classification obtained using an ANN.

    1.2.2 Choice of activation functions

    The choice of activation functions has a significant effect on the training dynamics and final ANN performance. Historically, the sigmoid and hyperbolic tangent have been the most commonly-used nonlinear activation functions for hidden layer neurons. However, the rectified linear unit (ReLU) activation function has become the default choice in the ML community in recent years. These three functions are given by

    $\operatorname{sigmoid}(z) = \frac{1}{1+e^{-z}}, \qquad \tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}, \qquad \operatorname{ReLU}(z) = \max(0, z)$  (1.13)

    and their plots are shown in Fig. 1.8. Sigmoid and hyperbolic tangent are both differentiable. However, a major problem with these functions is that their gradients tend to zero as |z| becomes large and thus the activation output gets saturated. In this case, the weights and biases updates for a certain layer will be minimal, which in turn will slow down the weights and biases updates for all the preceding layers. This is known as the vanishing gradient problem and is particularly an issue when training ANNs with a large number of hidden layers. To circumvent this problem, ReLU was proposed since its gradient does not vanish as z increases. Note that although ReLU is not differentiable at z = 0, this is not a problem in practice since the probability of having an entry exactly equal to 0 is generally very low. Also, as the ReLU function and its derivative are 0 for z < 0, around 50% of hidden neurons' outputs will be 0, i.e., only half of the total neurons will be active when the ANN weights and biases are randomly initialized. It has been found that such sparsity of activation not only reduces computational complexity (and thus training time) but also leads to better ANN performance [12]. Note that while using the ReLU activation function, the ANN weights and biases are often initialized using the method proposed by He et al. [13]. On the other hand, the Xavier initialization technique [14] is more commonly employed for the hyperbolic tangent activation function. These heuristics-based approaches initialize the weights and biases by drawing random numbers from a truncated normal distribution (instead of a standard normal distribution) with a variance that depends on the size of the previous ANN layer.

    Figure 1.8 Common activation functions used in ANNs.
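    The three activations of Eq. (1.13) and the two initialization heuristics can be sketched as follows. For brevity the sketch draws from a plain (rather than truncated) normal distribution, and the scale factors follow common fan-in/fan-out conventions, which may differ in detail from the cited papers.

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def relu(z):
            return np.maximum(0.0, z)    # np.tanh covers the third choice

        def he_init(fan_out, fan_in, rng=None):
            # He et al.: variance 2/fan_in, suited to ReLU layers
            rng = rng if rng is not None else np.random.default_rng()
            return rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_out, fan_in))

        def xavier_init(fan_out, fan_in, rng=None):
            # Xavier/Glorot: variance 2/(fan_in + fan_out), suited to tanh
            rng = rng if rng is not None else np.random.default_rng()
            return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)),
                              (fan_out, fan_in))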

    1.2.3 Choice of loss functions

    The choice of loss function E has a considerable effect on the performance of an ANN. The MSE is a common choice in adaptive signal processing and other DSP in telecommunications. For regression problems, MSE works well in general and is also easy to compute. On the other hand, for classification problems, the cross-entropy loss function, defined as

    $E = -\frac{1}{L}\sum_{l=1}^{L}\sum_{k=1}^{K} y_k^{(l)} \log o_k^{(l)}$  (1.14)

    is often used instead of MSE [11]. The cross-entropy function can be interpreted by viewing the softmax output $\mathbf{o}^{(l)}$ and the class label $\mathbf{y}^{(l)}$ with one-hot encoding as probability distributions. In this case, $\mathbf{y}^{(l)}$ has zero entropy and one can subtract the zero-entropy term from Eq. (1.14) to obtain

    $E = \frac{1}{L}\sum_{l=1}^{L}\sum_{k=1}^{K} y_k^{(l)} \log \frac{y_k^{(l)}}{o_k^{(l)}} = \frac{1}{L}\sum_{l=1}^{L} D_{\mathrm{KL}}\left(\mathbf{y}^{(l)} \,\middle\|\, \mathbf{o}^{(l)}\right)$  (1.15)

    which is simply the Kullback-Leibler (KL) divergence between the distributions $\mathbf{y}^{(l)}$ and $\mathbf{o}^{(l)}$, averaged over all input-output pairs. Therefore, the cross-entropy is in fact a measure of the similarity between the ANN outputs and the class labels. The cross-entropy function also leads to simple gradient updates, as the logarithm cancels out the exponential operation inherent in the softmax calculation, thus leading to faster ANN training. Appendix 1.A shows the derivation of the BP algorithm for the single hidden layer ANN in Fig. 1.5 with the cross-entropy loss function and softmax activation function for the output layer neurons.
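    The cancellation mentioned above (the logarithm undoing the softmax exponential) can be checked numerically with a short sketch; the vectors here are illustrative, not from the chapter.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        z = np.array([2.0, -1.0, 0.5])   # pre-softmax outputs
        y = np.array([1.0, 0.0, 0.0])    # one-hot class label
        o = softmax(z)
        E = -np.sum(y * np.log(o))       # cross-entropy, Eq. (1.14), L = 1
        grad_z = o - y                   # gradient w.r.t. z after cancellation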

    In many applications, a common approach to prevent over-fitting is to reduce the magnitude of the weights, as large weights produce high curvatures which make the decision boundaries overly complicated. This can be achieved by including an extra regularization term in the loss function, i.e.,

    $\tilde{E} = E + \frac{\lambda}{2}\left\|\mathbf{W}\right\|^2$  (1.16)

    where $\left\|\mathbf{W}\right\|^2$ is the sum of the squares of all the individual weights. The parameter λ, called the regularization coefficient, defines the relative importance of the training error E and the regularization term. The regularization term thus discourages the weights from reaching large values, and this often results in significant improvement in the ANN's generalization ability [15].
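    As a one-line sketch, the regularized loss of Eq. (1.16) just adds the summed squared weight entries to the training error; the factor-of-1/2 convention here is one common choice.

        def regularized_loss(E, weights, lam):
            # E: training error; weights: list of weight matrices; lam: lambda
            return E + 0.5 * lam * sum((W ** 2).sum() for W in weights)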

    1.2.4 Support vector machines (SVMs)

    In many classification tasks, it often happens that the two data categories are not easily separable with straight lines or planes in the original variable space. SVM is an ML technique that preprocesses the input data $\mathbf{x}$ and transforms it into a (sometimes higher-dimensional) space $\boldsymbol{\varphi}(\mathbf{x})$, called the feature space, where the data belonging to the two different classes can be separated easily by a simple straight plane decision boundary or hyperplane [16]. An example is shown in Fig. 1.9, where one class of data lies within a circle of radius 3 and the other class lies outside. When transformed into the 3D feature space, the two data classes can be separated simply by a plane, i.e., a hyperplane decision boundary.

    Figure 1.9 Example showing how a linearly inseparable problem (in the original 2D data space) can undergo a nonlinear transformation and become a linearly separable one in the 3-dimensional (3D) feature space.
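    The circle example of Fig. 1.9 can be reproduced with an off-the-shelf SVM. The sketch below uses scikit-learn's SVC with an RBF kernel, which performs the feature-space mapping implicitly; the kernel choice and data generation are ours for convenience, not prescribed by the text.

        import numpy as np
        from sklearn.svm import SVC

        # Two classes separated by a circle of radius 3: linearly inseparable
        # in the original 2D space, separable after a nonlinear mapping.
        rng = np.random.default_rng(0)
        X = rng.uniform(-5, 5, size=(400, 2))
        y = (np.hypot(X[:, 0], X[:, 1]) > 3).astype(int)

        clf = SVC(kernel='rbf', C=1.0).fit(X, y)
        print(clf.score(X, y))                 # training accuracy
        print(clf.support_vectors_.shape)      # the borderline support vectors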

    Let us first focus on finding the right decision hyperplane after the transformation into feature space as shown in Fig. 1.10(a). The right hyperplane should have the largest (and also equal) distance from the borderline points of the two data classes. This is graphically illustrated in Fig. 1.10(b). Had the data points been generated from two probability density functions (PDFs), finding a hyperplane with maximal margin from the borderline points is conceptually analogous to finding a maximum likelihood decision boundary. The borderline points, represented as solid dot and triangle in Fig. 1.10(b), are referred to as support vectors and are often most informative for the classification task.

    Figure 1.10 (a) Mapping from input space to a higher-dimensional feature space using a nonlinear kernel function φ. (b) Separation of two data classes in the feature space through an optimal hyperplane.

    More technically, in the feature space, a general hyperplane is defined as $\mathbf{w}^{T}\boldsymbol{\varphi}(\mathbf{x}) + b = 0$. If it classifies all the data points correctly, all the violet (dark gray in print version) points will lie in the region $\mathbf{w}^{T}\boldsymbol{\varphi}(\mathbf{x}) + b > 0$ and the orange (light gray in print version) points will lie in the region $\mathbf{w}^{T}\boldsymbol{\varphi}(\mathbf{x}) + b < 0$. We seek to find a hyperplane that maximizes the margin d as shown in Fig. 1.10(b). Without loss of generality, let the point $\mathbf{x}_1$ reside on the hyperplane $\mathbf{w}^{T}\boldsymbol{\varphi}(\mathbf{x}) + b = 1$ and be closest to the hyperplane $\mathbf{w}^{T}\boldsymbol{\varphi}(\mathbf{x}) + b = -1$ on which $\mathbf{x}_2$ resides. Since the vectors $\boldsymbol{\varphi}(\mathbf{x}_1) - \boldsymbol{\varphi}(\mathbf{x}_2)$ and $\mathbf{w}$ and the angle ϕ between them are related by $\mathbf{w}^{T}\left(\boldsymbol{\varphi}(\mathbf{x}_1) - \boldsymbol{\varphi}(\mathbf{x}_2)\right) = \left\|\mathbf{w}\right\|\left\|\boldsymbol{\varphi}(\mathbf{x}_1) - \boldsymbol{\varphi}(\mathbf{x}_2)\right\|\cos\phi = 2$, the margin d is given by $d = \left\|\boldsymbol{\varphi}(\mathbf{x}_1) - \boldsymbol{\varphi}(\mathbf{x}_2)\right\|\cos\phi = \frac{2}{\left\|\mathbf{w}\right\|}$.
