Artificial Intelligence and Machine Learning for EDGE Computing

About this ebook

Artificial Intelligence and Machine Learning for Predictive and Analytical Rendering in Edge Computing focuses on the role of AI and machine learning as they impact and work alongside Edge Computing. Sections cover the growing number of devices and applications in diversified domains of industry, including gaming, speech recognition, medical diagnostics, robotics and computer vision, and how they are being driven by Big Data, Artificial Intelligence, Machine Learning and distributed computing, be it Cloud Computing or the evolving Fog and Edge Computing paradigms.

Challenges covered include remote storage and computing; bandwidth overload due to transporting data from end nodes to the Cloud, leading to latency issues; security issues in transporting sensitive medical and financial information across large gaps between the points of data generation and computing; and the design features of Edge nodes to store and run AI/ML algorithms for effective rendering.

  • Provides a reference handbook on the evolution of distributed systems, including Cloud, Fog and Edge Computing
  • Integrates the various Artificial Intelligence and Machine Learning techniques for effective predictions at Edge rather than Cloud or remote Data Centers
  • Provides insight into the features and constraints in Edge Computing and storage, including hardware constraints and the technological/architectural developments that will overcome those constraints
Language: English
Release date: Apr 26, 2022
ISBN: 9780128240557


    Artificial Intelligence and Machine Learning for EDGE Computing - Rajiv Pandey

    Part I

    AI and machine learning

    Chapter 1: Supervised learning

    Kanishka Tyagi (a); Chinmay Rane (b); Michael Manry (c)

    (a) Aptiv Advanced Research Center, Agoura Hills, CA, United States

    (b) Quantiphi, Inc., Marlborough, MA, United States

    (c) Department of Electrical Engineering, The University of Texas at Arlington, Arlington, TX, United States

    Abstract

    Machine learning models learn different tasks with different paradigms that effectively aim to improve the models through training. Supervised learning is a common machine learning training paradigm that has been used successfully in real-world applications. Typical supervised learning involves two phases. In phase 1 (commonly called training), we present inputs together with the desired outputs (also known as ground truth) and train a model with respect to a metric using an optimization algorithm. In phase 2 (commonly called testing), we deploy the model on unseen data and expect it to either classify the inputs or approximate the outputs. Although supervised learning is covered in almost all machine learning textbooks, we introduce and explain supervised learning from an application point of view and relate it to edge computing. All the algorithms are explained from a mathematical and theoretical point of view as well as from a programmer's perspective. This will help readers gain hands-on experience implementing the algorithms for a variety of problems.

    Keywords

    Linear regression; Logistic regression; Steepest descent; Conjugate gradient; Multilayer perceptron; Second-order algorithms; KL divergence; Generalized linear models; Kernel machines; Bootstrapping

    1: Introduction

    This chapter discusses various supervised learning paradigms that are used to train and deploy machine learning models. While the theory of supervised learning is dominated mainly by the computer vision and natural language processing communities, much research is required in its application to edge computing. With recent advances in machine learning algorithms, combined with increasing computational power, edge computing is another application area that machine learning will improve. According to a white paper from Cisco [1], 50 billion internet of things (IoT) devices will be connected to the internet by the end of 2020, and even though nearly 850 ZB of data will be generated per year outside the cloud by 2021, global data center traffic will be only approximately 21 ZB. This means that a transformation from big cloud data centers to a wide range of edge devices is happening.

    We plan to cover the traditional supervised learning algorithm basics and the tricks of the trade used when implementing them for various real-life applications. The assumptions and programming decisions made while implementing them are also discussed. We start with generalized linear models, along with the optimization algorithms (gradient descent, Newton's method) and metrics (mean square error [MSE], cross-entropy) used in training the models. Then, naïve Bayes and kernel methods are discussed, along with the decision tree and its variants. We provide pseudocode for all the supervised learning algorithms that we discuss, to give readers an idea of how the algorithms move from theory to practice. In addition, supervised learning algorithms that capture the variation of the input data distribution, algorithms that project data into higher-dimensional subspaces, and first- and second-order learning paradigms are the subject matter of this chapter. The chapter covers these learning paradigms in sync with modern-day edge computing applications from an implementation perspective.

    Edge computing and machine learning will enable the devices to have a more pervasive and fine-grained intelligence. Supervised learning is a common way of training machine learning algorithms that can be later used in the inference stage on IoT and other edge devices.

    This chapter covers several supervised learning algorithms that are commonly used in various real-life applications. Section 2 covers the basic perceptron model, which helps in understanding the learning algorithm. In Section 3, the linear regression algorithm is discussed, which extends the idea of the perceptron algorithm to solve supervised regression tasks. We discuss commonly used gradient-based learning algorithms along with their pseudocode. Section 4 discusses logistic regression, a very commonly used classifier due to its structural simplicity and generalizability. In Section 5, we bring together the concepts from the previous sections to discuss the multilayer perceptron (MLP) network, a widely used neural network in real-life applications. We explain its structure and initialization strategy, and detail various learning algorithms that are used to train an MLP. Section 6 discusses the Kullback-Leibler (KL) divergence, a measure used to calculate how different two probability distributions are from each other. Section 7 discusses generalized linear models, which extend linear models with a nonlinear activation on the output nodes. Section 8 covers kernel methods, leading to Section 9, which explains nonlinear support vector machine (SVM) classifiers. We conclude this chapter in Section 10 by discussing various tree ensemble algorithms.

    2: Perceptron

    Perceptron algorithms arrived in the early 1960s; however, Minsky and Papert [2] showed the restrictions they have. Rosenblatt [3] described an alpha perceptron that is an example of a statistical pattern recognition (SPR) system. In a typical SPR system, the features are obtained from the raw input using no learning mechanism but instead using common-sense rules. Therefore, a human decides what a good feature is, sees if it works, and, if it does not work, tries another. We learn the weight associated with each feature activation to get a single scalar quantity, and we then decide based on whether this quantity is above or below a threshold. In a typical perceptron model, the input vector x is used to compute a weighted sum over all the neurons, to which a bias vector is added. This bias vector is also known as the threshold vector in the literature.

    s = Wx + b    (1)

    The linear perceptron will output y depending on the following rule as shown in Fig. 1.

    y = 1 if s ≥ 0; otherwise, y = 0    (2)

    The perceptron algorithm is fast and straightforward, and if the dataset is linearly separable, it is guaranteed to converge.
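    To make the update rule concrete, the following minimal NumPy sketch (our own illustration; the function and variable names are not from the text) trains a perceptron with a step activation on a linearly separable dataset:

    import numpy as np

    def train_perceptron(X, t, n_epochs=100):
        # Augment each input with a leading 1 to absorb the threshold (bias).
        Xa = np.hstack([np.ones((X.shape[0], 1)), X])
        w = np.zeros(Xa.shape[1])
        for _ in range(n_epochs):
            errors = 0
            for xp, tp in zip(Xa, t):
                y = 1.0 if xp @ w >= 0.0 else 0.0  # step rule of Eq. (2)
                if y != tp:
                    w += (tp - y) * xp             # classical perceptron update
                    errors += 1
            if errors == 0:  # converged; guaranteed for linearly separable data
                break
        return w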

    Fig. 1 Perceptron activation functions.

    3: Linear regression

    In this section, we discuss the structure and notation of linear regression. As shown in Fig. 2, a linear regression model is a weight matrix W that transforms an input vector x into a discriminant vector y [4]. The weight w(m, n) connects the nth input to the mth output. The training dataset (xp, tp) consists of N-dimensional input vectors xp and M-dimensional desired output vectors tp. The pattern number p varies from 1 to Nv, where Nv denotes the number of training patterns. The threshold is handled by augmenting x with an extra element equal to 1, so that xa = [1 : x^T]^T. Thus xa contains Nu basis functions, where Nu = N + 1. For the pth training pattern, the network output vector yp can be written as

    yp = W xap    (3)

    where xap denotes xa for the pth pattern.

    Fig. 2 Linear regression.

    3.1: Training a linear regression

    To train the linear regression, we minimize an error function E that is a surrogate for a nonsmooth classification error. As in Ref. [5], from a Bayesian point of view, we consider maximizing the likelihood function or minimizing the MSE in a least-squares sense, where the MSE between the desired and actual outputs is

    E = (1/Nv) Σ_{p=1}^{Nv} Σ_{m=1}^{M} [tp(m) − yp(m)]²    (4)

    Here, tp(m) is the target, that is, the correct output, and M denotes the total number of outputs. We minimize the error function from Eq. (4) with respect to W by solving the M sets of N + 1 linear equations given by

    R W^T = C    (5)

    where the cross-correlation matrix C and the auto-correlation matrix R are, respectively,

    C = (1/Nv) Σ_{p=1}^{Nv} xap tp^T    (6)

    R = (1/Nv) Σ_{p=1}^{Nv} xap xap^T    (7)

    Since R in Eq. (5) is often ill-conditioned, it is unsafe to use Gauss-Jordan elimination, and Eq. (5) is instead solved using orthogonal least squares (OLS) [6]. In Ref. [7], OLS is used to solve for radial basis function network parameters. OLS is useful in practical applications for two primary reasons. First, training is fast, since solving linear equations is straightforward. Second, it helps us avoid some local minima [8]. In terms of optimization theory, solving Eq. (5) for W is merely Newton's algorithm for the output weights [9].

    Given

    yp = W xap    (8)

    and the error function

    E = (1/Nv) Σ_{p=1}^{Nv} Σ_{m=1}^{M} [tp(m) − yp(m)]²    (9)

    expanding Eq. (9) gives

    E = (1/Nv) Σ_{p=1}^{Nv} Σ_{m=1}^{M} tp(m)² − 2 Σ_{m=1}^{M} wm^T cm + Σ_{m=1}^{M} wm^T R wm    (10)

    where wm denotes the mth row of W and cm denotes the mth column of C, so that for each output m,

    E(wm) = (1/Nv) Σ_{p=1}^{Nv} tp(m)² − 2 wm^T cm + wm^T R wm    (11)

    Taking the partial derivative of the error with respect to the weight vector wm,

    ∂E/∂wm = −2 cm + 2 R wm    (12)

    Setting Eq. (12) equal to 0 leads to the following normal equations:

    R wm = cm,  i.e.,  wm = R⁻¹ cm    (13)

    Since there is a matrix inversion in Eq. (13), the computational cost is roughly O(n³) using the Gauss-Jordan elimination method. Therefore, normal equations are used only for up to a few thousand patterns; for more complex or higher-order equations, gradient descent-based algorithms are applied.
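    As a sketch of how the normal equations translate to code, the snippet below (our own illustration, with hypothetical names) forms R and C from the augmented training data and solves for the weights with a least-squares solver rather than an explicit inverse, since R may be ill-conditioned:

    import numpy as np

    def linreg_normal_equations(X, T):
        # Augment with a leading 1 so the threshold is part of W (xa = [1 : x^T]^T).
        Xa = np.hstack([np.ones((X.shape[0], 1)), X])
        Nv = Xa.shape[0]
        R = (Xa.T @ Xa) / Nv   # auto-correlation matrix, Eq. (7)
        C = (Xa.T @ T) / Nv    # cross-correlation matrix, Eq. (6)
        # Solve R W^T = C; lstsq stands in for the OLS solver used in the text.
        Wt, *_ = np.linalg.lstsq(R, C, rcond=None)
        return Wt.T            # M x (N + 1) weight matrix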

    For gradient descent on this quadratic error, with g = −∂E/∂w denoting the negative gradient from Eq. (12), expanding E(w + B2 g) as a Taylor series in the learning factor B2 gives

    E(w + B2 g) = E(w) − B2 g^T g + B2² g^T R g    (14)

    and minimizing Eq. (14) with respect to B2 yields the optimal learning factor

    B2 = (g^T g) / (2 g^T R g)    (15)

    3.2: Steepest descent

    The steepest descent gradient algorithm can be summarized in Algorithm 1.

    Algorithm 1 Steepest descent gradient algorithm.

    1: Initialize w, Nit, it ← 0
    2: while it < Nit do
    3:     Calculate g from Eq. (12)
    4:     Compute B2 from Eq. (15)
    5:     Update w as w ← w + B2 ⋅ g
    6:     it ← it + 1
    7: end while
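    A direct translation of Algorithm 1 for a single-output linear model might look as follows (a minimal sketch under our reconstruction of Eqs. (12) and (15); names are our own):

    import numpy as np

    def steepest_descent(Xa, t, n_it=100):
        # Rows of Xa are augmented input vectors; t holds the desired outputs.
        Nv = Xa.shape[0]
        R = (Xa.T @ Xa) / Nv
        c = (Xa.T @ t) / Nv
        w = np.zeros(Xa.shape[1])
        for _ in range(n_it):
            g = 2.0 * (c - R @ w)                       # negative gradient, Eq. (12)
            B2 = (g @ g) / (2.0 * (g @ R @ g) + 1e-12)  # optimal learning factor, Eq. (15)
            w = w + B2 * g                              # step 5 of Algorithm 1
        return w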

    Fig. 3 illustrates gradient descent using a two-dimensional (2D) contour plot. The X-axis denotes the first weight w1 and the Y-axis denotes the second weight w2. The arrows [g0, g1, g2, g3, g4, g5] in Fig. 3 show the directions of the negative gradients that the weights follow to reach a minimum. The learning factor derived in Eq. (15) controls the step size. We can observe that the gradients get smaller as they approach the minimum.

    Fig. 3 Steepest descent 2D contour plot.

    From a programmer's perspective, some of the debugging checks are:

    1. E is nonincreasing.

    2. The gradient g approaches 0.

    3. B2 ≥ 0.

    4. si20_e

    3.3: Conjugate gradient

    As we saw in the previous section, the weights are updated in the negative gradient direction in the basic gradient algorithm. Although the error function decreases most rapidly along the negative gradient direction, this does not necessarily produce fast convergence. The conjugate gradient (CG) algorithm [10] performs a line search in the conjugate direction and has faster convergence than the backpropagation (BP) algorithm. Although CG is a general unconstrained optimization technique, its use in efficiently training an MLP is well documented in Ref. [11].

    To train an MLP using the CG algorithm, we use a direction vector p that is obtained from the gradient g as

    p ← g + B1 p    (16)

    Here, p = vec(P, Poh, Poi) and P, Poi, and Poh are the direction vectors. B1 is the ratio of the gradient energy from two consecutive iterations. This direction vector, in turn, updates all the weights simultaneously as follows:

    w ← w + z p    (17)

    The CG algorithm has many attractive qualities, such as the following:

    1. For a quadratic error function, it converges in at most as many iterations as there are unknowns.

    2. It performs better than steepest descent, and it can be applied to nonquadratic error functions.

    3. Since there is no Hessian involved, we do not invert any matrix, and the computational cost per iteration is O(w), where w is the size of the weight vector.

    The CG algorithm can be summarized in Algorithm 2.

    Algorithm 2 Conjugate gradient algorithm.

    1: Initialize w, Nit, it ← 0
    2: while it < Nit do
    3:     Calculate p from g
    4:     Compute z from Eq. (15)
    5:     Update w as w ← w + z ⋅ p
    6:     it ← it + 1
    7: end while
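    The following sketch implements Algorithm 2 for the same quadratic MSE (our own illustration; B1 is the gradient-energy ratio described above, and the line search uses the quadratic form of E):

    import numpy as np

    def conjugate_gradient(Xa, t, n_it=50):
        Nv = Xa.shape[0]
        R = (Xa.T @ Xa) / Nv
        c = (Xa.T @ t) / Nv
        w = np.zeros(Xa.shape[1])
        p = np.zeros_like(w)
        g_old = None
        for _ in range(n_it):
            g = 2.0 * (c - R @ w)                      # negative gradient
            B1 = 0.0 if g_old is None else (g @ g) / (g_old @ g_old)
            p = g + B1 * p                             # conjugate direction, Eq. (16)
            z = (p @ g) / (2.0 * (p @ R @ p) + 1e-12)  # exact line search for quadratic E
            w = w + z * p                              # Eq. (17)
            g_old = g
        return w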

    Fig. 4 illustrates CG using a 2D contour plot. The X-axis denotes the first weight w1 and the Y-axis denotes the second weight w2. From the plot, we can observe that CG reaches the minimum in N + 1 steps.

    Fig. 4 Conjugate gradient 2D contour plot.

    From a programmer’s perspective, some of the debugging steps are as follows:

    1. E is nonincreasing if E is quadratic.

    2. Forcing B1 = 0 should result in the steepest descent algorithm.

    3. B2 ≥ 0.

    4. Calculate the inner product IP = g^T p for every iteration.

    4: Logistic regression

    The name logistic regression is a misnomer, since it is still a classification algorithm. Consider a general two-class classification problem, with the posterior probability of one class being

    P(C1|x) = p(x|C1)P(C1) / [p(x|C1)P(C1) + p(x|C2)P(C2)]    (18)

    P(C1|x) = 1 / (1 + e^(−a)) = σ(a)    (19)

    a = ln[p(x|C1)P(C1) / (p(x|C2)P(C2))]    (20)

    Since it is a two-class classification, the classes C1 and C2 can be denoted as y ∈ {0, 1}. Linear regression does not normally work for classification since, even when the targets are 0 or 1, the linear output w^T x can be greater than 1 or less than 0. Therefore, we use logistic regression, in which the output lies in [0, 1]. It should be emphasized that this is still classification rather than regression. Here, σ() is a logistic function defined as

    σ(w^T x) = 1 / (1 + e^(−w^T x))    (21)

    When w^T x tends to ∞, σ(w^T x) tends to 1, and when w^T x tends to −∞, σ(w^T x) tends to 0.

    The decision boundary for logistic regression is a property of the parameters and not of the training set. So, if there is a nonlinear decision boundary, it is mapped according to the nonlinear activation function used in logistic regression.

    Since σ(w^T x) is nonlinear (a sigmoid), learning the decision boundary can be viewed as manifold learning of nonlinear surfaces. In order to train the logistic regression, the cost function J(w) should preferably be convex so that only one global minimum is present. We do not use the MSE as in linear regression, since with the sigmoid it becomes a nonconvex function with multiple local minima. However, there are ways to train logistic regression with the MSE [12, 13].

    The logistic regression cost function will be as follows:

    J(w) = −(1/Nv) Σ_{p=1}^{Nv} [tp log(yp) + (1 − tp) log(1 − yp)]    (22)

    Eq. (22) comes from maximum-likelihood estimation (MLE). To fit the parameter vector w, we minimize J(w). Once training is completed, the prediction for a new input x is given as

    y = σ(w^T x)    (23)

    In order to obtain g, we apply the chain rule, with a = w^T x and y = σ(a):

    ∂J/∂w = (∂J/∂y)(∂y/∂a)(∂a/∂w)    (24)

    ∂J/∂y = −[t/y − (1 − t)/(1 − y)]    (25)

    ∂y/∂a = y(1 − y)    (26)

    ∂a/∂w = x    (27)

    Therefore,

    g = −∂J/∂w = (1/Nv) Σ_{p=1}^{Nv} (tp − yp) xp    (28)

    In order to train, we minimize J(w) using the gradient descent algorithm in Algorithm 3. Algorithm 3 is a fundamental building block for more advanced algorithms like CG, BFGS, and L-BFGS.

    Algorithm 3 Gradient descent algorithm.

    1: Initialize w, Nit, it ← 0
    2: while it < Nit do
    3:     Calculate g
    4:     Update w as w ← w + z ⋅ g
    5:     it ← it + 1
    6: end while
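    A minimal NumPy version of Algorithm 3 for logistic regression is sketched below (our own illustration; we use a fixed learning factor z, whereas more advanced variants derive it adaptively):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def train_logistic(X, t, z=0.5, n_it=1000):
        # Gradient descent on the cross-entropy cost J(w) of Eq. (22).
        Xa = np.hstack([np.ones((X.shape[0], 1)), X])
        Nv = Xa.shape[0]
        w = np.zeros(Xa.shape[1])
        for _ in range(n_it):
            y = sigmoid(Xa @ w)
            g = (Xa.T @ (t - y)) / Nv  # negative gradient of J, cf. Eq. (28)
            w = w + z * g              # step 4 of Algorithm 3
        return w

    def predict(X, w):
        Xa = np.hstack([np.ones((X.shape[0], 1)), X])
        return (sigmoid(Xa @ w) >= 0.5).astype(int)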

    4.1: Softmax classifier

    The softmax classifier [5] is a generalized logistic regression classifier that outputs approximate class probabilities. Structurally, it is a linear model with a softmax function [14] at the output units. For the pth pattern, it maps the input vector xp to the output class probabilities as

    yp(m) = exp(wm^T xp) / Σ_{j=1}^{M} exp(wj^T xp)    (29)

    where W is a weight matrix. The performance measure is a cross-entropy loss function [5] as

    E = −(1/Nv) Σ_{p=1}^{Nv} Σ_{m=1}^{M} tp(m) log yp(m)    (30)

    The softmax classifier is often trained using the L-BFGS training algorithm [15].
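    A compact sketch of the softmax mapping and its cross-entropy loss follows (our own illustration; T holds one-hot desired outputs, and the rows of W are the per-class weight vectors wm):

    import numpy as np

    def softmax_outputs(X, W):
        # Rows of X are input vectors x_p; W is the M x N weight matrix of Eq. (29).
        A = X @ W.T
        A = A - A.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        expA = np.exp(A)
        return expA / expA.sum(axis=1, keepdims=True)

    def cross_entropy(Y, T):
        # Cross-entropy loss of Eq. (30), averaged over the N_v patterns.
        return -np.mean(np.sum(T * np.log(Y + 1e-12), axis=1))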

    5: Multilayer perceptron

    In this section, we describe the multilayer perceptron (MLP), a nonlinear signal processor with good approximation and classification properties. The MLP has basis functions that can adapt during the training process by utilizing example inputs and desired outputs. An MLP minimizes an error criterion and closely mimics an optimal processor, in which the computational burden of processing an input vector is controlled by slowly varying the number of coefficients [16, 17]. We review first- and second-order training algorithms for the MLP, followed by the design of an MLP classifier through regression.

    5.1: Structure and notation

    Fig. 5 illustrates a fully connected MLP with a single hidden layer. The input weights w(k, n) connect the nth input to the kth hidden unit. Output weights woh(m, k) connect the kth hidden unit's nonlinear activation Op(k) to the mth output yp(m), which has a linear activation. The bypass weights woi(m, n) connect the nth input to the mth output. The training data, described by the set of independent identically distributed input-output pairs (xp, tp), consist of N-dimensional input vectors xp and M-dimensional desired output vectors tp. The pattern number p varies from 1 to Nv, where Nv denotes the number of training vectors present in the dataset. Let Nh denote the number of hidden units. In order to handle the thresholds in the input layer, the input vector is augmented by an extra element xp(N + 1), where xp(N + 1) = 1. For each training pattern p, the hidden layer net function vector np can be written as

    np = W xp    (31)

    The kth element of the hidden unit activation vector Op is calculated as Op(k) = f(np(k)), where f(⋅) denotes the sigmoid activation function. The network output vector yp can be written as

    yp = Woi xp + Woh Op    (32)

    The expression for the actual outputs given in Eq. (32) can be rewritten as

    yp = Wo Xa    (33)

    where Xa = [xp^T : Op^T]^T is the augmented input column vector with Nu basis functions, where Nu = 1 + N + Nh. The total number of weights is Nw = Nh ⋅ (1 + N) + M ⋅ Nu. Similarly, Wo is the M by Nu augmented output weight matrix defined as Wo = [Woh : Woi].

    Fig. 5 Fully connected MLP.
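    The forward pass implied by Eqs. (31), (32) can be sketched as follows (our own illustration; xp is assumed to already carry its augmenting 1):

    import numpy as np

    def mlp_forward(xp, W, Woh, Woi):
        # W: Nh x (N + 1) input weights, Woh: M x Nh output weights,
        # Woi: M x (N + 1) bypass weights.
        net = W @ xp                     # hidden net function vector, Eq. (31)
        Op = 1.0 / (1.0 + np.exp(-net))  # sigmoid hidden activations
        yp = Woi @ xp + Woh @ Op         # linear output with bypass, Eq. (32)
        return yp, Op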

    To train an MLP, we recast the MLP learning problem as an optimization problem and use a structural risk minimization framework to design the learning algorithm [5, 18]. Essentially, this framework minimizes the error function E, as in Eq. (9), which is a surrogate for a nonsmooth classification error. As in Ref. [5], from a Bayesian point of view, we consider maximizing the likelihood function or minimizing the MSE in a least-squares sense. Therefore, the MSE between the desired and actual outputs is defined as

    E = (1/Nv) Σ_{p=1}^{Nv} Σ_{m=1}^{M} [tp(m) − yp(m)]² + λ ‖w‖²    (34)

    Here, λ is an L2 regularization parameter used to avoid memorization and overfitting. The nonlinearity in yp causes the error E to be nonconvex, so in practice local minima of the error function may be found. In the earlier discussion, we assumed that tp has a Gaussian distribution given the input xp. If the conditional distribution of the targets given the input instead has a Bernoulli distribution, the error function, given by the negative log likelihood, is the cross-entropy error function [5].

    In Ref. [5], it is concluded that using a cross-entropy error function instead of the MSE for a classification problem leads to faster training as well as improved generalization. Apart from the cross-entropy and L2 error forms, we also have an L1 error measure. Golik et al. [19] and Simard et al. [20] provide a good comparison between L2 and cross-entropy and suggest using the cross-entropy error function for classification in order to obtain faster training and improved generalization. Our goal is to obtain optimal values of the weights of an MLP. To achieve this, we use the empirical risk minimization framework [17] to design the learning algorithms. An essential benefit of converting the training of an MLP into an optimization problem is that we can now use various optimization algorithms to optimize the learning of an MLP.

    5.2: Initialization

    5.2.1: Input means and standard deviations

    If some inputs have larger standard deviations than others, they can dominate training, even if they are relatively uninformative. Inputs are therefore normalized to zero mean and unit standard deviation.

    5.2.2: Randomizing the input weights

    Following Manry [16], the input weight matrix W is initialized randomly from a zero-mean Gaussian random number generator. The training of the input weights strongly depends on the gradients of the hidden units' activation functions with respect to the inputs. Training of the weights feeding a hidden unit will cease if that unit's activation function derivative is zero for all patterns. In order to remove the dominance of large-variance inputs, we divide the input weights by the inputs' standard deviations. We then adjust the means and standard deviations of all the hidden units' net functions. This is called net control, as in Ref. [21]. At this point, we have determined the initial input weights, and we are ready to initialize the output weights. To solve for the weights connected to the outputs of the network, we use a technique called output weight optimization (OWO) [22, 23]. OWO minimizes the error function from Eq. (9) with respect to Wo by solving the M sets of Nu equations given by

    R Wo^T = C    (35)

    Here, the cross-correlation matrix C and the auto-correlation matrix R are

    C = (1/Nv) Σ_{p=1}^{Nv} Xa tp^T,    R = (1/Nv) Σ_{p=1}^{Nv} Xa Xa^T    (36)

    In order to incorporate the regularization, we modify the R matrix elements, except the threshold term, as

    R ← R + λ diag(r)    (37)

    where r is a vector containing the diagonal elements of R and diag() is an operator that creates a diagonal matrix from the vector.
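    The OWO solve of Eq. (35), with the diagonal regularization of Eq. (37), can be sketched as below (our own illustration; we assume the constant threshold basis sits in the first column of the augmented basis matrix):

    import numpy as np

    def output_weight_optimization(Xa, T, lam=0.0):
        # Rows of Xa are augmented basis vectors [x_p^T : O_p^T]; rows of T are t_p.
        Nv = Xa.shape[0]
        R = (Xa.T @ Xa) / Nv
        C = (Xa.T @ T) / Nv
        if lam > 0.0:
            r = np.diag(R).copy()
            r[0] = 0.0              # leave the threshold element unmodified, Eq. (37)
            R = R + lam * np.diag(r)
        WoT, *_ = np.linalg.lstsq(R, C, rcond=None)  # solve R Wo^T = C, Eq. (35)
        return WoT.T                # M x Nu output weight matrix Wo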

    The MLP network is now initialized and ready to be trained with first- or second-order algorithms. Training an MLP can be seen as an unconstrained optimization problem that usually involves first-order gradient methods, such as BP and CG, or second-order methods, such as Levenberg-Marquardt (LM) and Newton's method, as the most popular learning algorithms. Training algorithms can be classified as

    1. One-stage algorithms, in which all the weights of the network are updated simultaneously.

    2. Two-stage algorithms, in which the input and output weights are trained alternately.

    Fig. 6 shows a flowchart that summarizes all the training algorithms that will be described in the subsequent sections.

    Fig. 6

    Fig. 6 Typical training algorithms for training an MLP.

    5.3: First-order learning algorithms

    The first-order learning algorithms update the weights of the MLP based on gradient matrices, that is, first-order information, hence the name. We start by discussing the training of an MLP with a one-stage algorithm, in which we train both the output and input weights simultaneously using either the BP or CG algorithm. We then describe a two-stage algorithm called OWO-hidden weight optimization.

    5.3.1: Backpropagation algorithm

    The BP algorithm is a greedy line search algorithm with a step size chosen to achieve the maximum decrease of the objective function at each step [5]. BP is a computationally efficient method, used in conjunction with gradient-based algorithms, that is widely applied to train MLPs [24]. However, due to the nonconvexity of the error function (Eq. 9) in neural networks, BP is not guaranteed to find a global minimum but only local minima. Although this is considered a major drawback, Ref. [25] discusses why local minima are still useful in many practical problems. In each training epoch, we update all the weights of the network in the BP algorithm as follows:

    w ← w + z ⋅ g    (38)

    Here, w is a vector of network weights, w = vec(W, Woh, Woi), and g is a vector of network gradients, g = vec(G, Goh, Goi). The gradient matrices are the negative partials of E with respect to the weights: G = −∂E/∂W, Goh = −∂E/∂Woh, and Goi = −∂E/∂Woi. The vec() operator performs a lexicographic ordering of a matrix into a vector. z is the optimal learning factor, derived using a Taylor series expansion of the MSE E, expressed in terms of z, as [26]

    z = −(∂E/∂z) / (∂²E/∂z²)    (39)

    The BP algorithm can be summarized in Algorithm 4.

    Algorithm 4 Backpropagation algorithm.

    1: Initialize w, Nit, it ← 0
    2: while it < Nit do
    3:     Calculate g
    4:     Compute z from Eq. (39)
    5:     Update w as w ← w + z ⋅ g
    6:     it ← it + 1
    7: end while
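    One BP epoch for the MLP of Fig. 5 can be sketched as follows (our own illustration with a fixed learning factor z; the text instead derives an optimal z from Eq. (39)):

    import numpy as np

    def bp_epoch(X1, T, W, Woh, Woi, z=0.01):
        # Rows of X1 are augmented inputs [x_p^T, 1]; rows of T are desired outputs.
        Nv = X1.shape[0]
        Net = X1 @ W.T                    # hidden net functions
        Op = 1.0 / (1.0 + np.exp(-Net))   # sigmoid activations
        Y = X1 @ Woi.T + Op @ Woh.T       # network outputs, Eq. (32)
        D = 2.0 * (T - Y) / Nv            # output deltas
        Goi = D.T @ X1                    # negative gradient w.r.t. Woi
        Goh = D.T @ Op                    # negative gradient w.r.t. Woh
        Dh = (D @ Woh) * Op * (1.0 - Op)  # backpropagated hidden deltas
        G = Dh.T @ X1                     # negative gradient w.r.t. W
        return W + z * G, Woh + z * Goh, Woi + z * Goi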

    As in Ref. [5], the BP algorithm faces two major criticisms. First, it does not scale well, that is, it takes si54_e operations for sufficiently large Nw; second, being a simple gradient descent procedure, it is unduly slow in the presence of flat error surfaces and is not a very reliable learning paradigm.

    5.3.2: Training lemmas

    Lemma 1

    For the kth hidden unit, if f ′(np(k)) = 0 for all patterns, then the weights feeding into that unit will not change during BP training.

    Proof.

    Observe the partial derivative formula below. The partial of E with respect to w(k, n) is 0 under the conditions of the lemma.

    δp(k) = f ′(np(k)) Σ_{m=1}^{M} [tp(m) − yp(m)] woh(m, k)    (40)

    ∂E/∂w(k, n) = −(2/Nv) Σ_{p=1}^{Nv} δp(k) xp(n)    (41)

    Note that f  ′(np(k)) is greatest for 0-valued net functions.

    Implications

    1. It would seem then that 0-valued initial weights are a good idea, since f ′() is largest for net values equal to 0.

    2. During net control, it seems that standard deviations should be less than 2 and means should be close to 0.

    Lemma 2

    Let the kth hidden unit have any set of weights. Let units k + j for j between 1 and (K − 1) have input and output weights identical in value to those of the kth unit. In other words, w(k + j, n) is the same for j between 0 and (K − 1), and woh(i, k + j) is the same for j between 0 and (K − 1). After training with BP, all K of these hidden units still have identically valued weights and can be replaced by a single hidden unit.

    Proof.

    If the units start with identical weights, they have identically valued weight changes and are still identical after training.

    The hidden unit outputs start identical and may experience identical weight changes.

    OWO-BP is more difficult to predict.

    Implications

    Random initial weights ensure that no hidden units are identical.

    Lemma 3

    During BP training, input weights are sensitive to input means.

    Proof.

    Modeling xp(n) as ep(n) + mn, where ep(n) is zero mean and mn is the mean of the nth input, the negative partial of E with respect to w(k, n) is found as

    −∂E/∂w(k, n) = (2/Nv) Σ_{p=1}^{Nv} δp(k) mn + (2/Nv) Σ_{p=1}^{Nv} δp(k) ep(n)    (42)

    Note that as mn increases, the gradient becomes dominated by the first term, which carries no information related to changes in the nth input.

    Lemma 4

    OWO-BP will not change thresholds (θ(k) or w(k, N + 1)) unless f′(np(k)) changes with p.

    Proof

    Assume f′(np(k)) does not change with p. Hence, we get

    −∂E/∂w(k, N + 1) = (2/Nv) f ′(n(k)) Σ_{p=1}^{Nv} Σ_{m=1}^{M} [tp(m) − yp(m)] woh(m, k) = 0    (43)

    Lemma 5

    An optimal MLP structure for the multioutput approximation case (M > 1) has different hidden units for each output.

    Proof.

    The input weight gradient matrix G is found as

    G = (2/Nv) Σ_{p=1}^{Nv} δp xp^T    (44)

    When updating the weights, the gradient and the learning factor differ for each output; hence, the optimal MLP will have different hidden units with different input weights for each output.

    Implications

    1. If the kth hidden unit's activation has constant slope f ′(np(k)), θk will not change during BP training. So, during net control, we should not have mean = 0 and a standard deviation that is too small.

    2. Together, Lemmas 1 through 4 provide the motivation for zero-mean inputs, random initial weights, and net control with nonzero mean and standard deviation.

    3. Lemma 5 shows that training a multioutput MLP with the same hidden units for each output is not optimal.

    4. Lemma 3 shows that BP lacks affine invariance [27].

    5.4: Second-order learning algorithms

    The basic idea behind using a second-order method is to improve on first-order algorithms by using the second derivative along with the first derivative [5]. We present two one-stage algorithms, namely Newton's method and LM, and then a two-stage algorithm called OWO-multiple optimal learning factors [28–30].

    5.4.1: Newton’s method

    In Newton's method, given a starting point, we construct a quadratic approximation to a twice-differentiable error function that matches the first- and second-order derivative values at that point. We then minimize this quadratic function instead of the original error function by expanding the Taylor series of E about the point wk, as is clear from the equation below:

    E(w) ≈ E(wk) − g^T (w − wk) + (1/2)(w − wk)^T H (w − wk)    (45)

    Here, w is the new version of the weight vector that we are trying to find. We calculate the Hessian H and gradient g of the MSE error function, where the elements of H are given by

    h(i, j) = ∂²E / (∂w(i) ∂w(j))    (46)

    and the elements of g are given by

    g(i) = −∂E / ∂w(i)    (47)

    We calculate the second-order direction, d, by solving the set of linear equations with OLS

    H d = g    (48)

    Assuming quadratic error function in Eq. (9) and H to be positive definite, applying first-order necessary condition (FONC) [31], on all the weights in an MLP, we update the weights as

    w ← w + d    (49)

    Newton's algorithm can be summarized in Algorithm 5.

    Algorithm 5 Newton’s algorithm.

    1: Initialize w, Nit, it ← 0
    2: while it < Nit do
    3:     Calculate g and H from Eqs. (46), (47)
    4:     Compute d from Eq. (48)
    5:     Update w as w ← w + d
    6:     it ← it + 1
    7: end while

    Newton's method is quadratically convergent and affine invariant [16]. Since it converges fast, we would like to use it to train an MLP; however, the Hessian H is generally singular [32].

    If the error function is quadratic, Newton's method reaches the minimum in one step; otherwise, the quadratic approximation provides only an estimate of the exact solution. In the case of a nonquadratic error measure, we require a line search, and w is updated as

    w ← w + z ⋅ d    (50)
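    A single Newton iteration, in code (our own illustration; lstsq stands in for the OLS solve of Eq. (48), since H is frequently singular or ill-conditioned):

    import numpy as np

    def newton_step(w, g, H):
        # g is the negative gradient (Eq. 47) and H the Hessian (Eq. 46).
        d, *_ = np.linalg.lstsq(H, g, rcond=None)  # solve H d = g, Eq. (48)
        return w + d                               # update of Eq. (49)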

    5.4.2: LM algorithm

    The LM algorithm is a compromise between Newton's method, which converges rapidly near a local or global minimum but may diverge, and gradient descent, which has assured convergence through a proper selection of the step size parameter but converges slowly. Following Eq. (45), the LM algorithm is a suboptimal method. Since H is usually singular in Newton's method, an alternative is to modify the Hessian matrix as in the LM algorithm [33] or to use a two-step method such as layer-by-layer training [34]. In LM, we modify the Hessian as

    HLM = H + λ I    (51)

    Here, I is the identity matrix with the same dimensions as H, and λ is a regularizing parameter that forces the sum matrix (H + λI) to be positive definite and safely well-conditioned throughout the computation. We calculate the second-order direction d, similarly to Newton's method, as

    HLM d = g    (52)

    After obtaining HLM, the weights of the model are updated using Eq. (49).

    The regularizing parameter λ plays a crucial role in the way the LM algorithm functions. If we set λ equal to 0, Eq. (52) reduces to Newton's method (Eq. 49). On the other hand, if we assign a large value to λ such that λI overpowers the Hessian H, the LM algorithm effectively behaves as a gradient descent algorithm. Press et al. [35] recommend an excellent Marquardt recipe for the selection of λ.

    From a practical perspective, the computational complexity of obtaining HLM can be demanding, mainly when the dimensionality of the weight vector w is high. Therefore, due to scalability constraints, LM is particularly suitable for small networks.

    The LM algorithm can be summarized in Algorithm 6.

    Algorithm 6 LM algorithm.

    1: Initialize w, Nit, it ← 0
    2: while it < Nit do
    3:     Present all patterns to the network to compute the error Eold from Eq. (9)
    4:     Calculate g and H from Eqs. (46), (47)
    5:     Obtain HLM from Eq. (51)
    6:     Compute d from Eq. (52)
    7:     Update w as w ← w + d
    8:     Recompute the error Enew using the updated weights
    9:     if Enew < Eold then
    10:        Reduce the value of λ
    11:        go to Step 3
    12:    else
    13:        Increase the value of λ
    14:    end if
    15:    it ← it + 1
    16: end while
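    One LM iteration with the accept/reject logic of Algorithm 6 can be sketched as below (our own illustration; error_fn, grad_fn, and hess_fn are hypothetical user-supplied callables returning E, the negative gradient g, and the Hessian H):

    import numpy as np

    def lm_step(w, error_fn, grad_fn, hess_fn, lam, factor=10.0):
        E_old = error_fn(w)
        g, H = grad_fn(w), hess_fn(w)
        H_lm = H + lam * np.eye(H.shape[0])  # Eq. (51)
        d = np.linalg.solve(H_lm, g)         # second-order direction, Eq. (52)
        w_new = w + d
        if error_fn(w_new) < E_old:
            return w_new, lam / factor       # accept the step and reduce lambda
        return w, lam * factor               # reject the step and increase lambda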

    6: KL divergence

    From an information theory point of view, the Kullback-Leibler (KL) divergence is a relative entropy measure that estimates how closely an approximate distribution q(x) models an unknown distribution p(x). In order to arrive at the expression for the KL divergence, we follow a probabilistic viewpoint. The following explanation is suitable for both continuous and discrete distributions. To quantitatively determine how good the approximate model q(x) is compared to p(x), we calculate the log-likelihood ratio over an entire dataset.
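    For the discrete case, the KL divergence D(p ∥ q) = Σx p(x) log[p(x)/q(x)] can be computed as below (a minimal sketch using the standard definition; the chapter's own derivation continues from the log-likelihood ratio):

    import numpy as np

    def kl_divergence(p, q):
        # D(p || q) for discrete distributions; terms with p(x) = 0 contribute 0.
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))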
