Federated Learning: Theory and Practice
Ebook · 906 pages · 8 hours


About this ebook

Federated Learning: Theory and Practice provides a holistic treatment of federated learning as a distributed learning system with various forms of decentralized data and features. Part I of the book begins with a broad overview of optimization fundamentals and modeling challenges, covering various aspects of communication efficiency, theoretical convergence, and security. Part II features emerging challenges stemming from many socially driven concerns of federated learning as a future public machine learning service. Part III concludes the book with a wide array of industrial applications of federated learning, as well as ethical considerations, showcasing its immense potential for driving innovation while safeguarding sensitive data.

Federated Learning: Theory and Practice provides a comprehensive and accessible introduction to federated learning that is suitable for researchers and students in academia, as well as industrial practitioners who seek to leverage the latest advances in machine learning for their entrepreneurial endeavors.
  • Presents the fundamentals and a survey of key developments in the field of federated learning
  • Provides emerging, state-of-the-art topics that build on the fundamentals
  • Contains industry applications
  • Gives an overview of visions of the future
Language: English
Release date: Feb 9, 2024
ISBN: 9780443190384


    Book preview

    Federated Learning - Lam M. Nguyen

    Part 1: Optimization fundamentals for secure federated learning

    Outline

    Chapter 1. Gradient descent-type methods

    Chapter 2. Considerations on the theory of training models with differential privacy

    Chapter 3. Privacy-preserving federated learning: algorithms and guarantees

    Chapter 4. Assessing vulnerabilities and securing federated learning

    Chapter 5. Adversarial robustness in federated learning

    Chapter 6. Evaluating gradient inversion attacks and defenses

    Chapter 1: Gradient descent-type methods

    Background and simple unified convergence analysis

    Quoc Tran-Dinh (a); Marten van Dijk (b,c)

    (a) Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

    (b) CWI, Amsterdam, Netherlands

    (c) VU University Amsterdam, Amsterdam, Netherlands

    Abstract

    In this book chapter, we briefly describe the main components that constitute the gradient descent method and its accelerated and stochastic variants. We aim to explain these components from a mathematical point of view, including theoretical and practical aspects, but at an elementary level. We focus on basic variants of the gradient descent method and then extend our view to recent variants, especially variance-reduced stochastic gradient descent (SGD) schemes. Our approach relies on revealing the structures present inside the problem and the assumptions imposed on the objective function. Our convergence analysis unifies several known results and relies on a general, but elementary, recursive expression. We illustrate this analysis on several common schemes.

    Keywords

    Gradient descent method; Stochastic gradient method; Convex and nonconvex optimization; Convergence guarantees; Complexity estimates

    Acknowledgements

    The work of Q. Tran-Dinh is partly supported by the Office of Naval Research [grant number ONR-N00014-20-1-2088] (2020–2023) and [grant number ONR-N00014-23-1-2588] (2023–2026), and the National Science Foundation (NSF) [grant number NSF DMS-2134107] (2022–2027).

    1.1 Introduction

    The core problem in many optimization applications, such as signal and image processing, engineering, operations research, statistics, and machine learning, is the following optimization problem; see, e.g., [1,2,7,27,28,49,60]:

    (1.1)  $\min_{w \in \mathbb{R}^p} F(w),$

    where $F : \mathbb{R}^p \to \mathbb{R}$ is a given objective or loss function, and $w$ is a vector of decision variables (also called model parameters). Depending on the form or structures of F, we obtain different classes of optimization problems. For instance, the following structures are common in practice:

    •  Nonsmooth convex optimization. If F is M-Lipschitz continuous (i.e., there exists $M > 0$ such that $|F(w) - F(w')| \le M\|w - w'\|$ for all $w, w' \in \mathbb{R}^p$) and convex, but often nonsmooth, then (1.1) is called a nonsmooth convex minimization. Note that the M-Lipschitz continuity is often imposed for nonsmooth functions such as $F(w) := \|w\|$ for any norm $\|\cdot\|$, or for special smooth functions, e.g., the objective function of a logistic regression. Obviously, the Lipschitz continuity also holds if we consider F to be continuous on a given compact set $\mathcal{W}$.

    •  Smooth and convex optimization. If F is L-smooth (i.e., there exists $L \ge 0$ such that $\|\nabla F(w) - \nabla F(w')\| \le L\|w - w'\|$ for all $w, w' \in \mathbb{R}^p$) and convex, then (1.1) is called a smooth and convex minimization. Examples of L-smooth and convex functions are vast. For example, a least-squares function $F(w) := \frac{1}{2}\|Xw - y\|^2$ for a given data matrix X and an output vector y is L-smooth with $L = \|X^\top X\|$. The logistic regression function above is also convex and L-smooth. However, exponential functions such as $e^w$ or logarithmic functions such as $-\log(w)$ are convex, but not L-smooth on their domain, unless we limit their domain to a given compact set, see, e.g., [64].

    •  Smooth and nonconvex optimization. If F is L-smooth and nonconvex, then (1.1) is called a smooth and nonconvex minimization. The L-smoothness is a key condition required in most gradient-based methods for nonconvex optimization. Again, this assumption obviously holds if we assume that F is continuously differentiable and then limit the domain of F to a compact set. But there exist L-smooth functions on the entire space $\mathbb{R}^p$. For instance, $F(w) := \frac{1}{2}w^\top Q w + q^\top w$ for a given symmetric matrix Q and vector q is L-smooth with $L = \|Q\|$, but not necessarily convex.

    •  Composite optimization. If $F(w) := f(w) + g(w)$, where f is usually L-smooth and convex/nonconvex, and g is convex and possibly nonsmooth, then (1.1) is called an [additive] composite minimization. This model is ubiquitous in machine learning and statistical learning, where f presents a loss function or a data fidelity term, while g is a regularizer or a penalty term to promote solution structures or to handle constraints. Examples can be found, e.g., in [10,52]. If $g$ is the indicator of a convex set $\mathcal{W}$ as $g(w) := 0$ if $w \in \mathcal{W}$, and $g(w) := +\infty$, otherwise, then (1.1) covers the constrained problem $\min_{w \in \mathcal{W}} f(w)$.

    •  Finite-sum optimization. If $F(w) := \frac{1}{n}\sum_{i=1}^{n} F_i(w)$ for some $n \ge 1$, then (1.1) is called a finite-sum minimization, an empirical risk minimization, or a distributed optimization, depending on the context. This structure is present in most supervised learning tasks, network and distributed optimization, and federated learning. The most interesting case is when n is very large.

    •  Stochastic optimization. If $F(w) := \mathbb{E}_{\xi \sim \mathbb{P}}\big[\mathbf{F}(w, \xi)\big]$, the expectation of a stochastic function $\mathbf{F}(\cdot, \xi)$, where $(\Omega, \mathbb{P})$ is a given probability space, then (1.1) is called a stochastic program [34,43,59]. This setting also covers the finite-sum as a special case by setting $\xi$ to be uniformly distributed on $\{1, \dots, n\}$ and $\mathbf{F}(w, i) := F_i(w)$.

    Apart from these individual settings, many other combinations between them are also possible; we do not list all of these here. For example, the combination of a composite structure and finite-sum is very common.

    Existence of solutions We first assume that F is bounded from below, i.e., $F^\star := \inf_{w \in \mathbb{R}^p} F(w) > -\infty$, to guarantee the well-definedness of (1.1). Many machine learning applications automatically satisfy this condition since the underlying loss function is usually nonnegative. One obvious example is the least-squares problem.

    Our next question is: Does (1.1) have an optimal solution? To discuss this aspect, we use a coercivity concept from nonlinear analysis [18]. We say that F is coercive if $\lim_{\|w\| \to \infty} F(w) = +\infty$. A common coercive function is $F(w) := f(w) + \frac{\lambda}{2}\|w\|^2$, where f is M-Lipschitz continuous (but not necessarily convex) and $\lambda > 0$. If F is continuous and coercive, then by the well-known Weierstrass theorem, (1.1) has global optimal solutions $w^\star$. In this case, we denote by $F^\star := F(w^\star)$ its optimal value. If F is nonconvex and differentiable, then we use $w^\star$ to denote its stationary points, i.e., points such that $\nabla F(w^\star) = 0$. If F is not differentiable, a generalization of stationary points is required [40]. To keep it simple, we assume throughout this chapter that F is continuously differentiable.

    If F is strongly convex, then it is continuous and coercive, and (1.1) has a unique global optimal solution. For convex problems, our goal is to find an approximate global solution of (1.1) in some sense (see Section 1.2.7). For nonconvex problems, we only expect to compute an approximate stationary point, which can be a candidate for a local minimizer. However, we do not attempt to check whether it is an approximate local minimizer or not in this chapter.

    Contribution Our contribution can be summarized as follows. We provide a comprehensive discussion for the main components of the gradient descent method and its variants, including stochastic schemes. We also propose a unified and simple approach to analyze convergence rates of these algorithms, and demonstrate it through concrete schemes. This approach can perhaps be extended to analyzing other algorithms, which are not covered in this chapter. We also discuss some enhanced implementation aspects of the basic algorithms.

    Outline The rest of this chapter is organized as follows. Section 1.2 reviews basic components of gradient methods. Section 1.3 focuses on stochastic gradient methods, while Section 1.4 makes some concluding remarks and raises a few possible research directions.

    1.2 Basic components of GD-type methods

    The gradient descent (GD) method is an iterative method aimed at finding an approximate solution of (1.1). It dates back to the works of Cauchy in the 19th century and has been intensively studied in numerical analysis, including optimization, for many decades. During the last two decades, there has been a great surge in first-order methods, especially gradient-type algorithms, due to applications in signal and image processing, modern statistics, and machine learning. In this chapter, we do not attempt to review the literature of GD-type methods, but only focus on summarizing their basic components.

    Formally, the gradient descent algorithm starts from a given initial point $w^0 \in \mathbb{R}^p$ and, at each iteration $k \ge 0$, it updates

    (1.2)  $w^{k+1} := \mathcal{P}\big(w^k + \eta_k d^k\big),$

    where $w^k$ is the current iterate, $\eta_k > 0$ is called a step-size or learning rate, $d^k$ is called a search direction, and $\mathcal{P}$ is an operator to handle constraints or regularizers; if this is not needed, one can set $\mathcal{P} := \mathbb{I}$, the identity operator. This method generates a sequence of iterate vectors $\{w^k\}$ using only first-order information of F (e.g., function values, proximal operators, or [sub]gradients). Here, we add an operator $\mathcal{P}$, which can also be used to handle constraints, regularizers, penalty, or Bregman distance (e.g., mapping between the primal and dual spaces). Let us discuss each component of the scheme (1.2).
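
    To make the scheme (1.2) concrete, the following minimal Python sketch (not taken from the book; the toy least-squares data, the names gd_step and grad, and the choice of the identity operator for P are our own illustrative assumptions) runs the update with the steepest-descent direction $d^k = -\nabla F(w^k)$ and a constant step-size $1/L$.

        import numpy as np

        def gd_step(w, grad_fn, eta, P=lambda v: v):
            # One iteration of (1.2): w_{k+1} = P(w_k + eta_k * d_k) with d_k = -grad F(w_k).
            d = -grad_fn(w)
            return P(w + eta * d)

        # Toy least-squares instance F(w) = 0.5 * ||A w - b||^2 (illustrative data only).
        rng = np.random.default_rng(0)
        A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
        grad = lambda w: A.T @ (A @ w - b)
        eta = 1.0 / np.linalg.norm(A.T @ A, 2)   # constant step-size 1/L for this F

        w = np.zeros(5)
        for _ in range(200):
            w = gd_step(w, grad, eta)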

    1.2.1 Search direction

    The most important component in (1.2) is the search direction $d^k$, which determines the type of algorithm such as first-order, second-order, quasi-Newton-type, or stochastic methods. Let us consider the following possibilities:

    •  Descent direction. Assuming F is continuously differentiable, a search direction $d^k$ is called a descent direction at the iterate $w^k$ if $\langle \nabla F(w^k), d^k \rangle < 0$. We can even impose a stronger condition, called a direction of strict descent, which is $\langle \nabla F(w^k), d^k \rangle \le -c\,\|\nabla F(w^k)\|^2$ for some $c > 0$. The name descent comes from the fact that if we move from $w^k$ along the direction $d^k$ with an appropriate stepsize $\eta_k$, then we have a descent, i.e., $F(w^{k+1}) < F(w^k)$. If $\mathcal{P} = \mathbb{I}$, the identity operator, then (1.2) reduces to $w^{k+1} := w^k + \eta_k d^k$. By Taylor's expansion of F, we have

    $F(w^{k+1}) = F(w^k) + \eta_k \langle \nabla F(w^k), d^k \rangle + o(\eta_k) < F(w^k)$

    for sufficiently small $\eta_k > 0$ due to $\langle \nabla F(w^k), d^k \rangle < 0$.

    •  Steepest descent direction. If we take $d^k := -\nabla F(w^k)$, then (1.2) becomes

    (1.3)  $w^{k+1} := w^k - \eta_k \nabla F(w^k),$

    and we have $\langle \nabla F(w^k), d^k \rangle = -\|\nabla F(w^k)\|^2 < 0$ provided that $w^k$ is not a stationary point of F. With this choice of $d^k$, we obtain a gradient descent or the so-called steepest descent method. It actually realizes the largest decrease of F at $w^k$, as $\langle \nabla F(w^k), d \rangle \ge -\|\nabla F(w^k)\|\,\|d\|$ for any d such that $\|d\| = \|\nabla F(w^k)\|$.

    •  Stochastic gradient direction. If we choose $d^k$ to be a stochastic estimator of $-\nabla F(w^k)$, then we obtain a stochastic approximation (also called stochastic gradient descent) method. A stochastic gradient direction is generally not a descent one, i.e., $\langle \nabla F(w^k), d^k \rangle < 0$ does not necessarily hold. Examples of stochastic estimators include the standard unbiased estimator $\nabla F_{i_k}(w^k)$ for an index $i_k$ sampled uniformly from $\{1, \dots, n\}$ and its mini-batch version $\frac{1}{|\mathcal{B}_k|}\sum_{i \in \mathcal{B}_k}\nabla F_i(w^k)$ for a minibatch $\mathcal{B}_k$, and various variance-reduced estimators, see, e.g., [13,29,48,58,66].

    •  Newton and quasi-Newton direction. We can go beyond gradient-based methods by incorporating second-order information, or curvature of F, as $d^k := -B_k^{-1}\nabla F(w^k)$, where $B_k$ is a given symmetric and invertible matrix. For instance, if $B_k := \nabla^2 F(w^k)$, then we obtain a Newton method, while if $B_k$ is an approximation to $\nabla^2 F(w^k)$, then we obtain a quasi-Newton method.

    •  Inexact descent direction. If we do not evaluate the gradient exactly, but allow some error as $d^k := -(\nabla F(w^k) + e^k)$ for some Gaussian noise $e^k$, then we obtain an inexact or noisy gradient method [24]. Another example is called a sign gradient method, which uses $d^k := -\mathrm{sign}(\nabla F(w^k))$, the sign of the gradient, see, e.g., [41]. Inexact Newton-type methods compute $d^k$ by approximately solving the linear system $\nabla^2 F(w^k)\, d^k = -\nabla F(w^k)$ such that $\|\nabla^2 F(w^k)\, d^k + \nabla F(w^k)\| \le \epsilon_k$ for a given tolerance $\epsilon_k > 0$.

    Apart from the above examples, other methods such as [block]-coordinate, incremental gradient, Frank–Wolfe or conditional gradient, proximal-point, prox-linear, Gauss–Newton, extragradient, optimistic gradient, and operator splitting techniques can also be written in the form (1.2) by formulating appropriate search directions . For instance, the proximal point method can be viewed as the gradient method applied to its Moreau's envelope, see, e.g., [56].
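
    As a rough illustration of how these different choices of $d^k$ could be coded, the Python sketch below (a hedged example of ours; the helper names and the per-sample gradient interface grad_i_fn are assumptions, not the chapter's notation) contrasts the steepest-descent, mini-batch stochastic, and Newton directions.

        import numpy as np

        def steepest_descent_direction(grad_fn, w):
            # d_k = -grad F(w_k)
            return -grad_fn(w)

        def minibatch_direction(grad_i_fn, w, batch):
            # Unbiased mini-batch estimator: d_k = -(1/|B|) * sum_{i in B} grad F_i(w_k).
            return -np.mean([grad_i_fn(w, i) for i in batch], axis=0)

        def newton_direction(grad_fn, hess_fn, w):
            # d_k = -(hess F(w_k))^{-1} grad F(w_k); solve the linear system rather than inverting.
            return -np.linalg.solve(hess_fn(w), grad_fn(w))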

    1.2.2 Step-size

    The second important component of (1.2) is the step-size $\eta_k$. In the machine learning community, this quantity is called a learning rate. Choosing an appropriate $\eta_k$ is a crucial step that affects the performance of the algorithm. Classical optimization techniques have proposed several strategies, which are known as globalization strategies, including (i) line-search and its variants, (ii) trust-region, and (iii) filter [11,23,49]. Line-search and trust-region strategies have been widely used in numerical optimization, see [11,49]. In recent optimization algorithms and machine learning training tasks, we often observe the following techniques:

    •  Constant learning rate. Constant learning rates are usually used to derive convergence rates or complexity bounds due to their simplicity. The algorithm often performs well with a constant learning rate if the problem is easy, e.g., strongly convex and L-smooth, but it becomes poor if the landscape of F is complex such as in deep neural networks. Usually, theoretical analysis gives us a range (i.e., an interval like $(0, \tfrac{2}{L})$ in standard gradient methods) to choose a constant learning rate. However, this range could easily be underestimated by using global parameters, and does not capture the desired region of optimal solutions. In practice, nevertheless, we need to tune this learning rate using different strategies such as grid search or bisection, etc.

    •  Diminishing learning rate. Diminishing learning rates are usually used in subgradient or stochastic gradient methods. One common diminishing learning rate is $\eta_k := \frac{C}{(k + \beta)^{\nu}}$ for some positive constant C, a shifting factor β, and an order $\nu > 0$. Depending on the structure assumptions of F, we can choose an appropriate order ν, e.g., $\nu = 1/2$ or $\nu = 1$. Another possibility is to choose $\eta_k := \frac{C}{(\lfloor k/s \rfloor + \beta)^{\nu}}$ for an additional integer s to maintain a fixed learning rate within each window of s iterations. In stochastic gradient methods, diminishing learning rates are often required if the variance of $d^k$ is nondecreasing (i.e., $d^k$ is not computed from a variance-reduced estimator of ∇F). A diminishing learning rate also determines the convergence rate of the underlying algorithm.

    •  Scheduled learning rate. In practice, we often use a schedule to tune an appropriate learning rate so that we can achieve the best performance. Different ideas have been proposed such as using exponential decay rate, cosine annealing, or even with a varying mini-batch size, see, e.g., [37,62].

    •  Adaptive learning rate. The above strategies of learning rate selection do not take into account the local geometry of the objective function. They may perform poorly on hard problems. This motivates the use of adaptive learning rates which exploit the local landscape or curvature of the objective function. The first common strategy is line-search, which [approximately] solves $\min_{\eta > 0} F(w^k + \eta d^k)$ to find $\eta_k$. If F is quadratic, then we can compute $\eta_k$ exactly by solving this one-variable minimization problem. However, most algorithms use inexact line-search such as bisection or golden ratio search. Another strategy is using a Barzilai–Borwein step-size, e.g., $\eta_k := \frac{\|w^k - w^{k-1}\|^2}{\langle w^k - w^{k-1}, \nabla F(w^k) - \nabla F(w^{k-1})\rangle}$, which gives an estimate of $1/L$. (Here, L is the Lipschitz constant of ∇F.)

    Recently, several adaptive methods have been proposed, see, e.g., [12,17,32]. The underlying learning rate is usually given by $\eta_k := \frac{\eta}{\sqrt{\sum_{j=0}^{k}\|g^j\|^2} + \delta}$, where $g^j$ is some gradient estimator at iteration j, $\eta > 0$ is given, and $\delta > 0$ is a small constant to avoid division by zero and provide numerical stability.

    Among the above strategies and tricks for selecting learning rates, one can also compute them in different ways when solving a specific problem, even using a trial-and-error method or a combination of the above techniques. The main goal is to tune a good learning rate for such a problem, but still guarantee the convergence of the underlying algorithm.
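
    The Python sketch below collects a constant rate, the diminishing rate $C/(k+\beta)^{\nu}$, and an AdaGrad-style adaptive rate of the form $\eta/(\sqrt{\sum_j \|g^j\|^2} + \delta)$; it is a hedged illustration only, and the class name AdaGradLR and the default constants are our own choices, not the chapter's.

        import numpy as np

        def constant_lr(k, eta0=0.1):
            return eta0

        def diminishing_lr(k, C=1.0, beta=1.0, nu=0.5):
            # eta_k = C / (k + beta)^nu, e.g. nu = 1/2 or nu = 1.
            return C / (k + beta) ** nu

        class AdaGradLR:
            # Per-coordinate adaptive rate eta / (sqrt(sum_j g_j^2) + eps).
            def __init__(self, dim, eta=0.1, eps=1e-8):
                self.eta, self.eps = eta, eps
                self.accum = np.zeros(dim)

            def step_size(self, g):
                self.accum += g ** 2
                return self.eta / (np.sqrt(self.accum) + self.eps)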

    1.2.3 Proximal operator

    Many problems covered by (1.1) have constraints or nonsmooth objective terms. For example, we may have $F(w) = f(w) + g(w)$, where g is nonsmooth. In this case, we cannot use the full gradient of F. One way to handle the nonsmooth term g is to use proximal operators, and in particular, to use projections if we have simple constraints. Mathematically, the proximal operator of a proper, lower semicontinuous, and convex (or more generally, a prox-regular) function g is defined as

    (1.4)  $\mathrm{prox}_{\gamma g}(w) := \arg\min_{z \in \mathbb{R}^p}\Big\{ g(z) + \tfrac{1}{2\gamma}\|z - w\|^2 \Big\}, \quad \gamma > 0.$

    Note that under appropriate choices of γ, the minimization problem in (1.4) is strongly convex, and hence has a unique solution, leading to the well-definedness of $\mathrm{prox}_{\gamma g}$. If $g := \delta_{\mathcal{W}}$ is the indicator function of a closed and convex set $\mathcal{W}$, i.e., $\delta_{\mathcal{W}}(w) := 0$ if $w \in \mathcal{W}$, and $\delta_{\mathcal{W}}(w) := +\infty$, otherwise, then $\mathrm{prox}_{\gamma\delta_{\mathcal{W}}}$ reduces to the projection onto $\mathcal{W}$, i.e., $\mathrm{proj}_{\mathcal{W}}(w) := \arg\min_{z \in \mathcal{W}}\|z - w\|^2$. In terms of computation, evaluating $\mathrm{prox}_{\gamma g}$ is generally as hard as solving a [strongly] convex problem. There are at least three ways of evaluating $\mathrm{prox}_{\gamma g}$:

    •  Separable functions. The most obvious case is when g is component-wise separable as $g(w) := \sum_{i=1}^{p} g_i(w_i)$ (e.g., $g(w) := \|w\|_1$), then evaluating $\mathrm{prox}_{\gamma g}$ requires solving p one-variable strongly convex minimization problems, which can be done in a closed form. This idea can be extended to block separable functions, e.g., $g(w) := \sum_{i} g_i(w_{[i]})$, where $w_{[i]}$ are subvectors.

    •  Dual approach. Moreau's identity, $\mathrm{prox}_{\gamma g}(w) = w - \gamma\,\mathrm{prox}_{g^{*}/\gamma}(w/\gamma)$, suggests that we can compute $\mathrm{prox}_{\gamma g}$ from its Fenchel conjugate $g^{*}$. Since many convex functions have simple conjugates, such as norms or Lipschitz continuous functions, this approach is more tractable.

    •  Optimality approach. If g is differentiable, then we can directly use the optimality condition of (1.4), $\gamma \nabla g(z) + z - w = 0$, and solve it as a nonlinear equation in z.

    Note that the second and third techniques are only used for convex functions, while the first one can be used for nonconvex functions. The number of convex functions g where can be computed efficiently is vast, see, e.g., [1,51] for more examples and computational techniques.
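
    Two closed-form proximal operators that fit the separable and indicator cases above are the soft-thresholding operator for the ℓ1 norm and the projection onto a box; the short Python sketch below is a hedged illustration of ours (the function names and box bounds are assumptions).

        import numpy as np

        def prox_l1(w, gamma, lam=1.0):
            # prox of g(w) = lam * ||w||_1: component-wise soft-thresholding.
            return np.sign(w) * np.maximum(np.abs(w) - gamma * lam, 0.0)

        def prox_box(w, lo=-1.0, hi=1.0):
            # prox of the indicator of the box [lo, hi]^p, i.e. the Euclidean projection.
            return np.clip(w, lo, hi)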

    1.2.4 Momentum

    One way to explain the role of momentum is to use a dynamical system of the form $\ddot{w}(t) + \alpha\,\dot{w}(t) + \nabla F(w(t)) = 0$ rooted in Newton's second law, where $\alpha > 0$ presents a friction or a damping factor. If we discretize this differential equation using $\ddot{w}(t) \approx \frac{w^{k+1} - 2w^k + w^{k-1}}{h^2}$ and $\dot{w}(t) \approx \frac{w^k - w^{k-1}}{h}$, then we obtain

    $\frac{w^{k+1} - 2w^k + w^{k-1}}{h^2} + \alpha\,\frac{w^k - w^{k-1}}{h} + \nabla F(w^k) = 0$, leading to

    $w^{k+1} = w^k - h^2\,\nabla F(w^k) + (1 - \alpha h)\,(w^k - w^{k-1})$, see, e.g., [61]. Therefore, we can specify momentum variants of (1.2) when $\mathcal{P} = \mathbb{I}$ (the identity operator) as follows:

    (1.5)  $w^{k+1} := w^k + \eta_k d^k + \beta_k\,(w^k - w^{k-1}),$

    where $\beta_k \ge 0$ is a momentum step-size. The search direction $d^k$ can be evaluated at $w^k$, leading to a so-called heavy ball method [54]. Alternatively, if $d^k$ is evaluated at an intermediate point, e.g., $z^k := w^k + \beta_k(w^k - w^{k-1})$, then we obtain Nesterov's accelerated scheme in the convex case [44]. This scheme can be written into two steps as

    (1.6)  $z^k := w^k + \beta_k\,(w^k - w^{k-1}), \qquad w^{k+1} := z^k + \eta_k\,\hat{d}^k,$

    where $\hat{d}^k$ presents the direction evaluated at $z^k$ instead of $w^k$. Note that momentum terms do not significantly add computational costs on top of (1.2). Yet, they can accelerate the algorithm in convex cases [44,46] (see also Section 1.2.8), and possibly in some nonconvex settings, see, e.g., [35,63].
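
    The difference between the two momentum variants, evaluating the direction at the current iterate (heavy ball) versus at the extrapolated point (Nesterov), can be sketched as follows; this is our own hedged Python illustration with $d^k = -\nabla F$, not code from the book. Both helpers expect NumPy arrays and return the new iterate together with the previous one.

        def heavy_ball_step(w, w_prev, grad_fn, eta, beta):
            # Direction evaluated at the current iterate w_k.
            w_next = w - eta * grad_fn(w) + beta * (w - w_prev)
            return w_next, w

        def nesterov_step(w, w_prev, grad_fn, eta, beta):
            # Direction evaluated at the extrapolated point z_k = w_k + beta*(w_k - w_{k-1}).
            z = w + beta * (w - w_prev)
            w_next = z - eta * grad_fn(z)
            return w_next, w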

    1.2.5 Dual averaging variant

    The scheme (1.2) can be viewed as a forward update, but in convex optimization, dual averaging schemes are also closely related to (1.2). Unlike (1.2), a dual averaging scheme works as follows. Starting from $w^0 \in \mathbb{R}^p$, for $k \ge 0$, we update

    (1.7)  $w^{k+1} := \arg\min_{w \in \mathbb{R}^p}\Big\{ \sum_{j=0}^{k} \gamma_j \langle d^j, w\rangle + \frac{\beta_k}{2}\|w - w^0\|^2 \Big\},$

    where $d^j$ are given dual directions (e.g., $d^j := \nabla F(w^j)$), $\gamma_j > 0$ are the weights of $d^j$, and $\beta_k > 0$ is a given dual step-size. In general settings, we can replace the quadratic term $\frac{1}{2}\|w - w^0\|^2$ by a general Bregman distance $D(w, w^0)$. If the norm is the Euclidean norm, then we have $w^{k+1} = w^0 - \frac{1}{\beta_k}\sum_{j=0}^{k}\gamma_j d^j$. If $\beta_k := \beta$ is fixed and we choose $\gamma_j := \beta\,\eta_j$, then we have

    $w^{k+1} = w^0 - \sum_{j=0}^{k}\eta_j d^j = w^k - \eta_k d^k$, which is exactly covered by (1.2). Therefore, for the Euclidean norm, the dual averaging scheme (1.7) is identical to the gradient descent scheme $w^{k+1} := w^k - \eta_k \nabla F(w^k)$ when $d^j := \nabla F(w^j)$. However, under a non-Euclidean norm or a Bregman distance, these methods are different from each other.
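
    Under the Euclidean prox term, the dual averaging iterate has a closed form: the accumulated, weighted dual directions are subtracted from $w^0$. The sketch below is a hedged Python illustration of that one common form; the growing dual step-size $\beta_k \propto \sqrt{k+1}$ and all names are our own assumptions rather than the chapter's choices.

        import numpy as np

        def dual_averaging(grad_fn, w0, steps, gamma=0.1, beta0=1.0):
            # Euclidean dual averaging: w_{k+1} = w0 - (1/beta_k) * sum_{j<=k} gamma * d_j.
            w, s = w0.copy(), np.zeros_like(w0)
            for k in range(steps):
                s += gamma * grad_fn(w)            # accumulate weighted dual directions d_j = grad F(w_j)
                beta_k = beta0 * np.sqrt(k + 1.0)  # growing dual step-size (illustrative choice)
                w = w0 - s / beta_k
            return w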

    1.2.6 Structure assumptions

    One main theoretical task when designing a gradient-based algorithm is to establish its convergence. From a computational perspective, estimating the convergence rate, as well as complexity, is also critically important. However, to establish these, we require F to satisfy a set of assumptions. The following structures are commonly used in optimization modeling and algorithms:

    •  Lipschitz continuity. Function F in (1.1) is said to be M-Lipschitz continuous if

    (1.8)  $|F(w) - F(w')| \le M\,\|w - w'\| \quad \text{for all } w, w' \in \mathbb{R}^p.$

    Examples of Lipschitz continuous functions include norms, smoothed approximations of norms (e.g., $\sqrt{\|w\|^2 + \epsilon^2}$ for a small ϵ), or continuous functions with a bounded domain. Note that when F is convex, M-Lipschitz continuity is equivalent to M-bounded [sub]gradients, i.e., $\|\nabla F(w)\| \le M$ for all $w \in \mathbb{R}^p$. This assumption is usually used in subgradient-type or stochastic gradient-type methods.

    •  L-smoothness. Function F is called L-smooth if the gradient ∇F of F satisfies

    (1.9)  $\|\nabla F(w) - \nabla F(w')\| \le L\,\|w - w'\| \quad \text{for all } w, w' \in \mathbb{R}^p.$

    If we restrict F to a compact domain $\mathcal{W}$ and ∇F is continuously differentiable, then F is L-smooth on $\mathcal{W}$. This concept can be extended to an L-average smoothness in the finite-sum or stochastic settings. For instance, if $F(w) := \frac{1}{n}\sum_{i=1}^{n} F_i(w)$, then we can modify (1.9) as

    $\frac{1}{n}\sum_{i=1}^{n}\|\nabla F_i(w) - \nabla F_i(w')\|^2 \le L^2\,\|w - w'\|^2$

    for all $w, w' \in \mathbb{R}^p$. Alternatively, if $F(w) := \mathbb{E}_{\xi}[\mathbf{F}(w,\xi)]$, then we can use

    $\mathbb{E}_{\xi}\big[\|\nabla_w \mathbf{F}(w,\xi) - \nabla_w \mathbf{F}(w',\xi)\|^2\big] \le L^2\,\|w - w'\|^2$

    for all $w, w' \in \mathbb{R}^p$. These assumptions are usually used in variance-reduction SGD methods, see, e.g., [52,66]. Note that other extensions are possible, see, e.g., [34]. Verifying the L-smoothness is generally not straightforward. However, if $F(w) := \frac{1}{n}\sum_{i=1}^{n}\ell_i(x_i^\top w)$ as, e.g., in a generalized linear model, then we can verify the L-smoothness of F by verifying the L-smoothness of each one-variable function $\ell_i$. This model is ubiquitous in machine learning.

    One key property of (1.9) is the following bound:

    (1.10)  $\big|F(w') - F(w) - \langle \nabla F(w), w' - w\rangle\big| \le \frac{L}{2}\,\|w' - w\|^2 \quad \text{for all } w, w' \in \mathbb{R}^p,$

    which shows that F can be globally upper bounded by a convex quadratic function and globally lower bounded by a concave quadratic function. If, additionally, F is convex, then stronger bounds as well as the co-coerciveness of ∇F can be obtained, see, e.g., [45]. One can also extend the L-smoothness of F to a Hölder smoothness as $\|\nabla F(w) - \nabla F(w')\| \le L\,\|w - w'\|^{\nu}$ for some $\nu \in [0, 1]$. This concept unifies both the L-smoothness ($\nu = 1$) and the bounded gradient ($\nu = 0$) conditions. It has been used in universal first-order methods for both deterministic and stochastic methods, e.g., [47].
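
    For a least-squares objective one can compute the smoothness constant directly and sanity-check the bound (1.10) numerically; the Python snippet below does this on random toy data (the data, sample count, and tolerance are our own illustrative assumptions).

        import numpy as np

        rng = np.random.default_rng(1)
        X, y = rng.standard_normal((50, 8)), rng.standard_normal(50)
        F = lambda w: 0.5 * np.sum((X @ w - y) ** 2)
        gradF = lambda w: X.T @ (X @ w - y)
        L = np.linalg.norm(X.T @ X, 2)           # smoothness constant of this least-squares F

        # Check the quadratic upper/lower bound (1.10) at a few random pairs of points.
        for _ in range(5):
            w, v = rng.standard_normal(8), rng.standard_normal(8)
            lhs = abs(F(v) - F(w) - gradF(w) @ (v - w))
            assert lhs <= 0.5 * L * np.linalg.norm(v - w) ** 2 + 1e-9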

    •  Convexity. Function F is said to be μ-[strongly] convex if

    (1.11)  $F(w') \ge F(w) + \langle \nabla F(w), w' - w\rangle + \frac{\mu}{2}\,\|w' - w\|^2 \quad \text{for all } w, w' \in \mathbb{R}^p.$

    Here, $\nabla F(w)$ can be a gradient or a subgradient of F at w. This inequality shows that F can be lower bounded by either a linear ($\mu = 0$) or a quadratic approximation ($\mu \neq 0$). If $\mu = 0$, then F is just convex or merely convex. If $\mu > 0$, then F is strongly convex, and μ is called the strong convexity parameter. If $\mu < 0$, then F is called weakly convex. Convexity and strong convexity are key concepts in convex analysis, optimization, and related fields, see, e.g., [7,57], and we do not further discuss them here.

    These are three key and also basic assumptions to analyze the convergence of (1.2) and its variants. Nevertheless, other assumptions can also be exploited. For example, the following conditions are commonly used in different methods:

    •  Gradient dominance and PL condition. Function F is called σ-gradient dominant if $F(w) - F(w^\star) \le \sigma\,\|\nabla F(w)\|^2$ for all $w \in \mathbb{R}^p$, where $w^\star$ is a minimizer of F. Clearly, if F is strongly convex, then it is gradient dominant. However, there exist nonconvex functions that are gradient dominant. Note that one can consider local gradient dominance by limiting w to a neighborhood of $w^\star$. We can also extend this concept to different variants. The gradient dominance condition allows us to obtain a convergence guarantee on the objective residual $F(w^k) - F(w^\star)$ even in the nonconvex setting. Note that this condition is also called the Polyak–Łojasiewicz (PL) condition. These conditions can be used to establish linear convergence or linear-like convergence rates (i.e., linear convergence to a small neighborhood of an optimal solution) [30,52].

    •  Uniform convexity and star-convexity. Function F is said to be μ-Hölder uniformly convex of order $q \ge 2$ if

    $F(w') \ge F(w) + \langle \nabla F(w), w' - w\rangle + \frac{\mu}{q}\,\|w' - w\|^{q}$

    for all $w, w' \in \mathbb{R}^p$, see, e.g., [68]. Clearly, if $q = 2$, then we obtain the strong convexity. If $q = 2$ and $w' := w^\star$, a minimizer of F, then F is said to be μ-star strongly convex. These conditions are often used in gradient-type methods to establish linear convergence rates [42].

    •  Sharpness, quadratic growth, and error bound conditions. Assume that there exist $\gamma > 0$ and $\theta \ge 1$ such that $F(w) - F(w^\star) \ge \gamma\,\|w - w^\star\|^{\theta}$ for all $w \in \mathbb{R}^p$ and a minimizer $w^\star$ of F. If $\theta = 1$, then we say that F is sharp at $w^\star$. If $\theta = 2$, then we say that F has a quadratic growth property. Clearly, if F is strongly convex, then it has a quadratic growth property. However, nonconvex functions may still have a quadratic growth property. This property can be extended to an ω-convexity as in [14]. Another related concept is the error bound [39], which is defined as $\|w - w^\star\| \le \kappa\,\|\nabla F(w)\|$ for some $\kappa > 0$ and all $w \in \mathbb{R}^p$. Both quadratic growth and error bound conditions can be used to establish [local] linear convergence of gradient-type methods, see, e.g., [16].

    Other properties can be used to analyze convergence of gradient methods such as essential strong convexity, weak strong convexity, restricted secant inequality [30,42], Kurdyka–Łojasiewicz (KL) condition [4], and Aubin's property [56].

    1.2.7 Optimality certification

    Finding an exact solution of (1.1) is impractical. Our goal is to approximate a solution of this problem in some sense. Let us discuss what we can approximate for (1.1) in both convex and nonconvex problems.

    Assume that $w^\star$ is a global optimal solution of (1.1) with the optimal value $F^\star := F(w^\star)$ and $\hat{w}$ is an approximate solution produced by an algorithm. One obvious condition to certify the optimality is to compute the objective residual $F(\hat{w}) - F^\star$. We often expect to find $\hat{w}$ such that $F(\hat{w}) - F^\star \le \epsilon$ for a given tolerance $\epsilon > 0$. This condition is usually used for convex optimization or special classes of nonconvex problems, e.g., under a gradient dominance condition. The construction of $\hat{w}$ usually relies on two possible ways. The first is to simply take the last iterate as $\hat{w} := w^K$, where $w^K$ is the final iterate of the algorithm. The second option is to form an averaging or a weighted averaging vector as

    (1.12)  $\bar{w}^K := \frac{1}{S_K}\sum_{k=0}^{K} s_k\, w^k, \qquad S_K := \sum_{k=0}^{K} s_k,$

    where $s_k \ge 0$ are given weights (usually related to the step-size $\eta_k$, but they could be different), and $\hat{w} := \bar{w}^K$. In general, averaging vectors have better theoretical convergence rate guarantees, but they may break desired properties of solutions such as sparsity or low-rankness, etc., compared to the last iterate $w^K$. In convex optimization, we often use Jensen's inequality to obtain

    $F(\bar{w}^K) - F^\star \le \frac{1}{S_K}\sum_{k=0}^{K} s_k\,\big(F(w^k) - F^\star\big)$

    for our convergence rate bounds, since we obtain a convergence rate bound for the right-hand side.

    The second criterion is to use the norm of the gradient of F, e.g., $\|\nabla F(\hat{w})\|$, or its squared norm. Note that $\nabla F(\hat{w}) = 0$ only provides us stationary points, which are candidates for local minimizers in nonconvex settings. Hence, any vector $\hat{w}$ such that $\|\nabla F(\hat{w})\| \le \epsilon$ for a given tolerance $\epsilon > 0$ only provides us an approximate stationary point of (1.1). To guarantee an approximate local solution, we may add a second-order condition such as $\lambda_{\min}(\nabla^2 F(\hat{w})) \ge -\delta$ for some $\delta > 0$, where $\lambda_{\min}(\cdot)$ is the smallest eigenvalue of $\nabla^2 F(\hat{w})$. The construction of $\hat{w}$ in the nonconvex case often relies on the best iterate from $\{w^0, \dots, w^K\}$, in the sense that $\|\nabla F(\hat{w})\| = \min_{0 \le k \le K}\|\nabla F(w^k)\|$. For nonsmooth optimization, where $F = f + g$, we can use the norm of the gradient mapping $G_{\gamma}(w) := \frac{1}{\gamma}\big(w - \mathrm{prox}_{\gamma g}(w - \gamma \nabla f(w))\big)$ for some $\gamma > 0$. For stochastic optimization, one needs to characterize the optimality condition using expectation, e.g., $\mathbb{E}[F(\hat{w}) - F^\star] \le \epsilon$ or $\mathbb{E}[\|\nabla F(\hat{w})\|^2] \le \epsilon$, or high probability, e.g., $\mathrm{Prob}\big(F(\hat{w}) - F^\star \le \epsilon\big) \ge 1 - \delta$ for a small $\delta \in (0, 1)$.
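
    The two constructions of an approximate solution discussed above, the weighted average (1.12) and the best iterate with respect to the gradient norm, are simple to code; the following Python helpers are a hedged sketch with our own function names.

        import numpy as np

        def weighted_average(iterates, weights):
            # Weighted averaging as in (1.12): w_bar = (1/S_K) * sum_k s_k * w_k.
            s = np.asarray(weights, dtype=float)
            return np.tensordot(s / s.sum(), np.asarray(iterates), axes=1)

        def best_iterate(iterates, grad_fn):
            # Best iterate in the nonconvex sense: smallest gradient norm among all iterates.
            return min(iterates, key=lambda w: np.linalg.norm(grad_fn(w)))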

    1.2.8 Unified convergence analysis

    (a) General approach. Most convergence analysis of first-order methods of the form (1.2) relies on the following recursive inequality, often generated by two or three consecutive iterates:

    (1.13)  $D_{k+1} \le \omega_k\, D_k - \Delta_k + E_k,$

    where $D_k$, $\Delta_k$, and $E_k$ are nonnegative quantities, and $\omega_k \in (0, 1]$ is a contraction factor. Very often these quantities depend on two consecutive iterates $w^k$ and $w^{k+1}$, but sometimes they also depend on $w^{k-1}$. The error term usually satisfies $\sum_{k} E_k < +\infty$. Moreover, we often have $\omega_k = 1$ or $\omega_k \to 1$ for sublinear rates, and a fixed $\omega_k := \omega < 1$ for linear rates. The quantity $D_k$ can be referred to as a potential or Lyapunov function. There is no general and universal method to construct $D_k$, but for gradient-type methods, it is usually either $F(w^k) - F^\star$, $\|\nabla F(w^k)\|^2$, $\|w^k - w^\star\|^2$, $\|w^k - w^{k-1}\|^2$ (in Euclidean or weighted norms), or a combination of these terms. Clearly, if $E_k = 0$ and $\omega_k \le 1$, then $D_k$ is nonincreasing, showing a descent property of $D_k$. However, if $E_k > 0$, then we no longer have a descent property of $D_k$, which is usually the case in SGD or subgradient methods. There are two cases:

    Case 1. If $D_k$ contains an optimality measure, e.g., $F(w^k) - F^\star$, then we can show that $F(w^K) - F^\star \le C/K^{\alpha}$ for the last iterate $w^K$, where C is a constant depending on $D_0$ and possibly on $\sum_k E_k$ or the factors $\omega_k$.

    Case 2. If $\Delta_k$ contains an optimality measure, e.g., $\|\nabla F(w^k)\|^2$, then we can show that $\min_{0 \le k \le K}\|\nabla F(w^k)\|^2 \le C/K^{\alpha}$ for some constant C and $\alpha > 0$.

    The recursive estimate (1.13) can be used to prove the convergence of different gradient-type methods, including standard and accelerated algorithms. Let us illustrate how to obtain (1.13) for some common schemes.

    (b) Subgradient method. Let us consider the classical [sub]gradient method to minimize F as $w^{k+1} := w^k - \eta_k v^k$, which is a special case of (1.2), where $v^k \in \partial F(w^k)$ is a [sub]gradient of F at $w^k$. Then, for any $w$, we have

    (1.14)  $\|w^{k+1} - w\|^2 = \|w^k - w\|^2 - 2\eta_k\,\langle v^k, w^k - w\rangle + \eta_k^2\,\|v^k\|^2.$

    If F is convex, then $\langle v^k, w^k - w\rangle \ge F(w^k) - F(w)$. Combining this inequality and (1.14), we obtain

    (1.15)  $\|w^{k+1} - w\|^2 \le \|w^k - w\|^2 - 2\eta_k\,\big(F(w^k) - F(w)\big) + \eta_k^2\,\|v^k\|^2.$

    This inequality is exactly in the form (1.13) with $D_k := \|w^k - w\|^2$, $\omega_k := 1$, $\Delta_k := 2\eta_k\,(F(w^k) - F(w))$, and $E_k := \eta_k^2\,\|v^k\|^2$. To guarantee convergence, we need to take $w := w^\star$ as a solution of (1.1) and assume that $\|v^k\| \le M$. Then, (1.15) implies that $2\eta_k\,\big(F(w^k) - F^\star\big) \le \|w^k - w^\star\|^2 - \|w^{k+1} - w^\star\|^2 + M^2\eta_k^2$. By induction, we have

    $2\sum_{k=0}^{K}\eta_k\,\big(F(w^k) - F^\star\big) \le \|w^0 - w^\star\|^2 + M^2\sum_{k=0}^{K}\eta_k^2$. Therefore, we obtain

    $F(\bar{w}^K) - F^\star \le \frac{\|w^0 - w^\star\|^2 + M^2\sum_{k=0}^{K}\eta_k^2}{2\sum_{k=0}^{K}\eta_k},$

    where $\bar{w}^K$ is the weighted averaging vector with weights $s_k := \eta_k$ as computed by (1.12). To obtain a convergence rate bound, we require $\sum_{k=0}^{\infty}\eta_k = +\infty$ and $\sum_{k=0}^{\infty}\eta_k^2 < +\infty$. These are exactly the conditions to guarantee the convergence of [sub]gradient methods, see, e.g., [8].
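
    The subgradient analysis above translates directly into code: diminishing steps $\eta_k = C/\sqrt{k+1}$ and a step-size-weighted average of the iterates. The toy ℓ1 regression problem, constants, and variable names below are our own illustrative assumptions, not the chapter's.

        import numpy as np

        rng = np.random.default_rng(2)
        A, b = rng.standard_normal((30, 4)), rng.standard_normal(30)
        F = lambda w: np.sum(np.abs(A @ w - b))            # nonsmooth convex objective
        subgrad = lambda w: A.T @ np.sign(A @ w - b)       # one subgradient of F at w

        w, iterates, weights = np.zeros(4), [], []
        for k in range(2000):
            eta = 0.1 / np.sqrt(k + 1.0)                   # diminishing step-size
            iterates.append(w.copy()); weights.append(eta) # weights s_k = eta_k as in (1.12)
            w = w - eta * subgrad(w)

        w_bar = np.average(iterates, axis=0, weights=weights)
        print(F(w_bar))                                    # objective at the averaged iterate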

    (c) Gradient descent method for nonconvex problems. If we assume that F is only L-smooth and not necessarily convex, then using (1.10) with $w := w^k$ and $w' := w^{k+1} = w^k - \eta_k\nabla F(w^k)$, we have

    (1.16)  $F(w^{k+1}) \le F(w^k) - \eta_k\Big(1 - \frac{L\eta_k}{2}\Big)\,\|\nabla F(w^k)\|^2.$

    By adding $-F^\star$, where $F^\star := \inf_{w} F(w) > -\infty$ (our assumption), to both sides and rearranging the result, inequality (1.16) leads to

    $F(w^{k+1}) - F^\star \le F(w^k) - F^\star - \eta_k\Big(1 - \frac{L\eta_k}{2}\Big)\,\|\nabla F(w^k)\|^2.$

    This is exactly in the form (1.13) with $D_k := F(w^k) - F^\star$, $\omega_k := 1$, $\Delta_k := \eta_k\big(1 - \frac{L\eta_k}{2}\big)\|\nabla F(w^k)\|^2$, and $E_k := 0$. Without any further assumption, we have $\Delta_k \le D_k - D_{k+1}$ and, by induction, get $\sum_{k=0}^{K}\Delta_k \le D_0 - D_{K+1} \le F(w^0) - F^\star$, leading to

    $\min_{0 \le k \le K}\|\nabla F(w^k)\|^2 \le \frac{F(w^0) - F^\star}{\sum_{k=0}^{K}\eta_k\big(1 - \frac{L\eta_k}{2}\big)} \le \frac{2\,\big(F(w^0) - F^\star\big)}{\eta\,(K+1)},$

    where the last bound holds for a constant step-size $\eta_k := \eta$, provided that $0 < \eta \le \frac{1}{L}$. This result allows us to certify the best-iterate convergence rate of the algorithm to a stationary point of (1.1).
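
    A quick numerical illustration of the best-iterate guarantee is to track the smallest gradient norm seen so far while running GD on a nonconvex toy function. The Rosenbrock-like objective, step-size, and iteration count below are our own assumptions, not taken from the chapter.

        import numpy as np

        # Nonconvex toy objective (a Rosenbrock-like function in two variables).
        F = lambda w: (1.0 - w[0]) ** 2 + 5.0 * (w[1] - w[0] ** 2) ** 2
        gradF = lambda w: np.array([
            -2.0 * (1.0 - w[0]) - 20.0 * w[0] * (w[1] - w[0] ** 2),
            10.0 * (w[1] - w[0] ** 2),
        ])

        w, best = np.array([-1.5, 2.0]), np.inf
        for k in range(5000):
            g = gradF(w)
            best = min(best, np.linalg.norm(g))   # best-iterate optimality measure
            w = w - 1e-3 * g                      # small constant step-size (heuristically below 1/L)
        print(best)                               # decreases roughly like O(1/sqrt(K))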

    (d) Gradient descent method for smooth and convex problems. Assume that F is convex and L-smooth. Let us choose $\eta_k := \frac{1}{L}$ in (1.2) to get $w^{k+1} := w^k - \frac{1}{L}\nabla F(w^k)$. Then, from (1.14) and (1.16), and the convexity of F, we have

    $(k+1)\,\big(F(w^{k+1}) - F^\star\big) \le (k+1)\,\big(F(w^k) - F^\star\big) - \frac{k+1}{2L}\,\|\nabla F(w^k)\|^2,$
    $\frac{L}{2}\|w^{k+1} - w^\star\|^2 = \frac{L}{2}\|w^k - w^\star\|^2 - \langle \nabla F(w^k), w^k - w^\star\rangle + \frac{1}{2L}\,\|\nabla F(w^k)\|^2,$
    $F(w^k) - F^\star \le \langle \nabla F(w^k), w^k - w^\star\rangle.$

    By summing up these three inequalities and canceling terms, we obtain

    $(k+1)\,\big(F(w^{k+1}) - F^\star\big) + \frac{L}{2}\|w^{k+1} - w^\star\|^2 \le k\,\big(F(w^k) - F^\star\big) + \frac{L}{2}\|w^k - w^\star\|^2 - \frac{k}{2L}\,\|\nabla F(w^k)\|^2.$

    This is exactly (1.13) with $D_k := k\,\big(F(w^k) - F^\star\big) + \frac{L}{2}\|w^k - w^\star\|^2$, $\omega_k := 1$, $\Delta_k := \frac{k}{2L}\|\nabla F(w^k)\|^2$, and $E_k := 0$. This recursive estimate implies $D_k \le D_0 = \frac{L}{2}\|w^0 - w^\star\|^2$, and therefore, using the definition of $D_k$ and dropping $\frac{L}{2}\|w^k - w^\star\|^2 \ge 0$, we get

    $F(w^k) - F^\star \le \frac{L\,\|w^0 - w^\star\|^2}{2k},$

    which shows an $\mathcal{O}(1/k)$-last-iterate convergence rate on $F(w^k) - F^\star$. It also implies that $\frac{L}{2}\|w^k - w^\star\|^2 \le D_0$ (by using $D_k \ge \frac{L}{2}\|w^k - w^\star\|^2$) and $k\,\big(F(w^k) - F^\star\big) \le D_0$ (by using $D_k \ge k\,(F(w^k) - F^\star)$) for all $k \ge 0$.

    (e) Accelerated gradient method for smooth and convex problems. Our last illustration follows Nesterov's accelerated gradient scheme:

    (1.17)  $z^k := w^k + \beta_k\,(w^k - w^{k-1}), \qquad w^{k+1} := z^k - \frac{1}{L}\nabla F(z^k),$

    where $\beta_k := \frac{\theta_k - 1}{\theta_{k+1}}$ for a sequence $\{\theta_k\}$ such that $\theta_{k+1}^2 - \theta_{k+1} \le \theta_k^2$ with $\theta_0 := 1$. This is an accelerated variant of (1.2) with the momentum term $\beta_k\,(w^k - w^{k-1})$. It is well-known [44] that, after a few elementary transformations, (1.17) can be written as

    Let . Then, . Moreover, by convexity of F, we have

    . Hence, multiplying both sides by , we obtain

    Similar to the proof of (1.14) and (1.16), respectively, we have

    Summing up the last three inequalities, we obtain

    (1.18)

    which is exactly (1.13) with and , provided that (note that and satisfy this condition). The recursive estimate (1.18) implies that , leading to

    In particular, if we choose , then , then we get the last-iterate convergence guarantee .

    We have illustrated how to employ the unified recursive expression (1.13) to analyze four different deterministic gradient-type algorithms. It provides a simple approach, requiring only a few lines, to derive convergence rate guarantees compared to classical techniques in the literature. We believe that this approach can be extended to analyze other methods that have not been listed here.
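
    To see the practical gap between the O(1/k) rate of plain GD and the accelerated scheme, one can compare both on a convex least-squares problem. The Python snippet below is our own hedged sketch; it uses the common practical momentum weight $\beta_k = k/(k+3)$ rather than the $\theta_k$ sequence of (1.17), and the toy data and iteration counts are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(3)
        X, y = rng.standard_normal((100, 20)), rng.standard_normal(100)
        F = lambda w: 0.5 * np.sum((X @ w - y) ** 2)
        gradF = lambda w: X.T @ (X @ w - y)
        L = np.linalg.norm(X.T @ X, 2)
        F_star = F(np.linalg.lstsq(X, y, rcond=None)[0])

        # Plain gradient descent with eta = 1/L.
        w = np.zeros(20)
        for k in range(300):
            w = w - gradF(w) / L

        # Accelerated gradient with momentum beta_k = k/(k+3) (a common practical choice).
        wa, wa_prev = np.zeros(20), np.zeros(20)
        for k in range(300):
            z = wa + (k / (k + 3.0)) * (wa - wa_prev)
            wa, wa_prev = z - gradF(z) / L, wa

        print(F(w) - F_star, F(wa) - F_star)   # the accelerated gap is typically much smaller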

    1.2.9 Convergence rates and complexity analysis

    Classical optimization literature often characterizes asymptotic convergence or linear convergence rates of the underlying algorithm, while sublinear rates or oracle complexity are largely elusive, see, e.g., [3,21,22,31,38,53]. Sublinear convergence rates have been widely studied in convex optimization methods, see, e.g., [45], while oracle complexity analysis was formally studied in [43]. Recently, these topics have gained in popularity due to applications to large-scale problems in modern signal and image processing, machine learning, and statistical learning [9,28,67]. Let us discuss these concepts in detail.
