Building Transformer Models with PyTorch 2.0: NLP, computer vision, and speech processing with PyTorch and Hugging Face (English Edition)
Ebook · 684 pages · 3 hours

About this ebook

This book covers transformer architecture for various applications including NLP, computer vision, speech processing, and predictive modeling with tabular data. It is a valuable resource for anyone looking to harness the power of transformer architecture in their machine learning projects.

The book provides a step-by-step guide to building transformer models from scratch and fine-tuning pre-trained open-source models. It explores foundational model architecture, including GPT, VIT, Whisper, TabTransformer, Stable Diffusion, and the core principles for solving various problems with transformers. The book also covers transfer learning, model training, and fine-tuning, and discusses how to utilize recent models from Hugging Face. Additionally, the book explores advanced topics such as model benchmarking, multimodal learning, reinforcement learning, and deploying and serving transformer models.

In conclusion, this book offers a comprehensive and thorough guide to transformer models and their various applications.
Language: English
Release date: Mar 8, 2024
ISBN: 9789355519900

    Book preview

    Building Transformer Models with PyTorch 2.0 - Prem Timsina

    Chapter 1

    Transformer Architecture

    Introduction

    Imagine you are a software engineer working on an exciting project, searching for a programming language that will help you create software quickly and efficiently. You hear about a revolutionary new language that is the Swiss Army knife of programming languages: it is highly efficient for creating Machine Learning (ML) models, it builds stunning websites faster than existing web development frameworks, and it even supports hardware programming. Furthermore, its performance in network programming and related tasks is also outstanding. Would it not be interesting to learn about such a powerful language?

    Similar developments can be observed in the world of ML frameworks. The transformer is an incredibly versatile ML architecture. Transformers were initially developed for Natural Language Processing (NLP), and due to their superior results, they have rendered earlier NLP architectures such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks largely obsolete. More recently, transformers have begun impacting other ML fields as well. According to SUPERB (https://superbbenchmark.org/leaderboard), the best foundational models for speech processing are also based on the transformer. Furthermore, transformers have shown excellent results in computer vision and other machine learning fields. Transformers therefore have the potential to converge many AI frameworks into a single, highly adaptable architecture.

    In this chapter, we will look into the base architecture of this versatile machine learning model in depth. The chapter specifically focuses on understanding the original transformer architecture proposed by Vaswani et al. (2017). Since the transformer was originally proposed for NLP, we will also review the important NLP models that preceded it and how they influenced the transformer.

    Structure

    This chapter covers the following topics:

    Chronology of NLP model development

    Transformer architecture

    Training process of transformer

    Inference process of transformer

    Types of transformers and their applications

    Objectives

    This chapter intends to provide readers with a broad understanding of the evolution and significant milestones in the development of NLP models, with a special emphasis on the transformer architecture. It offers an in-depth examination of various NLP models, drawing comparisons and highlighting the distinctive ways in which the transformer addresses the limitations of its predecessors. A key focus is placed on the essential components that make up the transformer architecture. Additionally, the chapter introduces the different variations of the transformer model, showcasing their broad spectrum of applications in NLP. The overarching theme of this chapter is to trace the journey of NLP model development, culminating in the rise of the transformer as a ground-breaking innovation in the landscape of language processing technologies.

    Chronology of NLP model development

    The transformer was originally proposed for NLP, specifically machine translation, by Vaswani et al. in 2017¹. It is currently the most popular and effective model in NLP, as well as in other wide-ranging tasks (speech processing, computer vision, and others). However, the development of the transformer was not a sudden occurrence. In fact, it was the culmination of years of research and development in NLP models, with each model building upon the previous ones. Let us examine the chronological history of different NLP models. This is important because, as we study the transformer architecture, we will be able to contextualize it within the historical development of NLP models, their shortcomings, and how the transformer is unique and versatile.

    In the upcoming section, we will explore the timeline of NLP model evolution and contrast various NLP models. Figure 1.1 shows the chronology of NLP research:

    Figure 1.1: Chronology of NLP models development

    The transformer model was the culmination of this earlier research. Vaswani et al. cited several of these works, and the transformer model appears to have been heavily influenced by them.

    In the following sections, we will discuss a few of the most important NLP models, their benefits, and their shortcomings.

    Recurrent neural network

    First, let us discuss the concept of next-word prediction. For instance, say we have the sentence, The color of the sky is …. Based on the information already processed by our brain, we can predict that the next word in this sentence would be blue. However, this prediction is not based solely on the immediately preceding word, but rather on multiple preceding words.

    Traditional machine learning algorithms, such as linear regression and the multilayer perceptron, are not equipped to store previous information and utilize it for predictions; they have no capability to retain information from prior inputs. Here, recurrent neural networks come into play, which are capable of retaining prior information and utilizing it to make accurate predictions.

    Figure 1.2 shows the structure of an RNN. Here, each cell takes the output of the previous cell as part of its input. This allows the network to retain information from previous time steps and incorporate it into the computation at each subsequent step:

    Figure 1.2: RNN structure
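    To make the recurrence concrete, here is a minimal PyTorch sketch (not taken from the book; the tensor sizes are illustrative assumptions) that feeds a batch of embedded sequences through torch.nn.RNN. The hidden state returned at each step is what carries information from earlier time steps forward:

        import torch
        import torch.nn as nn

        # Illustrative dimensions (assumptions, not from the book)
        batch_size, seq_len, input_dim, hidden_dim = 2, 7, 32, 64

        rnn = nn.RNN(input_size=input_dim, hidden_size=hidden_dim, batch_first=True)

        x = torch.randn(batch_size, seq_len, input_dim)  # a batch of embedded sequences
        output, h_n = rnn(x)

        # output holds the hidden state at every time step; h_n is the final hidden state
        print(output.shape)  # torch.Size([2, 7, 64])
        print(h_n.shape)     # torch.Size([1, 2, 64])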

    Limitation of RNN

    Let us consider the following example: England is my hometown. I spent my whole life there. I just moved to Spain two days ago. I can speak only one language, which is .... In this example, the next word is English. The most important contextual word, in this case, is England, which appears at the beginning of the passage. However, in some cases, the relevant information may be located far away from where it is needed in an RNN. In this example, the gap between the relevant information and the predicted word is about 26 time steps; that is, England is at time step 1, and the predicted word is at time step 27. Such a large gap can pose a problem for RNNs, as they may not be able to retain contextual information over long sequences, or the weights associated with that information may become very small. This is due to the structure of RNNs, where the gradients can become very small or even zero as they are repeatedly multiplied by the weight matrices in the network. This can make it difficult for the network to learn and can cause training to be slow or even fail altogether.

    LSTM

    To overcome the vanishing gradient problem, the LSTM was introduced.

    In contrast to RNNs, LSTMs have a memory cell that allows them to store information about long-term dependencies in data. Furthermore, they possess a forget gate, which helps filter out unnecessary information from previous states.

    Another advantage of LSTMs is their low likelihood of encountering the vanishing gradient problem, which occurs when gradients become very small or even zero during backpropagation, making it difficult for the network to learn. LSTMs address this issue by employing gates that regulate information flow through the network, allowing it to retain relevant details and discard irrelevant ones. Figure 1.3 compares the RNN and LSTM structures; compared to the RNN, the LSTM structure is more complex:

    Figure 1.3: Comparison of RNN and LSTM
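    As a rough PyTorch sketch (dimensions are illustrative assumptions), torch.nn.LSTM returns both the hidden state and the cell state, the long-term memory that the gates read from and write to:

        import torch
        import torch.nn as nn

        # Illustrative dimensions (assumptions, not from the book)
        batch_size, seq_len, input_dim, hidden_dim = 2, 7, 32, 64

        lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, batch_first=True)

        x = torch.randn(batch_size, seq_len, input_dim)
        output, (h_n, c_n) = lstm(x)

        # h_n is the final hidden state; c_n is the final cell state, which the
        # input, forget, and output gates regulate across time steps
        print(output.shape, h_n.shape, c_n.shape)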

    Limitation of LSTM

    Limited ability to handle long sequences: Even though LSTMs have a memory cell, they still struggle to handle long sequences. This is because they use a fixed-length hidden state, which can be a problem if the input sequence is very long.

    Sequential processing: LSTMs process sequences sequentially, which can be slow and limits the ability to parallelize computations across multiple processors.

    Cho’s (2014) RNN encoder-decoder²

    The RNN encoder-decoder model is a sequence-to-sequence algorithm. It has three major components. Let us explore the components of an RNN encoder-decoder model with an example of English-to-French translation:

    Encoder: This is an RNN that encodes a variable-length input sequence (in this case, an English sentence) into a fixed-length vector.

    Encoded vector: The fixed-length vector output by the encoder.

    Decoder: This is also an RNN that takes the encoded vector as input and produces a variable-length output sequence (in this case, the French translation of the English input sequence).

    The encoder-decoder model is especially beneficial for tasks such as machine translation and speech recognition, where the input sequence and output sequence may be of differing lengths. Figure 1.4 illustrates a simplified representation of the RNN encoder-decoder model:

    Figure 1.4: Simplified representation of Cho’s encoder-decoder model
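    A minimal sketch of the three components described above, using GRU cells and illustrative vocabulary sizes and dimensions (these names and numbers are assumptions for demonstration, not the book’s code):

        import torch
        import torch.nn as nn

        src_vocab, tgt_vocab, emb_dim, hidden_dim = 1000, 1200, 64, 128

        # Encoder: variable-length English sequence -> fixed-length vector
        encoder_emb = nn.Embedding(src_vocab, emb_dim)
        encoder_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

        # Decoder: fixed-length vector -> variable-length French sequence
        decoder_emb = nn.Embedding(tgt_vocab, emb_dim)
        decoder_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        decoder_out = nn.Linear(hidden_dim, tgt_vocab)

        src = torch.randint(0, src_vocab, (1, 9))          # 9 English tokens
        _, encoded_vector = encoder_rnn(encoder_emb(src))  # the fixed-length encoded vector

        tgt = torch.randint(0, tgt_vocab, (1, 12))         # 12 French tokens (teacher forcing)
        dec_states, _ = decoder_rnn(decoder_emb(tgt), encoded_vector)
        logits = decoder_out(dec_states)                   # (1, 12, tgt_vocab)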

    Limitation: The major limitation is the vanishing gradient problem. Moreover, the model generates a fixed-length vector representation of the input sequence using only the final hidden state of the encoder RNN, which can result in the loss of important information from earlier time steps.

    Bahdanau’s (2014) attention mechanism³

    Bahdanau’s 2014 paper introduced an attention mechanism as an extension to the RNN encoder-decoder model; it is still an encoder-decoder model, but with the addition of attention. Let us discuss what the attention mechanism is:

    It allows the model to selectively attend to certain parts of the input sequence that are more relevant to the output while ignoring others that are not as relevant.

    For example, in machine translation, the attention mechanism allows the model to focus on the most important words or phrases when predicting the correct translation.

    In essence, the attention mechanism mimics human cognitive behavior by focusing on the most important words while filtering out noise.

    Limitation: The major limitation is that Bahdanau’s mechanism is a local attention mechanism that only looks at a subset of the input sequence at a time. This works fine for shorter sentences; however, performance drops significantly if the input sentence is long.
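    Before moving on, the following sketch shows the general idea behind additive (Bahdanau-style) attention scoring; the layer names and dimensions are illustrative assumptions rather than the paper’s exact implementation:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        hidden_dim, attn_dim, src_len = 128, 64, 9

        W_enc = nn.Linear(hidden_dim, attn_dim, bias=False)  # projects encoder states
        W_dec = nn.Linear(hidden_dim, attn_dim, bias=False)  # projects decoder state
        v = nn.Linear(attn_dim, 1, bias=False)               # scoring vector

        encoder_states = torch.randn(1, src_len, hidden_dim)  # one state per source token
        decoder_state = torch.randn(1, hidden_dim)            # current decoder hidden state

        # score(s_t, h_i) = v^T tanh(W_dec s_t + W_enc h_i)
        scores = v(torch.tanh(W_dec(decoder_state).unsqueeze(1) + W_enc(encoder_states)))
        weights = F.softmax(scores.squeeze(-1), dim=-1)            # attention weights over source tokens
        context = torch.bmm(weights.unsqueeze(1), encoder_states)  # weighted sum: (1, 1, hidden_dim)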

    Let us summarize the important concepts from the above four architectures:

    The encoder-decoder approach is effective because it can handle different lengths of input and output sequences, which is often the case in machine translation and other NLP tasks where the number of words in input and output sequences may differ.

    The attention mechanism is a crucial component in this approach because it enables a neural network to concentrate on the specific parts of the input data that are essential for the task being performed. This helps the network capture the relevant information more effectively, leading to better performance on various NLP tasks.

    In the next section, we will discuss the transformer architecture and understand how the encoder-decoder architecture and the attention mechanism are its major components.

    Transformer architecture

    There are many variants of the transformer; however, in this section, we will discuss the original transformer architecture proposed by Vaswani et al. (2017). They proposed the architecture for machine translation (for example, English to French). Let us highlight the most important aspects of the transformer architecture before going into detail:

    Transformer uses an encoder-decoder architecture for machine translation.

    The encoder converts the input sequence into a sequence of vectors, with the length of the sequence being equal to the length of the input sequence. The encoder consists of multiple encoder blocks.

    The decoder also consists of multiple decoder blocks, and the sequence of vectors (the output of the encoder) is fed to every decoder block.

    Multi-head attention is a primary component of both the encoder and decoder.

    Positional encoding is a new concept introduced in the transformer architecture that encodes the positional information of each input token, representing its position in the input sequence.

    Figure 1.5 shows the transformer architecture:

    Figure 1.5: Transformer architecture
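    PyTorch ships a reference implementation of this encoder-decoder design in torch.nn.Transformer. The following minimal sketch instantiates it with the hyperparameters of the original paper; the random input tensors stand in for embedded and positionally encoded sequences and are purely illustrative:

        import torch
        import torch.nn as nn

        # d_model=512, 8 attention heads, 6 encoder and 6 decoder blocks, as in Vaswani et al.
        model = nn.Transformer(
            d_model=512, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, batch_first=True,
        )

        src = torch.randn(1, 10, 512)  # embedded + positionally encoded source sequence
        tgt = torch.randn(1, 12, 512)  # embedded + positionally encoded target sequence

        out = model(src, tgt)          # (1, 12, 512): one vector per target position
        print(out.shape)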

    Embedding

    As shown in Figure 1.5, the input sequence in the transformer is represented by an embedding vector. Embedding is the process of representing a word or token as a vector of fixed length.

    Before we go in-depth about embeddings, let us understand how text was traditionally represented in NLP. This will help us appreciate why we use embeddings. Traditionally, textual data in machine learning has been represented as n-gram counts. Consider the 1-gram case: if the corpus has 50,000 unique words, each input sequence is represented by a 50,000-dimensional vector, where each dimension holds the number of times the corresponding word appears in that input sequence (a small sketch of this follows the list below). However, this approach has several problems:

    Even for small input sequences (for example, those with only two tokens), we require a high-dimensional vector (50,000), resulting in a highly sparse vector.

    There is no meaningful way to perform mathematical operations on these high-dimensional vector representations.
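    As a tiny illustration of the 1-gram representation described above, here is a plain-Python sketch; the toy vocabulary is an assumption standing in for the 50,000-word vocabulary:

        # Toy vocabulary standing in for 50,000 unique words
        vocab = ["the", "color", "of", "sky", "is", "blue", "cabbage", "dog"]

        def one_gram_vector(sentence):
            """Count how many times each vocabulary word appears in the sentence."""
            tokens = sentence.lower().split()
            return [tokens.count(word) for word in vocab]

        print(one_gram_vector("The color of the sky is blue"))
        # [2, 1, 1, 1, 1, 1, 0, 0] -- mostly zeros even for this short sentence,
        # and the vector would have 50,000 entries with a realistic vocabulary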

    Embedding overcomes these challenges. Embedding is a technique used to represent a word or sequence by a vector of real numbers that captures the meaning and context of the word or phrase.

    A very simple example of embedding is taking a set of words, such as [cabbage, rabbit, eggplant, elephant, dog, cauliflower], and representing each word as a vector in a 2-dimensional space capturing animal and color features. The embedding is shown in Figure 1.6, and the final embedding vectors may look as follows:

    Figure 1.6: Embedding plotting

    [cabbage, cauliflower, eggplant, dog, rabbit, elephant] = [[0.2, 0.1], [0.2, 0.3], [0.2, 0.8], [0.8, 0.4], [0.75, 0.6], [0.9, 0.7]]

    We can see that the first dimension of cabbage and cauliflower is almost the same, as both represent vegetables, so they are located near each other along the first dimension. We can also perform addition and subtraction on these embeddings, because each dimension represents a specific concept and tokens lie near each other when they represent similar concepts.

    Interestingly, in the real world, we mostly use pre-trained models like BERT or word2vec, which have been trained on billions of examples and extract a large number of feature dimensions (BERT uses 768 dimensions). Such embeddings are far more expressive than n-gram representations and offer greater flexibility for NLP tasks.
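    A minimal PyTorch sketch of a learnable embedding layer follows; the vocabulary size, dimensions, and token ids are illustrative, and in practice one would often load a pre-trained model such as BERT or word2vec (for example, through Hugging Face) rather than train embeddings from scratch:

        import torch
        import torch.nn as nn

        vocab_size, embedding_dim = 50_000, 512  # illustrative sizes

        embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)

        token_ids = torch.tensor([[8667, 1362, 1300, 1301, 0]])  # one tokenized sentence
        vectors = embedding(token_ids)

        print(vectors.shape)  # torch.Size([1, 5, 512]): a dense 512-dimensional vector per token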

    Positional encoding

    Positional encoding in a transformer is used to provide the model with information about the position of each word in the input sequence. Unlike previous architectures (such as the LSTM), where each token is processed in sequence (one by one), the transformer processes the input tokens in parallel. This means each token must also carry positional information.

    Let us understand how positional encoding is done. In the Attention is All You Need paper, the authors use a specific formula for positional encoding. The formula is as follows:

        PE(pos, 2i) = sin(pos / 10000^(2i/d))
        PE(pos, 2i + 1) = cos(pos / 10000^(2i/d))

    PE(pos, 2i) and PE(pos, 2i + 1) are the 2i-th and (2i + 1)-th dimensions of the positional encoding vector for position pos in the input sequence.

    pos is the position of the word in the input sequence, starting from 0.

    i is the index of the dimension in the positional encoding vector, starting from 0.

    d is the dimensionality of the embedding (512 in the original architecture).

    This formula generates a set of positional encodings that are unique for each position in the input sequence and that change smoothly as the position changes.

    It is important to understand that there are 256 pairs (512/2) of sine and cosine values. Thus, i goes from 0 to 255.

    Let us unpack the formula. For a word at position pos, the positional encoding vector is:

        PE(pos) = [sin(pos / 10000^(0/d)), cos(pos / 10000^(0/d)), sin(pos / 10000^(2/d)), cos(pos / 10000^(2/d)), …]

    The encoding of the first word (position = 0) will be:

        PE(0) = [sin(0), cos(0), sin(0), cos(0), …] = [0, 1, 0, 1, …, 0, 1]

    Thus, the positional encoding of the first word looks like [0, 1, 0, 1, …, 1], and the positional encoding for the second word (position = 1) looks like [0.8414, 0.5403, 0.8218, …]. With an embedding of 512 dimensions, each positional encoding vector also has 512 dimensions.
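    The sinusoidal formula above can be implemented in a few lines of PyTorch; this is an illustrative sketch rather than the book’s code:

        import math
        import torch

        def sinusoidal_positional_encoding(max_len, d_model=512):
            """Return a (max_len, d_model) matrix of sinusoidal positional encodings."""
            pe = torch.zeros(max_len, d_model)
            position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)      # pos
            div_term = torch.exp(torch.arange(0, d_model, 2).float()
                                 * (-math.log(10000.0) / d_model))                # 1 / 10000^(2i/d)
            pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions: sine
            pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions: cosine
            return pe

        pe = sinusoidal_positional_encoding(max_len=5)
        print(pe[0, :4])  # tensor([0., 1., 0., 1.]) for the first position
        print(pe[1, :4])  # roughly [0.8415, 0.5403, 0.8219, 0.5697] for the second position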

    Model input

    As depicted in Figure 1.7, the model input is the pointwise addition of the positional encoding and the embedding vector. Let us understand how we achieve this.

    Figure 1.7: Model input

    To represent I Live In New York with a tokenized length of 5, we add one padding token:

    ['I', 'Live', 'In', 'NewYork', '<pad>']

    At first, each token is represented by an integer. Here, the word I is represented by 8667, Live by 1362, In by 1300, NewYork by 1301, and the padding token by 0. The resulting sequence will be:

    IntegerRepresentation = [8667, 1362, 1300, 1301, 0]

    We then pass this tokenized sequence to the embedding layer. The embedding of each token is represented by a vector of 512 dimensions. In the example below, the vector [embedding_token_8667] has 512 dimensions.

    Embedding = [[embedding_token_8667], [embedding_token_1362], [embedding_token_1300], [embedding_token_1301], [embedding_token_0]]

    Finally, we perform the pointwise addition of the embedding and the positional encoding before feeding the result into the model:

    PositionalEncodingVector = [[size = 512], [size = 512], [size = 512], [size = 512], [size = 512]]

    Embedding = [[embedding_token_8667], [embedding_token_1362], [embedding_token_1300], [embedding_token_1301], [embedding_token_0]]

    ModelInput = PositionalEncodingVector + Embedding = [[size = 512], [size = 512], [size = 512], [size = 512], [size = 512]]
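    A rough end-to-end sketch of forming the model input (the token ids, the padding id 0, and the sizes are illustrative assumptions):

        import math
        import torch
        import torch.nn as nn

        d_model, vocab_size, seq_len = 512, 50_000, 5
        embedding = nn.Embedding(vocab_size, d_model)

        token_ids = torch.tensor([[8667, 1362, 1300, 1301, 0]])  # "I Live In NewYork <pad>"
        token_embeddings = embedding(token_ids)                  # (1, 5, 512)

        # Sinusoidal positional encodings for the 5 positions, as derived above
        position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        positional = torch.zeros(seq_len, d_model)
        positional[:, 0::2] = torch.sin(position * div_term)
        positional[:, 1::2] = torch.cos(position * div_term)

        # Model input = pointwise addition of embeddings and positional encodings
        model_input = token_embeddings + positional              # broadcasts to (1, 5, 512)
        print(model_input.shape)                                 # torch.Size([1, 5, 512])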

    Encoder layer

    The encoder layer is a crucial component in the transformer architecture, responsible for processing and encoding input sequences into vector representations. Refer to the following figure:

    Figure 1.8: Encoder layer

    Let us understand each subcomponent of the encoder layer in detail:

    Input to the encoder: The input to the first layer of the encoder is the pointwise sum of the embeddings and the positional encodings.

    Multi-head attention: A key component of the encoder block in a transformer is the multi-head self-attention mechanism. This mechanism allows the model to weigh the importance of different parts of the input when making a prediction. In a later section, we will discuss the details of multi-head attention.

    Add and norm layer: The add layer, also known as the residual connection, is used to add the input to the output of the previous layer before passing it through the next layer. This allows the model to learn the residual function, which is the difference between the input and the output, rather than the actual function. This can help to improve the performance of the model, especially when the number of layers is large. The norm layer normalizes the activations of a layer across all of its hidden units. This can help to stabilize the training of the model by preventing the input from getting too large or too small, which can cause issues such as vanishing gradients or exploding gradients.

    Feed-forward: The output of the multi-head self-attention mechanism is fed to the input of the feed-forward layer, and a non-linear activation function is applied. The feed-forward layer is important for extracting higher-level features from the data. There is also an add and norm layer after the feed-forward layer, and its output is fed to the next encoder block.

    Encoder output: The last block of the encoder produces a sequence vector, which is then sent to the decoder blocks as features.
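    PyTorch’s torch.nn.TransformerEncoderLayer bundles exactly these sub-components (multi-head self-attention, add and norm, feed-forward, add and norm). A minimal sketch with illustrative sizes:

        import torch
        import torch.nn as nn

        # One encoder block: multi-head self-attention + add & norm + feed-forward + add & norm
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=512, nhead=8, dim_feedforward=2048, batch_first=True,
        )
        # Stack six such blocks, as in the original architecture
        encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

        model_input = torch.randn(1, 5, 512)   # embeddings + positional encodings
        encoder_output = encoder(model_input)  # (1, 5, 512): one vector per input token
        print(encoder_output.shape)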

    Attention mechanism

    The attention mechanism has emerged as a versatile and powerful neural network component that allows models to weigh and prioritize relevant information in a given context. Its core concepts, self-attention and multi-head attention, are instrumental in enabling the transformer architecture to achieve remarkable results. Let us delve into these concepts in more detail.
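    As a preview, here is a minimal sketch of scaled dot-product attention, the building block behind self-attention and multi-head attention (the tensors are illustrative placeholders):

        import math
        import torch
        import torch.nn.functional as F

        d_k, seq_len = 64, 5
        query = torch.randn(1, seq_len, d_k)
        key = torch.randn(1, seq_len, d_k)
        value = torch.randn(1, seq_len, d_k)

        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)  # (1, 5, 5) similarity scores
        weights = F.softmax(scores, dim=-1)                      # each row sums to 1
        attended = weights @ value                               # (1, 5, 64) weighted mix of values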
