The Art and Science of Training LLMs // Bandish Shah and Davis Blalock // #219

From MLOps.community


Length:
75 minutes
Released:
Mar 22, 2024
Format:
Podcast episode

Description

Huge thank you to Databricks AI for sponsoring this episode. Databricks: http://databricks.com/

Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing.

Davis Blalock is a Research Scientist and the first employee of MosaicML, a GenAI startup acquired by Databricks for $1.3 billion.

MLOps podcast #219 with Databricks Engineering Manager Bandish Shah and Research Scientist Davis Blalock: The Art and Science of Training Large Language Models.

// Abstract
What's hard about language models at scale? Turns out...everything. MosaicML's Davis and Bandish share war stories and lessons learned from pushing the limits of LLM training and helping dozens of customers get LLMs into production. They cover what can go wrong at every level of the stack, how to make sure you're building the right solution, and some contrarian takes on the future of efficient models.

// Bio
Bandish Shah
Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing. Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems, where he helped develop and ship the first RDU systems from the ground up, and at Oracle, where he worked as an ASIC engineer for SPARC-based enterprise servers.

Davis Blalock
Davis Blalock is a research scientist at MosaicML. He completed his PhD at MIT, advised by Professor John Guttag. His primary work is designing high-performance machine learning algorithms. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: http://databricks.com/
Davis' Newsletters:
Learning to recognize spoken words from five unlabeled examples in under two seconds: https://arxiv.org/abs/1609.09196
Training on data at 5GB/s in a single thread: https://arxiv.org/abs/1808.02515
Nearest-neighbor searching through billions of images per second in one thread with no indexing: https://arxiv.org/abs/1706.10283
Multiplying matrices 10-100x faster than a matrix multiply (with some approximation error): https://arxiv.org/abs/2106.10860
Hidden Technical Debt in Machine Learning Systems: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Davis on LinkedIn: https://www.linkedin.com/in/dblalock/
Connect with Bandish on LinkedIn: https://www.linkedin.com/in/bandish-shah/

Weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning aka MLOps aka Machine Learning Operations.