
How Pinterest Powers Image Similarity // Shaji Chennan Kunnummel // System Design Reviews #1

From MLOps.community



Length:
58 minutes
Released:
Jun 29, 2021
Format:
Podcast episode

Description

In this Machine Learning System Design Review, Shaji Chennan Kunnummel walks us through the system design of Pinterest’s near-real-time architecture for detecting similar images. We discuss their use of Kafka, Flink, RocksDB, and much more. Starting with the high-level requirements for the system, we cover Pinterest’s focus on debuggability and on an easy transition from their batch processing system to stream processing. We then touch on the system interfaces and components involved, such as Manas (Pinterest’s custom search engine), and how the results end up in their custom graph database, in downstream Kafka streams, and in Galaxy, Pinterest’s feature store. With Shaji’s expert knowledge of the system, we were able to take a deep dive into its architecture and some of its components.

// Experiences
15+ years of experience in software product development.
Led multiple teams in a highly agile, collaborative, and cross-functional environment.
Designed and implemented highly scalable, fault-tolerant, and optimized distributed systems that handle millions of requests per second. In-depth knowledge of object-oriented programming and design patterns in C++/Java/Python/Golang.
Designed and built complex data pipelines and microservices to train and serve machine learning models.
Built analytics pipelines for processing and mining high-volume data sets using Hadoop and MapReduce frameworks.
In-depth knowledge of distributed storage, consistency models, NoSQL data modeling, and cloud computing environments (AWS and Google Cloud).

Titles in the series (100)

Weekly talks and fireside chats about everything to do with the new space emerging around DevOps for Machine Learning, also known as MLOps or Machine Learning Operations.