96 min listen
Segment Anything Model and the Hard Problems of Computer Vision — with Joseph Nelson of Roboflow
Length:
80 minutes
Released:
Apr 13, 2023
Format:
Podcast episode
Description
2023 is the year of Multimodal AI, and Latent Space is going multimodal too!

* This podcast comes with a video demo at the 1hr mark and it’s a good excuse to launch our YouTube - please subscribe!
* We are also holding two events in San Francisco: the first AI | UX meetup next week (already full; we’ll send a recap here on the newsletter) and Latent Space Liftoff Day on May 4th (signup here; but get in touch if you have a high profile launch you’d like to make).
* We also joined the Chroma/OpenAI ChatGPT Plugins Hackathon last week where we won the Turing and Replit awards and met some of you in person!

This post was featured on Hacker News.

Out of the five senses of the human body, I’d put sight at the very top. But weirdly when it comes to AI, Computer Vision has felt left out of the recent wave compared to image generation, text reasoning, and even audio transcription. We got our first taste of it with the OCR capabilities demo in the GPT-4 Developer Livestream, but to date GPT-4’s vision capability has not yet been released.

Meta AI leapfrogged OpenAI and everyone else by fully open sourcing their Segment Anything Model (SAM) last week, complete with paper, model, weights, data (6x more images and 400x more masks than OpenImages), and a very slick demo website. This is a marked change from their previous LLaMA release, which was not commercially licensed. The response has been ecstatic.

SAM was the talk of the town at the ChatGPT Plugins Hackathon and I was fortunate enough to book Joseph Nelson, who was frantically integrating SAM into Roboflow this past weekend. As a passionate instructor, hacker, and founder, Joseph is possibly the single best person in the world to bring the rest of us up to speed on the state of Computer Vision and the implications of SAM. I was already a fan of his from his previous pod with (hopefully future guest) Beyang Liu of Sourcegraph, so this served as a personal catchup as well. Enjoy, and let us know what other news/models/guests you’d like to have us discuss! - swyx

Recorded in-person at the beautiful StudioPod studios in San Francisco.

Full transcript is below the fold.

Show Notes
* Joseph’s links: Twitter, LinkedIn, Personal
* Sourcegraph Podcast and Game Theory Story
* Represently
* Roboflow at Pioneer and YCombinator
* Udacity Self Driving Car dataset story
* Computer Vision Annotation Formats
* SAM recap - top things to know for those living in a cave
* https://segment-anything.com/
* https://segment-anything.com/demo
* https://arxiv.org/pdf/2304.02643.pdf
* https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/
* https://blog.roboflow.com/segment-anything-breakdown/
* https://ai.facebook.com/datasets/segment-anything/
* Ask Roboflow https://ask.roboflow.ai/
* GPT-4 Multimodal https://blog.roboflow.com/gpt-4-impact-speculation/

Cut for time:
* WSJ mention
* Des Moines Register story
* All In Pod: timestamped mention
* In Forbes: underrepresented investors in Series A
* Roboflow greatest hits
* https://blog.roboflow.com/mountain-dew-contest-computer-vision/
* https://blog.roboflow.com/self-driving-car-dataset-missing-pedestrians/
* https://blog.roboflow.com/nerualhash-collision/ and Apple CSAM issue
* https://www.rf100.org/

Timestamps
* [00:00:19] Introducing Joseph
* [00:02:28] Why Iowa
* [00:05:52] Origin of Roboflow
* [00:16:12] Why Computer Vision
* [00:17:50] Computer Vision Use Cases
* [00:26:15] The Economics of Annotation/Segmentation
* [00:32:17] Computer Vision Annotation Formats
* [00:36:41] Intro to Computer Vision & Segmentation
* [00:39:08] YOLO
* [00:44:44] World Knowledge of Foundation Models
* [00:46:21] Segment Anything Model
* [00:51:29] SAM: Zero Shot Transfer
* [00:51:53] SAM: Promptability
* [00:53:24] SAM: Model Assisted Labeling
* [00:56:03] SAM doesn't have labels
* [00:59:23] Labeling on the Browser
* [01:00:28] Roboflow + SAM Video Demo
* [01:07:27] Future Predictions
* [01:08:04] GPT4 Multimodality
* [01:09:27] Remaining Hard Problems
* [01:13:57] Ask Roboflow (2019)
* [01:15:26] How to
Titles in the series (67)
Emergency Pod: ChatGPT's App Store Moment (w/ OpenAI's Logan Kilpatrick, LindyAI's Florent Crivello and Nader Dabit) by Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0