
ThursdAI - Jan 24 - ⌛ Diffusion Transformers, fMRI multimodality, Fuyu and Moondream1 VLMs, Google video generation & more AI news

From ThursdAI - The top AI news from the past week


Length: 101 minutes
Released: Jan 26, 2024
Format: Podcast episode

Description

What A SHOW folks, I almost don't want to write anything in the newsletter to MAKE you listen haha, but I will, I know many of you don't like listening to me babble. But if you choose one episode to listen to instead of just skimming the show notes, make it this one. We had two deep dives: one into the exciting world of multimodality, where we chatted with Vik, the creator of Moondream1, and with Wes and Eric, co-founders of Prophetic, about their EEG/fMRI multimodal transformer (that's right!), and then a DEEP dive into the new Hourglass Diffusion Transformers with Tanishq from MedArc/Stability.

More than 1,300 of you tuned in to the live show, and I got some incredible feedback on the fly, which I cherish, so if you have friends who don't already know about ThursdAI, why not share this with them as well?

TL;DR of all topics covered:

* Open Source LLMs
  * Stability AI releases StableLM 1.6B params (X, Blog, HF)
  * InternLM2-Math - SOTA on math LLMs (90% GPT4 perf.) (X, Demo, Github)
  * MedArc analysis for best open source use for medical research finds Qwen-72 the best open source doctor (X)
* Big CO LLMs + APIs
  * Google teases LUMIERE - incredibly powerful video generation (TTV and ITV) (X, Blog, ArXiv)
  * HuggingFace announces Google partnership (Announcement)
  * OpenAI releases 2 new embeddings models, tweaks turbo models and cuts costs (My analysis, Announcement)
  * Google to add 3 new AI features to Chrome (X, Blog)
* Vision & Video
  * Adept Fuyu Heavy - third in the world multimodal while being 20x smaller than GPT4V, Gemini Ultra (X, Blog)
  * FireLLaVa - first LLaVa model with a commercially permissive license, from Fireworks (X, Blog, HF, DEMO)
  * Vikhyatk releases Moondream1 - tiny 1.6B VLM trained on Phi 1 (X, Demo, HF)
* This week's buzz - What I learned in WandB this week
  * New course announcement from Jason Liu & WandB - LLM Engineering: Structured Outputs (Course link)
* Voice & Audio
  * Meta W2V-BERT - speech encoder for low resource languages (Announcement)
  * 11labs has a dubbing studio (my dubbing test)
* AI Art & Diffusion & 3D
  * Instant ID - zero shot face transfer diffusion model (Demo)
  * Hourglass Diffusion (HDiT) paper - high resolution image synthesis (X, Blog, Paper, Github)
* Tools & Others
  * Prophetic announces MORPHEUS-1, their EEG/fMRI multimodal ultrasonic transformer for Lucid Dream induction (Announcement)
  * NSF announces NAIRR with partnership from all major government agencies & labs including OAI, WandB (Blog)
  * Runway adds multiple motion brushes for added creativity (X, How to)

Open Source LLMs

Stability releases StableLM 1.6B tiny LLM

* Super super fast tiny model; I was able to run this in LM Studio, which just released an update supporting it. Punches above its weight, specifically on other languages like German/Spanish/French/Italian (beats Phi).
* Has a surprisingly decent MT-Bench score as well.
* License is not commercial per se, but comes with a specific Stability AI membership.

I was able to get above 120 tok/sec with this model in LM Studio and it was quite reasonable, and honestly, it's quite ridiculous how fast we've gotten to a point where we have an AI model that can weigh less than 1GB and have this level of performance.

Vision & Video & Multimodality

Tiny VLM Moondream1 (1.6B) performs really well (Demo)

New friend of the pod Vikhyatk trained Moondream1, a tiny multimodal VLM with LLaVa on top of Phi 1 (not 2 cause... issues), and while it's not commercially viable, it's really impressive how fast and how good it is.
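If you want to poke at it yourself, here's a minimal sketch of querying Moondream1 via Hugging Face transformers. Treat the repo id (vikhyatk/moondream1) and the encode_image / answer_question helpers as assumptions pulled from the model card, not something we verified on the show:

```python
# Minimal sketch: querying Moondream1 through Hugging Face transformers.
# Assumptions (from the model card, not verified on the show): the repo id
# "vikhyatk/moondream1" and the encode_image / answer_question helpers exposed
# by its custom (trust_remote_code) modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream1"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("two_friends_talking.jpg")  # any local image you want to describe
image_embeds = model.encode_image(image)

print(model.answer_question(image_embeds, "Describe this image.", tokenizer))
```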
Here's an example featuring two of my dear friends talking about startups, and you can see how well this TINY vision-enabled model understands the scene. This is not cherry picked, this is literally the first image I tried and my first result:

The image features two men sitting in chairs, engaged in a conversation. One man is sitting on the left side of the image, while the other is on the right side. They are both looking at a laptop placed on a table in front of them. The laptop is open and displaying a pres

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, as we discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion aspects and much more. sub.thursdai.news