
ThursdAI - Dec 28 - a BUNCH of new multimodal OSS, OpenAI getting sued by NYT, and our next year predictions

From ThursdAI - The top AI news from the past week

Length:
94 minutes
Released:
Dec 29, 2023
Format:
Podcast episode

Description

Hey hey hey (no longer ho ho ho) - hope you had a great Christmas! Many AI folks dropped tons of open source AI goodies for Christmas, and here's quite a list of new things, including at least 3 new multimodal models, a dataset, and a paper/technical report from the current top model on the HF leaderboard, from Upstage. We also had the pleasure of interviewing the folks who released the Robin suite of multimodal models and aligned them to "good responses"; that full interview is coming to ThursdAI soon, so stay tuned. And we had a full 40 minutes with an open stage to gather predictions for 2024 in the world of AI, which we fully intend to cover next year, so scroll all the way down to see ours, and reply/comment with yours!

TL;DR of all topics covered:

* Open Source LLMs
  * Uform - tiny (1B) multimodal embeddings and models that can run on device (HF, Blog, Github, Demo)
  * Notux 8x7B - one of the first Mixtral DPO fine-tunes (Thread, Demo)
  * Upstage SOLAR 10.7B technical report (arXiv, X discussion, followup)
  * Capybara dataset open sourced by LDJ (Thread, HF)
  * Nous Hermes 34B (fine-tunes Yi 34B) (Thread, HF)
  * Open source long-context pressure test analysis (Reddit)
  * Robin - a suite of multi-modal (Vision-Language) models (Thread, Blogpost, HF)
* Big CO LLMs + APIs
  * Apple open sources ML-Ferret, a multi-modal model with referring and grounding capabilities (Github, Weights, Paper)
  * OpenAI & Microsoft are getting sued by the New York Times for copyright infringement during training (Full Suit)
* AI Art & Diffusion & 3D
  * Midjourney v6 alpha is really good at recreating scenes from movies (thread)

Open Source LLMs

Open source doesn't stop even during the holiday break! Maybe the holidays are the time to catch up to the big companies? This week we got a new 34B Nous Hermes model, the first DPO fine-tune of Mixtral, and the Capybara dataset, but by far the biggest news of the week was in multimodality.
Apple quietly open sourced ML-Ferret, an any-to-any model able to compete in grounding with even GPT4-V sometimes; Uform released tiny multi-modal and embedding versions for on-device inference; and the AGI collective gave Nous Hermes 2.5 eyes. There's no doubt that '24 is going to be the year of multimodality, and this week we saw an early start of that right on ThursdAI.

ML-Ferret from Apple (Github, Weights, Paper)

Apple has been in the open source news lately - we've covered their MLX release previously, and the "LLM in a flash" paper that discusses inference on low-memory devices - and the Apple folks had one more gift to give. ML-Ferret is a multimodal grounding model, based on Vicuna (for some... reason?), which is able to take referrals from images (think highlighted or annotated areas) and then ground its responses with exact coordinates and boxes. The interesting thing about the referring is that it can be any shape: a bounding box or even an irregular region (like the ferret in the example above or the cat tail below). Ferret was trained on a large new dataset called GRIT, containing over 1 million examples of referring to and describing image regions (which hasn't been open sourced yet, AFAIK). According to Ariel Lee (our panelist), these weights are only delta weights and need to be combined with Vicuna weights to run the full Ferret model properly.

Uform - tiny (1.5B) MLLMs + vision embeddings (HF, Blog, Github, Demo)

The folks at Unum have released a few gifts for us, under an Apache 2.0 license. Specifically, they released 3 vision embedding models and 2 generative models. Per the documentation, the embeddings can yield 2-3x speedups in search over CLIP-like models, and 2-4x inference speed improvements given the tiny size. The embeddings also come in a multilingual version supporting well over 20 languages.
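As a refresher on how embedding models like these are used for search: each image and each query is mapped to a vector, and retrieval ranks candidates by cosine similarity to the query. A minimal, library-free sketch - the vectors and item names below are made up for illustration, not Uform's actual API or output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models emit hundreds of dimensions).
query = [0.2, 0.8, 0.1]
images = {
    "cat_photo": [0.1, 0.9, 0.0],
    "car_photo": [0.9, 0.1, 0.3],
}

# Rank stored image embeddings by similarity to the query embedding.
best = max(images, key=lambda name: cosine_similarity(query, images[name]))
```

The ranking step is identical no matter which embedding model produced the vectors; the speedups Unum cites come from the smaller, faster model generating them, not from a different search procedure.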
The generative models can be used for image captioning, and since they are tiny, they are focused on running on device; they are already converted to ONNX and Core ML formats. See the results below compared to LLaVA.
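On the delta-weights note above: releasing only deltas means each released parameter must be added to the matching Vicuna base parameter to recover the full model. A toy sketch of that merge, with plain numbers standing in for tensors - the key names and values are invented for illustration, not Ferret's actual checkpoint format:

```python
def merge_delta(base, delta):
    """Return a merged state dict: base parameter + delta for every key."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta checkpoints must share the same keys")
    return {name: base[name] + delta[name] for name in base}

# Toy "checkpoints": scalar values stand in for weight tensors.
base = {"layer.weight": 2, "layer.bias": -1}
delta = {"layer.weight": 1, "layer.bias": 3}
merged = merge_delta(base, delta)
```

Real merges iterate over torch tensors in exactly this shape, which is why the released delta files are useless on their own without the Vicuna base weights.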


Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists, and prompt spellcasters on Twitter Spaces, discussing everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news