Sequential Modeling Enables Scalable Learning for Large Vision Models

From Papers Read on AI

Length: 35 minutes

Released: Dec 12, 2023

Format: Podcast episode

Description

We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. To do this, we define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources such as semantic segmentations and depth reconstructions without needing any meta-knowledge beyond the pixels. Once this wide variety of visual data (comprising 420 billion tokens) is represented as sequences, the model can be trained to minimize a cross-entropy loss for next-token prediction. By training across various scales of model architecture and data diversity, we provide empirical evidence that our models scale effectively. Many different vision tasks can be solved by designing suitable visual prompts at test time.

2023: Yutong Bai, Xinyang Geng, K. Mangalam, Amir Bar, Alan Yuille, Trevor Darrell, Jitendra Malik, Alexei A. Efros

https://arxiv.org/pdf/2312.00785v1.pdf
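The training objective named in the description, a cross-entropy loss for next-token prediction over "visual sentences", can be illustrated with a minimal sketch. This is not the authors' implementation: the token IDs, the vocabulary size `VOCAB`, the end-of-sentence marker, and the helper names `visual_sentence`, `next_token_loss`, and `uniform_model` are all hypothetical, and a real system would obtain tokens from an image tokenizer and logits from a trained model.

```python
import math

VOCAB = 8  # hypothetical vocabulary size, chosen only for illustration


def visual_sentence(images_tokens, eos_token):
    """Concatenate per-image token lists into one flat sequence.

    The trailing end-of-sentence marker is an assumption of this sketch.
    """
    seq = []
    for tokens in images_tokens:
        seq.extend(tokens)
    seq.append(eos_token)
    return seq


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(sequence, logits_fn):
    """Mean cross-entropy of predicting each token from its prefix."""
    total = 0.0
    for t in range(len(sequence) - 1):
        logits = logits_fn(sequence[: t + 1])  # model sees tokens 0..t
        total -= log_softmax(logits)[sequence[t + 1]]  # target is token t+1
    return total / (len(sequence) - 1)


def uniform_model(prefix):
    """Stand-in for a trained model: assigns equal logits to every token."""
    return [0.0] * VOCAB


# Two made-up "images" of three tokens each, followed by the marker token 7.
sentence = visual_sentence([[1, 2, 3], [4, 5, 6]], eos_token=7)
loss = next_token_loss(sentence, uniform_model)
print(sentence, loss)
```

With the uniform stand-in model, the loss equals the natural log of the vocabulary size, which is the baseline a trained model would be expected to beat.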

Titles in the series (100)

Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. We select papers by comparative results, citations, and influence to educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
