Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

From Papers Read on AI

Length: 28 minutes

Released: Apr 25, 2024

Format: Podcast episode

Description

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions such as linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited context length. Megalodon inherits the architecture of Mega (exponential moving average with gated attention) and introduces several technical components to improve its capability and stability: complex exponential moving average (CEMA), a timestep normalization layer, a normalized attention mechanism, and pre-norm with a two-hop residual configuration. In a controlled head-to-head comparison with Llama2, Megalodon achieves better efficiency than the Transformer at the scale of 7 billion parameters and 2 trillion training tokens. Megalodon reaches a training loss of 1.70, landing midway between Llama2-7B (1.75) and Llama2-13B (1.67). Code: https://github.com/XuezheMax/megalodon
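For listeners unfamiliar with the exponential moving average that Mega and Megalodon build on, the idea can be sketched in a few lines of Python. This is a scalar, real-valued illustration only, assuming the damped form y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}; the function name `damped_ema` and its parameters are hypothetical and are not taken from the Megalodon codebase, and the complex-valued CEMA variant named in the abstract is not shown.

```python
def damped_ema(xs, alpha, delta=1.0):
    """Scalar damped exponential moving average (illustrative sketch).

    Recurrence: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1},
    starting from y = 0. With delta = 1.0 this is the classical EMA.
    Assumption: alpha and delta both lie in (0, 1].
    """
    if not (0.0 < alpha <= 1.0 and 0.0 < delta <= 1.0):
        raise ValueError("alpha and delta must lie in (0, 1]")
    y = 0.0
    out = []
    for x in xs:
        # Only the previous state y is needed, so memory per step is constant
        # regardless of how long the input sequence is.
        y = alpha * x + (1.0 - alpha * delta) * y
        out.append(y)
    return out


if __name__ == "__main__":
    print(damped_ema([1.0, 1.0, 1.0], alpha=0.5))  # [0.5, 0.75, 0.875]
```

Because each step depends only on the previous state, a recurrence of this kind can process a stream of any length with fixed memory, which is one intuition behind the "unlimited context length" framing in the description.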

2024: Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

https://arxiv.org/pdf/2404.08801v2.pdf

Titles in the series (100)

Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. We select papers by comparative results, citations, and influence to educate you on the latest research. Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
