fuck quadratic attention
• Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
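The headline change over earlier RWKV versions is that the recurrent state per head is a matrix rather than a vector, and Finch makes the decay input-dependent ("dynamic recurrence"). A minimal NumPy sketch of that recurrence, with the token-shift, bonus, and gating machinery stripped out; the fixed decay vector w and all shapes are illustrative:

```python
import numpy as np

def rwkv_matrix_state(r, k, v, w):
    """Sketch of the matrix-valued-state recurrence behind Eagle/Finch
    (arXiv:2404.05892): the per-head state is a (d, d) matrix decayed by a
    per-channel factor w and written with an outer product k_t v_t^T.
    In Finch, w would be computed from the input at each step; here it is
    a fixed vector. r, k, v: (L, d); w: (d,) with entries in (0, 1)."""
    L, d = r.shape
    S = np.zeros((d, d))
    y = np.empty((L, d))
    for t in range(L):
        S = w[:, None] * S + np.outer(k[t], v[t])  # decay old state, write new pair
        y[t] = r[t] @ S                            # receptance reads the state out
    return y
```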
• Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
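Mamba's pitch is a diagonal state-space recurrence whose parameters are functions of the current token, computed in linear time. A minimal sketch of that selection mechanism, assuming a simplified sigmoid-decay parameterization rather than the paper's ZOH discretization or fused hardware-aware scan:

```python
import numpy as np

def selective_scan(x, Wa, Wb, Wc):
    """Sketch of the selective-SSM idea in Mamba (arXiv:2312.00752): a
    diagonal linear recurrence whose decay and input maps depend on the
    current token, run in O(L) time. Wa, Wb, Wc are illustrative (d, d)
    projections, not the paper's exact parameterization. x: (L, d)."""
    L, d = x.shape
    h = np.zeros(d)
    y = np.empty_like(x)
    for t in range(L):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ Wa)))  # input-dependent decay in (0, 1)
        b = x[t] @ Wb                           # input-dependent write
        h = a * h + (1.0 - a) * b               # "selective" diagonal recurrence
        y[t] = h * (x[t] @ Wc)                  # multiplicative readout gate
    return y
```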
• RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)
• Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
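Infini-attention splits the sequence into segments, runs ordinary softmax attention locally, and carries the distant past in a fixed-size compressive memory updated with a linear-attention rule. A sketch of one segment, with a constant mixing gate beta standing in for the paper's learned per-head gate:

```python
import numpy as np

def infini_attention_segment(q, k, v, M, z, beta=0.5, eps=1e-6):
    """Sketch of one Infini-attention segment (arXiv:2404.07143): retrieve
    from the compressive memory M built from past segments, run causal
    softmax attention inside the segment, update the memory, then mix.
    q, k, v: (S, d); M: (d, d); z: (d,). Start with M, z all zeros."""
    S_len, d = q.shape
    sigma = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))  # elu + 1
    # retrieval from the compressive memory (linear-attention style read)
    A_mem = (sigma(q) @ M) / ((sigma(q) @ z + eps)[:, None])
    # causal local attention within the segment
    scores = (q @ k.T) / np.sqrt(d)
    scores[np.triu(np.ones((S_len, S_len), dtype=bool), 1)] = -np.inf
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    A_loc = w @ v
    # fold this segment into the memory for future segments
    M = M + sigma(k).T @ v
    z = z + sigma(k).sum(axis=0)
    return beta * A_mem + (1.0 - beta) * A_loc, M, z
```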
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
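Megalodon builds on MEGA's damped exponential moving average, moved into the complex domain (CEMA) and combined with chunked attention. A sketch of just the CEMA recurrence, with an illustrative parameterization and a naive real-part readout rather than the paper's exact design:

```python
import numpy as np

def cema(x, alpha, delta):
    """Sketch of the complex exponential moving average in Megalodon
    (arXiv:2404.08801): MEGA's damped EMA,
    h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1},
    run with complex per-channel coefficients at O(L) cost. Stability
    needs |1 - alpha * delta| <= 1. x: (L, d); alpha, delta: (d,) complex."""
    L, d = x.shape
    h = np.zeros(d, dtype=complex)
    y = np.empty((L, d))
    for t in range(L):
        h = alpha * x[t] + (1.0 - alpha * delta) * h
        y[t] = h.real  # simple real readout; illustrative only
    return y
```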
• Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
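Griffin interleaves gated linear recurrence (RG-LRU) blocks with local attention. A naive sketch of the local-attention half, where each position attends only to a window of w predecessors so cost grows as O(L·w) instead of O(L²):

```python
import numpy as np

def sliding_window_attention(q, k, v, w):
    """Sketch of causal sliding-window (local) attention as used in hybrids
    like Griffin (arXiv:2402.19427): position t attends to at most the w
    most recent positions. Naive loop, no batching. q, k, v: (L, d)."""
    L, d = q.shape
    out = np.empty_like(v)
    for t in range(L):
        lo = max(0, t - w + 1)                        # left edge of the window
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())       # stable softmax
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out
```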
• Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (arXiv:2006.16236)
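This is the paper that makes the O(L) claim explicit: replace softmax with a kernel feature map phi (the paper uses elu(x) + 1), and causal attention collapses into a running-state recurrence. A NumPy sketch of that recurrence:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Causal linear attention (arXiv:2006.16236): with feature map phi,
    the causal sum becomes a running state S_t = S_{t-1} + phi(k_t) v_t^T
    plus a normalizer z_t, giving O(L) time and O(1) state per step.
    q, k: (L, d); v: (L, d_v)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))  # elu + 1
    L, d = q.shape
    S = np.zeros((d, v.shape[1]))  # running sum of phi(k) v^T
    z = np.zeros(d)                # running sum of phi(k) for normalization
    out = np.empty_like(v)
    for t in range(L):
        qt, kt = phi(q[t]), phi(k[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z + eps)
    return out
```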
• Scaling Transformer to 1M tokens and beyond with RMT (arXiv:2304.11062)
• CoLT5: Faster Long-Range Transformers with Conditional Computation (arXiv:2303.09752)
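CoLT5 spends compute where it matters: every token takes a cheap branch, and a learned router sends only the top-k tokens through heavy feed-forward and attention branches. A sketch of the feed-forward case, with a plain dot-product router standing in for the paper's routing module:

```python
import numpy as np

def conditional_ffn(x, W_light, W_heavy, r, k):
    """Sketch of CoLT5-style conditional computation (arXiv:2303.09752):
    a light branch for all tokens, a heavy branch only for the top-k
    tokens by router score, scaled by a squashed score. The router here
    is a single scoring vector r, simpler than the paper's.
    x: (L, d); W_light, W_heavy: (d, d); r: (d,)."""
    scores = x @ r
    top = np.argsort(scores)[-k:]                  # tokens routed to the heavy branch
    gate = 1.0 / (1.0 + np.exp(-scores[top]))      # squash scores into (0, 1)
    y = np.maximum(x @ W_light, 0.0)               # light branch for every token
    y[top] += gate[:, None] * np.maximum(x[top] @ W_heavy, 0.0)
    return y
```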
• The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry (arXiv:2402.04347)
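Hedgehog argues that linear attentions underperform because their feature maps lose softmax's spiky, low-entropy weights, and trains a feature map to mimic them. A sketch of that mimicry objective, assuming an elementwise-exp feature map phi(x) = exp(xW) and a cross-entropy match against the true softmax weights; the training loop and the paper's exact head-wise setup are omitted:

```python
import numpy as np

def mimicry_loss(Q, K, W):
    """Sketch of a softmax-mimicry objective in the spirit of Hedgehog
    (arXiv:2402.04347): cross-entropy between the causal softmax attention
    weights (teacher) and normalized linear-attention weights built from a
    trainable feature map phi (student). Q, K: (L, d); W: (d, d)."""
    L, d = Q.shape
    phi = lambda x: np.exp(x @ W)            # spiky, trainable feature map
    sim_lin = phi(Q) @ phi(K).T              # unnormalized student similarities
    sim_soft = np.exp(Q @ K.T / np.sqrt(d))  # unnormalized teacher similarities
    loss = 0.0
    for t in range(L):                       # causal rows only
        p = sim_soft[t, :t + 1] / sim_soft[t, :t + 1].sum()  # teacher weights
        q = sim_lin[t, :t + 1] / sim_lin[t, :t + 1].sum()    # student weights
        loss -= (p * np.log(q + 1e-9)).sum()                 # per-row cross-entropy
    return loss / L
```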
• The Illusion of State in State-Space Models (arXiv:2404.08819)