fuck quadratic attention
• Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
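The headline change over earlier RWKV versions is that the recurrent state per head is a matrix rather than a vector, and Finch makes the decay input-dependent ("dynamic recurrence"). A minimal NumPy sketch of that recurrence, with the token-shift, bonus, and gating machinery stripped out; the fixed decay vector w and all shapes are illustrative:

```python
import numpy as np

def rwkv_matrix_state(r, k, v, w):
    """Sketch of the matrix-valued-state recurrence behind Eagle/Finch
    (arXiv:2404.05892): the per-head state is a (d, d) matrix decayed by a
    per-channel factor w and written with an outer product k_t v_t^T.
    In Finch, w would be computed from the input at each step; here it is
    a fixed vector. r, k, v: (L, d); w: (d,) with entries in (0, 1)."""
    L, d = r.shape
    S = np.zeros((d, d))
    y = np.empty((L, d))
    for t in range(L):
        S = w[:, None] * S + np.outer(k[t], v[t])  # decay old state, write new pair
        y[t] = r[t] @ S                            # receptance reads the state out
    return y
```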
• Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
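Mamba's pitch is a diagonal state-space recurrence whose parameters are functions of the current token, computed in linear time. A minimal sketch of that selection mechanism, assuming a simplified sigmoid-decay parameterization rather than the paper's ZOH discretization or fused hardware-aware scan:

```python
import numpy as np

def selective_scan(x, Wa, Wb, Wc):
    """Sketch of the selective-SSM idea in Mamba (arXiv:2312.00752): a
    diagonal linear recurrence whose decay and input maps depend on the
    current token, run in O(L) time. Wa, Wb, Wc are illustrative (d, d)
    projections, not the paper's exact parameterization. x: (L, d)."""
    L, d = x.shape
    h = np.zeros(d)
    y = np.empty_like(x)
    for t in range(L):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ Wa)))  # input-dependent decay in (0, 1)
        b = x[t] @ Wb                           # input-dependent write
        h = a * h + (1.0 - a) * b               # "selective" diagonal recurrence
        y[t] = h * (x[t] @ Wc)                  # multiplicative readout gate
    return y
```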
• RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)
• Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
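Infini-attention splits the sequence into segments, runs ordinary softmax attention locally, and carries the distant past in a fixed-size compressive memory updated with a linear-attention rule. A sketch of one segment, with a constant mixing gate beta standing in for the paper's learned per-head gate:

```python
import numpy as np

def infini_attention_segment(q, k, v, M, z, beta=0.5, eps=1e-6):
    """Sketch of one Infini-attention segment (arXiv:2404.07143): retrieve
    from the compressive memory M built from past segments, run causal
    softmax attention inside the segment, update the memory, then mix.
    q, k, v: (S, d); M: (d, d); z: (d,). Start with M, z all zeros."""
    S_len, d = q.shape
    sigma = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))  # elu + 1
    # retrieval from the compressive memory (linear-attention style read)
    A_mem = (sigma(q) @ M) / ((sigma(q) @ z + eps)[:, None])
    # causal local attention within the segment
    scores = (q @ k.T) / np.sqrt(d)
    scores[np.triu(np.ones((S_len, S_len), dtype=bool), 1)] = -np.inf
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    A_loc = w @ v
    # fold this segment into the memory for future segments
    M = M + sigma(k).T @ v
    z = z + sigma(k).sum(axis=0)
    return beta * A_mem + (1.0 - beta) * A_loc, M, z
```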
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
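Megalodon builds on MEGA's damped exponential moving average, moved into the complex domain (CEMA) and combined with chunked attention. A sketch of just the CEMA recurrence, with an illustrative parameterization and a naive real-part readout rather than the paper's exact design:

```python
import numpy as np

def cema(x, alpha, delta):
    """Sketch of the complex exponential moving average in Megalodon
    (arXiv:2404.08801): MEGA's damped EMA,
    h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1},
    run with complex per-channel coefficients at O(L) cost. Stability
    needs |1 - alpha * delta| <= 1. x: (L, d); alpha, delta: (d,) complex."""
    L, d = x.shape
    h = np.zeros(d, dtype=complex)
    y = np.empty((L, d))
    for t in range(L):
        h = alpha * x[t] + (1.0 - alpha * delta) * h
        y[t] = h.real  # simple real readout; illustrative only
    return y
```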
• Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
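Griffin interleaves gated linear recurrence (RG-LRU) blocks with local attention. A naive sketch of the local-attention half, where each position attends only to a window of w predecessors so cost grows as O(L·w) instead of O(L²):

```python
import numpy as np

def sliding_window_attention(q, k, v, w):
    """Sketch of causal sliding-window (local) attention as used in hybrids
    like Griffin (arXiv:2402.19427): position t attends to at most the w
    most recent positions. Naive loop, no batching. q, k, v: (L, d)."""
    L, d = q.shape
    out = np.empty_like(v)
    for t in range(L):
        lo = max(0, t - w + 1)                        # left edge of the window
        scores = q[t] @ k[lo:t + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())       # stable softmax
        weights /= weights.sum()
        out[t] = weights @ v[lo:t + 1]
    return out
```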
• Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (arXiv:2006.16236)
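This is the paper that makes the O(L) claim explicit: replace softmax with a kernel feature map phi (the paper uses elu(x) + 1), and causal attention collapses into a running-state recurrence. A NumPy sketch of that recurrence:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Causal linear attention (arXiv:2006.16236): with feature map phi,
    the causal sum becomes a running state S_t = S_{t-1} + phi(k_t) v_t^T
    plus a normalizer z_t, giving O(L) time and O(1) state per step.
    q, k: (L, d); v: (L, d_v)."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))  # elu + 1
    L, d = q.shape
    S = np.zeros((d, v.shape[1]))  # running sum of phi(k) v^T
    z = np.zeros(d)                # running sum of phi(k) for normalization
    out = np.empty_like(v)
    for t in range(L):
        qt, kt = phi(q[t]), phi(k[t])
        S += np.outer(kt, v[t])
        z += kt
        out[t] = (qt @ S) / (qt @ z + eps)
    return out
```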
• Scaling Transformer to 1M tokens and beyond with RMT (arXiv:2304.11062)
• CoLT5: Faster Long-Range Transformers with Conditional Computation (arXiv:2303.09752)
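CoLT5 spends compute where it matters: every token takes a cheap branch, and a learned router sends only the top-k tokens through heavy feed-forward and attention branches. A sketch of the feed-forward case, with a plain dot-product router standing in for the paper's routing module:

```python
import numpy as np

def conditional_ffn(x, W_light, W_heavy, r, k):
    """Sketch of CoLT5-style conditional computation (arXiv:2303.09752):
    a light branch for all tokens, a heavy branch only for the top-k
    tokens by router score, scaled by a squashed score. The router here
    is a single scoring vector r, simpler than the paper's.
    x: (L, d); W_light, W_heavy: (d, d); r: (d,)."""
    scores = x @ r
    top = np.argsort(scores)[-k:]                  # tokens routed to the heavy branch
    gate = 1.0 / (1.0 + np.exp(-scores[top]))      # squash scores into (0, 1)
    y = np.maximum(x @ W_light, 0.0)               # light branch for every token
    y[top] += gate[:, None] * np.maximum(x[top] @ W_heavy, 0.0)
    return y
```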
• The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry (arXiv:2402.04347)
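Hedgehog argues that linear attentions underperform because their feature maps lose softmax's spiky, low-entropy weights, and trains a feature map to mimic them. A sketch of that mimicry objective, assuming an elementwise-exp feature map phi(x) = exp(xW) and a cross-entropy match against the true softmax weights; the training loop and the paper's exact head-wise setup are omitted:

```python
import numpy as np

def mimicry_loss(Q, K, W):
    """Sketch of a softmax-mimicry objective in the spirit of Hedgehog
    (arXiv:2402.04347): cross-entropy between the causal softmax attention
    weights (teacher) and normalized linear-attention weights built from a
    trainable feature map phi (student). Q, K: (L, d); W: (d, d)."""
    L, d = Q.shape
    phi = lambda x: np.exp(x @ W)            # spiky, trainable feature map
    sim_lin = phi(Q) @ phi(K).T              # unnormalized student similarities
    sim_soft = np.exp(Q @ K.T / np.sqrt(d))  # unnormalized teacher similarities
    loss = 0.0
    for t in range(L):                       # causal rows only
        p = sim_soft[t, :t + 1] / sim_soft[t, :t + 1].sum()  # teacher weights
        q = sim_lin[t, :t + 1] / sim_lin[t, :t + 1].sum()    # student weights
        loss -= (p * np.log(q + 1e-9)).sum()                 # per-row cross-entropy
    return loss / L
```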
• The Illusion of State in State-Space Models (arXiv:2404.08819)