Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Paper: 2405.03594 (published)
Sparse pre-trained and fine-tuned Llama models created by Neural Magic and Cerebras.
- Sparse Llama 2 7B base model, pruned to 50% sparsity and retrained on 50B tokens.
- Sparse Llama 2 7B base model, pruned to 70% sparsity and retrained on 150B tokens.