# DeepSeek-V2-Lite-MoNE-48-zyda2-100

This repository contains a structurally pruned variant of DeepSeek-V2-Lite, obtained with the MoNE (Mixture-of-Novice Experts) framework proposed in our paper.

## Model Overview

- Base Model: DeepSeek-V2-Lite
- Method: MoNE structured expert pruning
- Remaining Experts: 48
- Calibration Set: zyda2-100
- Architecture: Mixture-of-Experts (MoE)
- Framework: Transformers-compatible

This checkpoint replaces redundant experts with lightweight novice experts via structured pruning, aiming to reduce compute while preserving performance.
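The sketch below is purely illustrative and is not the paper's exact construction: it only shows the general idea of swapping experts judged redundant (e.g. on a calibration set) for much smaller "novice" feed-forward modules. The names `Expert`, `NoviceExpert`, and `replace_redundant_experts`, as well as the reduced hidden size, are assumptions made for this example.

```python
import torch.nn as nn


class Expert(nn.Module):
    """Standard feed-forward expert with the full intermediate size."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)


class NoviceExpert(nn.Module):
    """Lightweight replacement with a much smaller intermediate size (illustrative only)."""
    def __init__(self, d_model: int, d_novice: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_novice), nn.SiLU(), nn.Linear(d_novice, d_model))

    def forward(self, x):
        return self.net(x)


def replace_redundant_experts(experts: nn.ModuleList, redundant_ids, d_model: int, d_novice: int):
    """Swap the experts judged redundant (e.g. on a calibration set) for lightweight novices."""
    for i in redundant_ids:
        experts[i] = NoviceExpert(d_model, d_novice)
    return experts
```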

## Paper

- Title: MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
- Authors: Geng Zhang, Yuxuan Han, Yuxuan Lou, Yiqi Zhang, Wangbo Zhao, Yang You
- arXiv: [2507.00390](https://arxiv.org/abs/2507.00390)
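
## Usage

A minimal loading sketch, assuming this checkpoint exposes the same Transformers interface as the base DeepSeek-V2-Lite model (which ships custom modeling code and therefore needs `trust_remote_code=True`); the prompt text is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MoNE-Pruning/DeepSeek-V2-Lite-MoNE-48-zyda2-100"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    trust_remote_code=True,
    device_map="auto",
)

inputs = tokenizer("Mixture-of-Experts models route tokens to", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```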

## Checkpoint Details

- Format: safetensors
- Model size: 12B params
- Tensor type: BF16