# DeepSeek-V2-Lite-MoNE-48-zyda2-100

This repository contains a structurally pruned variant of DeepSeek-V2-Lite, obtained with the MoNE (Mixture-of-Novice Experts) framework proposed in our paper.

## Model Overview

- Base Model: DeepSeek-V2-Lite
- Method: MoNE structured expert pruning
- Remaining Experts: 48
- Calibration Set: zyda2-100
- Architecture: Mixture-of-Experts (MoE)
- Framework: Transformers-compatible

This checkpoint replaces redundant experts with lightweight novice experts via structured pruning, aiming to reduce compute while preserving performance.
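The sketch below is purely illustrative and is not the paper's exact construction: it only shows the general idea of swapping experts judged redundant (e.g. on a calibration set) for much smaller "novice" feed-forward modules. The names `Expert`, `NoviceExpert`, and `replace_redundant_experts`, as well as the reduced hidden size, are assumptions made for this example.

```python
import torch.nn as nn


class Expert(nn.Module):
    """Standard feed-forward expert with the full intermediate size."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)


class NoviceExpert(nn.Module):
    """Lightweight replacement with a much smaller intermediate size (illustrative only)."""
    def __init__(self, d_model: int, d_novice: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_novice), nn.SiLU(), nn.Linear(d_novice, d_model))

    def forward(self, x):
        return self.net(x)


def replace_redundant_experts(experts: nn.ModuleList, redundant_ids, d_model: int, d_novice: int):
    """Swap the experts judged redundant (e.g. on a calibration set) for lightweight novices."""
    for i in redundant_ids:
        experts[i] = NoviceExpert(d_model, d_novice)
    return experts
```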

## Paper

- Title: MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
- Authors: Geng Zhang, Yuxuan Han, Yuxuan Lou, Yiqi Zhang, Wangbo Zhao, Yang You
- arXiv: [2507.00390](https://arxiv.org/abs/2507.00390)
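
## Usage

A minimal loading sketch, assuming this checkpoint exposes the same Transformers interface as the base DeepSeek-V2-Lite model (which ships custom modeling code and therefore needs `trust_remote_code=True`); the prompt text is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MoNE-Pruning/DeepSeek-V2-Lite-MoNE-48-zyda2-100"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    trust_remote_code=True,
    device_map="auto",
)

inputs = tokenizer("Mixture-of-Experts models route tokens to", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```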

## Checkpoint Details

- Format: safetensors
- Model size: 12B params
- Tensor type: BF16