Add converted tokenizer (no trust_remote_code needed)

#12
by ArthurZ HF Staff - opened

Tokenizer Conversion

This PR adds a converted tokenizer that works without trust_remote_code=True.

Conversion Details

  • Converted using: python scripts/convert_tokenizer.py moonshotai/Kimi-Dev-72B --push-to-hub
  • Original tokenizer type: Qwen2Tokenizer
  • Converted tokenizer type: Qwen2Tokenizer

Validation Results

  • Tested on XNLI dataset (500 samples)
  • All samples match 1-1 ✓
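
A parity check of this kind can be sketched as follows. This is a minimal illustration, not the conversion script's actual validation code: the helper name find_mismatches and the toy encoders are hypothetical, and the stand-in encoders exist only so the sketch runs offline without downloading either tokenizer.

```python
from typing import Callable, Iterable, List, Tuple

# An encoder maps a string to a list of token ids.
Encoder = Callable[[str], List[int]]


def find_mismatches(
    reference: Encoder, candidate: Encoder, texts: Iterable[str]
) -> List[Tuple[int, str]]:
    """Return (index, text) for every sample whose token ids differ."""
    mismatches = []
    for i, text in enumerate(texts):
        if list(reference(text)) != list(candidate(text)):
            mismatches.append((i, text))
    return mismatches


# Hypothetical stand-in encoders (code points as "token ids").
def toy_reference(text: str) -> List[int]:
    return [ord(c) for c in text]


def toy_candidate(text: str) -> List[int]:
    return [ord(c) for c in text]


samples = ["Hello world", "Bonjour le monde", "你好,世界"]
mismatches = find_mismatches(toy_reference, toy_candidate, samples)
print(f"{len(samples) - len(mismatches)}/{len(samples)} samples match")
```

To run the same comparison against real tokenizers, one would presumably pass something like lambda t: original.encode(t) and lambda t: converted.encode(t), where original is loaded with trust_remote_code=True and converted is loaded from this PR's branch, and feed in the XNLI sentences as texts. How the conversion script itself loads the two tokenizers and samples the dataset is not shown here, so treat those details as assumptions.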

Usage

from transformers import AutoTokenizer

# Now works without trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-Dev-72B")

Converted with transformers tokenizer conversion script

Ready to merge
This branch is ready to get merged automatically.
