Polyglot-Lion-0.6B: Compact multilingual ASR for Singapore — English, Mandarin, Tamil & Malay

Paper Project Page GitHub License: MIT

Average error rate comparison across models

CHANGE LOG: This version was retrained on the same dataset without punctuation removal to improve the model’s ability to recognize pauses and sentence boundaries in speech.

About

Polyglot-Lion-0.6B is a compact multilingual automatic speech recognition (ASR) model tailored for the linguistic landscape of Singapore, covering English, Mandarin, Tamil, and Malay. Developed by Quy-Anh Dang and Chris Ngo at Knovel Engineering, the model is presented in the report "Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR".

The model is fine-tuned from Qwen3-ASR-0.6B exclusively on publicly available speech corpora. It achieves an average error rate of 16.52 — halving the base model's error (50.68) and outperforming Whisper-large-v3-turbo (33.04) — and 20× faster inference.

  • Parameters: 0.6B
  • Languages: English, Mandarin, Tamil, Malay
  • Training cost: $81 on a single NVIDIA RTX PRO 6000 (48 h)
  • Inference speed: ~0.10 s/sample on RTX PRO 4500

Methodology

Polyglot-Lion employs a two-stage balanced upsampling strategy to handle severe class imbalance across languages and datasets:

  1. Stage 1 (Intra-language balancing): Within each language, smaller datasets are replicated and subsampled to match the largest dataset in that language.
  2. Stage 2 (Inter-language balancing): Across languages, per-language corpora are balanced so every language contributes equally to the final training set (~969 hours total).

The model deliberately omits language-tag conditioning during training, allowing it to learn to identify languages implicitly from the audio signal, which is critical for deployment-ready multilingual ASR in diverse linguistic environments.

Results

Model Params English (LS) English (NSC) Mandarin (CV) Mandarin (AISH1) Mandarin (AISH3) Mandarin (Fleurs) Tamil (CV) Tamil (SLR65) Tamil (SLR127) Tamil (Fleurs) Malay (Meso.) Malay (Fleurs) Avg
Whisper-large-v3-turbo 0.8B 3.04 32.02 17.91 9.64 16.81 10.63 74.50 58.13 69.56 66.90 28.47 8.88 33.04
SeaLLMs-Audio-7B 7B 94.74 9.53 8.68 9.65 9.76 37.09 126.70 127.24 138.65 105.31 71.34 26.25 63.75
Qwen2.5-Omni-3B 3B 29.21 34.79 46.36 28.25 44.55 54.74 318.36 465.58 448.82 311.67 211.90 74.69 172.37
Qwen2.5-Omni-7B 7B 13.80 22.96 14.49 7.33 22.58 16.68 252.06 239.15 303.96 326.43 158.06 43.92 118.45
Qwen3-ASR-0.6B 0.6B 2.74 7.64 10.06 2.08 2.59 9.75 121.10 127.00 129.12 130.09 47.29 18.71 50.68
Qwen3-ASR-1.7B 1.7B 2.31 6.22 7.50 1.52 2.08 9.33 139.96 134.63 144.49 147.23 39.00 10.87 53.76
MERaLiON-2-10B-ASR 10B 2.54 4.62 8.83 3.09 4.07 11.99 31.78 19.29 22.42 28.68 25.90 8.55 14.32
Polyglot-Lion-0.6B 0.6B 2.67 6.09 6.16 1.93 2.32 9.19 42.16 23.07 28.14 37.68 24.33 14.45 16.52
Polyglot-Lion-1.7B 1.7B 2.10 5.28 4.91 1.45 1.86 8.00 39.19 19.75 26.83 37.28 21.51 9.98 14.85

WER (%) for English, Tamil, and Malay; CER (%) for Mandarin. Lower is better. Bold = best overall.

Quick Start

See mlx-audio for inference.

Citation

@misc{dang2026polyglotlion,
    title={Polyglot-Lion: Efficient Multilingual ASR for Singapore via Balanced Fine-Tuning of Qwen3-ASR}, 
    author={Quy-Anh Dang and Chris Ngo},
    year={2026},
    eprint={2603.16184},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.16184}, 
}
Downloads last month
72
Safetensors
Model size
0.3B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for knoveleng/polyglot-lion-0.6b-v1.5-mlx-4bit

Quantized
(25)
this model

Datasets used to train knoveleng/polyglot-lion-0.6b-v1.5-mlx-4bit

Collection including knoveleng/polyglot-lion-0.6b-v1.5-mlx-4bit

Paper for knoveleng/polyglot-lion-0.6b-v1.5-mlx-4bit