Audar-ASR-D-Turbo (Exp D · ckpt-10000)

LoRA adapter fine-tuned on audarai/Audar3-ASR-1.7B for multi-dialect Arabic ASR. Selected checkpoint: checkpoint-10000 from training run 250224, the best-balanced checkpoint still available (earlier checkpoints were pruned by save_total_limit=10).

Held-out WER (n=1255 reliable, per-dialect stratified)

Metric                 Baseline   This ckpt   Δ (pts)
Overall                49.5%      38.6%       −10.9
MSA (`<|arb|>`)        10.0%
Moroccan (`<|ary|>`)   74.5%
Egyptian (`<|arz|>`)   37.0%

Baseline = the frozen audarai/Audar3-ASR-1.7B evaluated on the same held-out set.
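
For readers who want to reproduce the comparison, here is a minimal sketch of corpus-level WER and of the relative reduction implied by the Overall row. The text normalization and the exact scoring tool behind the reported numbers are not stated in this card, so the whitespace tokenization below is an assumption, and the function names are illustrative.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, reference length).

    Assumption: plain whitespace tokenization with no normalization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs):
    """Total errors over total reference words for (reference, hypothesis) pairs."""
    errors = words = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    return errors / words


def relative_reduction(baseline, new):
    """Fraction of the baseline error rate that was removed."""
    return (baseline - new) / baseline


# Overall row: 49.5% -> 38.6% is 10.9 points absolute, about 22% relative.
print(round(relative_reduction(49.5, 38.6), 3))
```

Reporting both the absolute delta and the relative reduction makes it easier to compare this adapter with systems evaluated on baselines of different difficulty.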

Training details

  • Method: LoRA (PEFT 0.11+) with modules_to_save=[embed_tokens, lm_head] for 8 new dialect tokens
  • LoRA config: rank 32, alpha 64, dropout 0.05, target_modules all-linear
  • Optimizer: AdamW, LR 5e-5, cosine schedule, warmup_ratio 0.05
  • Epochs: 3 over 1.63M samples (natural distribution — no upsampling)
  • Hardware: 8 nodes × 8 H100 (DeepSpeed ZeRO-2)
  • Runtime: 3h 26m
  • Framework: ms-swift 4.1.2
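
The learning-rate settings above (peak 5e-5, cosine schedule, warmup_ratio 0.05) can be illustrated with a small sketch. It uses the common linear-warmup plus cosine-decay formula; the scheduler ms-swift actually applied may differ in detail, and `total_steps` is a hypothetical value because this card states neither the batch size nor the total step count.

```python
import math


def lr_at(step, total_steps, peak_lr=5e-5, warmup_ratio=0.05):
    """Learning rate at `step` under linear warmup followed by cosine decay.

    Assumption: this mirrors the usual warmup+cosine shape, not necessarily
    the exact scheduler used for this training run.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical, for illustration only
for s in (0, 25, 50, 500, 1000):
    print(s, lr_at(s, total_steps))
```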

Dialect tokens

Prepend the dialect token as the system prompt. The raw token alone is sufficient; a natural-language instruction (e.g. "Transcribe Moroccan Darija speech. <|ary|>") is being evaluated in a follow-on experiment.

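Token       Variety (ISO 639-3 name)
`<|arb|>`   Standard Arabic (MSA)
`<|ary|>`   Moroccan Arabic
`<|arz|>`   Egyptian Arabic
`<|ars|>`   Najdi Arabic
`<|apc|>`   Levantine Arabic
`<|acm|>`   Mesopotamian Arabic
`<|aeb|>`   Tunisian Arabic
`<|arq|>`   Algerian Arabic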
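
A small helper can keep the token spelling consistent when building prompts. This is a sketch, not part of the released code: the function names are illustrative, and the variety labels are the ISO 639-3 reference names for each code, which may not match the wording used during training.

```python
# ISO 639-3 code -> reference name (labels are an assumption; only MSA,
# Moroccan and Egyptian are named explicitly elsewhere in this card).
DIALECT_VARIETIES = {
    "arb": "Standard Arabic (MSA)",
    "ary": "Moroccan Arabic",
    "arz": "Egyptian Arabic",
    "ars": "Najdi Arabic",
    "apc": "Levantine Arabic",
    "acm": "Mesopotamian Arabic",
    "aeb": "Tunisian Arabic",
    "arq": "Algerian Arabic",
}


def dialect_token(code):
    """Return the special token for a supported dialect code, e.g. '<|ary|>'."""
    if code not in DIALECT_VARIETIES:
        raise ValueError(f"unsupported dialect code: {code!r}")
    return f"<|{code}|>"


def build_messages(code):
    """Chat messages with the raw dialect token as the system prompt."""
    return [
        {"role": "system", "content": dialect_token(code)},
        {"role": "user", "content": "<audio>"},
    ]


print(build_messages("ary"))
```

Rejecting unknown codes early avoids silently sending a string that the tokenizer would split into ordinary sub-word pieces instead of a single dialect token.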

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "audarai/Audar3-ASR-1.7B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("audarai/Audar3-ASR-1.7B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "audarai/audar-asr-d-turbo")

messages = [
    {"role": "system", "content": "<|ary|>"},   # dialect hint
    {"role": "user",   "content": "<audio>"},
]
# ... feed audio through your normal ms-swift inference pipeline ...
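
One detail the snippet above leaves open is how the 8 dialect tokens reach the tokenizer, since the base tokenizer may not contain them. The sketch below shows one way to register them with the standard Transformers calls `add_tokens`, `get_input_embeddings` and `resize_token_embeddings`. Whether this step is needed for this adapter, and the token order (which determines the token IDs and has to match training), are assumptions; the helper name is illustrative.

```python
# Assumption: same order as the dialect token table; the order used in
# training (tokens/dialect_tokens.txt) is not shown in this card.
DIALECT_TOKENS = [f"<|{c}|>" for c in
                  ("arb", "ary", "arz", "ars", "apc", "acm", "aeb", "arq")]


def register_dialect_tokens(tokenizer, model, tokens=DIALECT_TOKENS):
    """Add the dialect tokens as special tokens; grow the embeddings if needed.

    Returns the number of tokens that were actually new to the tokenizer.
    """
    added = tokenizer.add_tokens(list(tokens), special_tokens=True)
    rows = model.get_input_embeddings().num_embeddings
    # Only grow: some base models ship a padded embedding matrix that is
    # already larger than the tokenizer, and shrinking it would break loading.
    if len(tokenizer) > rows:
        model.resize_token_embeddings(len(tokenizer))
    return added
```

If used, it would be called as `register_dialect_tokens(tok, base)` after loading `tok` and `base` and before `PeftModel.from_pretrained`, so that the embedding shapes saved via modules_to_save line up with the base model.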

Or via the ms-swift CLI:

swift infer \
    --model  audarai/Audar3-ASR-1.7B \
    --adapters audarai/audar-asr-d-turbo \
    --val_dataset your_val.jsonl \
    --new_special_tokens tokens/dialect_tokens.txt
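
The command above expects a JSONL validation file. The sketch below writes one in what is, to my understanding, the ms-swift messages/audios layout; treat the field names as an assumption and check them against the ms-swift version you run. The audio path and the helper names are hypothetical.

```python
import json


def make_record(audio_path, dialect_code, reference=None):
    """One validation example: dialect token as system prompt, audio as user turn.

    Assumption: ms-swift reads a "messages" list plus an "audios" list of paths.
    """
    messages = [
        {"role": "system", "content": f"<|{dialect_code}|>"},
        {"role": "user", "content": "<audio>"},
    ]
    if reference is not None:
        # Reference transcript, useful when scoring WER afterwards.
        messages.append({"role": "assistant", "content": reference})
    return {"messages": messages, "audios": [audio_path]}


def write_jsonl(records, path):
    """Write one JSON object per line, keeping Arabic text unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


# Hypothetical clip path and transcript, for illustration only.
example = make_record("clips/0001.wav", "ary", "السلام عليكم")
print(json.dumps(example, ensure_ascii=False))
```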

Intended use

  • Benchmarking / comparison against other Arabic dialect ASR systems.
  • Research on low-resource dialect adaptation, system-prompt conditioning, LoRA over audio-LLMs.
  • Not a production-ready ASR system: Moroccan WER (65.3%) is still high; deploy only with fallback logic.

Limitations

  • Moroccan Darija (ary) WER remains high (65.3%) due to limited training data (41k samples / 2.5% of dataset).
  • Tiny-dialect tokens (<|aeb|>, <|arq|>) were trained on <30k samples each; expect noisy per-segment behaviour.
  • System prompt is currently just the raw dialect token. A natural-language prompt (being evaluated in Exp F) may materially improve zero-shot dialect switching and low-resource dialect accuracy.
  • MSA (arb) WER is 2 points worse than baseline, an artifact of training modules_to_save=[embed_tokens, lm_head] at LR 5e-5; the usual fix is a differential LR (5e-6 for embeddings, 5e-5 for the adapter).
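
The differential learning rate mentioned in the last limitation can be sketched as optimizer parameter groups. This is an illustration of the idea, not the configuration of any actual run: the helper name is hypothetical, and putting lm_head in the low-LR group alongside embed_tokens is an assumption, since the text only mentions embeddings.

```python
# Name fragments that mark the fully trained modules (modules_to_save).
LOW_LR_KEYS = ("embed_tokens", "lm_head")


def split_param_groups(named_params, adapter_lr=5e-5, embed_lr=5e-6):
    """Split (name, param) pairs into two optimizer groups by parameter name.

    Trainable parameters whose name contains a LOW_LR_KEYS fragment get
    `embed_lr`; all other trainable parameters (the LoRA weights) get
    `adapter_lr`. Empty groups are dropped.
    """
    embed, adapter = [], []
    for name, param in named_params:
        if not getattr(param, "requires_grad", True):
            continue  # skip frozen base weights
        if any(key in name for key in LOW_LR_KEYS):
            embed.append(param)
        else:
            adapter.append(param)
    groups = [
        {"params": adapter, "lr": adapter_lr},
        {"params": embed, "lr": embed_lr},
    ]
    return [g for g in groups if g["params"]]
```

With PyTorch this would be used roughly as `torch.optim.AdamW(split_param_groups(model.named_parameters()))`, where each group keeps its own learning rate.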

Citation

If you use this adapter, please cite:

@misc{audar-asr-d-turbo-2026,
    title  = {Audar-ASR-D-Turbo: Multi-dialect Arabic ASR via LoRA on Audar3},
    author = {Solanki, Vikram},
    year   = {2026},
    howpublished = {\url{https://huggingface.co/audarai/audar-asr-d-turbo}},
}