Audar-ASR-D-Turbo (Exp D · ckpt-10000)

LoRA adapter fine-tuned on audarai/Audar3-ASR-1.7B for multi-dialect Arabic ASR. Selected checkpoint: checkpoint-10000 from training run 250224, the best-balanced checkpoint still available (earlier checkpoints were pruned by save_total_limit=10).

Held-out WER (n=1255 reliable, per-dialect stratified)

Metric	Baseline	This ckpt	Δ
Overall	49.5%	38.6%	−10.9
MSA (`<|arb|>`)	10.0%
Moroccan (`<|ary|>`)	74.5%	65.3%	−9.2
Egyptian (`<|arz|>`)	37.0%

Baseline = the frozen audarai/Audar3-ASR-1.7B evaluated on the same held-out set.
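
For readers reproducing these numbers, the sketch below shows WER in its usual sense: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a minimal illustration, not the scoring code used for this card. The card does not state which scoring tool or text normalization (diacritics, punctuation) was applied, so the function name `word_error_rate` and the plain whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

The Δ column in the table is an absolute difference in percentage points, e.g. 38.6 − 49.5 = −10.9 for the overall row.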

Training details

Method: LoRA (PEFT 0.11+) with modules_to_save=[embed_tokens, lm_head] for 8 new dialect tokens
LoRA config: rank 32, alpha 64, dropout 0.05, target_modules all-linear
Optimizer: AdamW, LR 5e-5, cosine schedule, warmup_ratio 0.05
Epochs: 3 over 1.63M samples (natural distribution — no upsampling)
Hardware: 8 nodes × 8 H100 (DeepSpeed ZeRO-2)
Runtime: 3h 26m
Framework: ms-swift 4.1.2
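
To make the optimizer line concrete, here is a small sketch of a linear-warmup plus cosine-decay schedule using the peak LR (5e-5) and warmup_ratio (0.05) listed above. The formula follows the common shape of such schedulers; whether this run used exactly this formula is an assumption. The card does not give the total number of optimizer steps, so `total_steps` is a hypothetical argument supplied by the caller.

```python
import math

PEAK_LR = 5e-5       # from the card
WARMUP_RATIO = 0.05  # from the card


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (linear warmup, cosine decay)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with a hypothetical `total_steps=1000`, warmup ends at step 50, where the rate reaches its 5e-5 peak, and the rate then decays towards zero by step 1000.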

Dialect tokens

Prepend the dialect token as the system prompt. The raw token alone is sufficient; a natural-language instruction (e.g. "Transcribe Moroccan Darija speech. <|ary|>") is being evaluated in a follow-on experiment.

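Token	Variety (ISO 639-3 name)
`<|arb|>`	Modern Standard Arabic (MSA)
`<|ary|>`	Moroccan Arabic (Darija)
`<|arz|>`	Egyptian Arabic
`<|ars|>`	Najdi Arabic
`<|apc|>`	Levantine Arabic
`<|acm|>`	Mesopotamian (Iraqi) Arabic
`<|aeb|>`	Tunisian Arabic
`<|arq|>`	Algerian Arabic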
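
The prompting convention described above can be sketched as a small helper that validates a dialect code and builds the chat messages. The helper name `build_messages` and the `DIALECT_CODES` set are illustrative, not part of the released code; the message layout mirrors the Usage example in this card.

```python
# The eight ISO 639-3 codes behind the dialect tokens.
DIALECT_CODES = {"arb", "ary", "arz", "ars", "apc", "acm", "aeb", "arq"}


def build_messages(dialect_code: str) -> list[dict[str, str]]:
    """Return chat messages with the raw dialect token as system prompt."""
    if dialect_code not in DIALECT_CODES:
        raise ValueError(f"unknown dialect code: {dialect_code!r}")
    return [
        {"role": "system", "content": f"<|{dialect_code}|>"},  # dialect hint
        {"role": "user", "content": "<audio>"},                # audio placeholder
    ]
```

Rejecting unknown codes early avoids silently sending a token the adapter was never trained on.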

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "audarai/Audar3-ASR-1.7B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("audarai/Audar3-ASR-1.7B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "audarai/audar-asr-d-turbo")

messages = [
    {"role": "system", "content": "<|ary|>"},   # dialect hint
    {"role": "user",   "content": "<audio>"},
]
# ... feed audio through your normal ms-swift inference pipeline ...

Or via the ms-swift CLI:

swift infer \
    --model  audarai/Audar3-ASR-1.7B \
    --adapters audarai/audar-asr-d-turbo \
    --val_dataset your_val.jsonl \
    --new_special_tokens tokens/dialect_tokens.txt
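
The command above expects two input files. The sketch below shows one way to generate them, under two assumptions that should be checked against the ms-swift documentation for your version: that the token file holds one special token per line, and that each JSONL row uses a `messages` list plus an `audios` list of file paths. The function names and the sample tuple layout are hypothetical.

```python
import json
from pathlib import Path

DIALECT_TOKENS = [
    "<|arb|>", "<|ary|>", "<|arz|>", "<|ars|>",
    "<|apc|>", "<|acm|>", "<|aeb|>", "<|arq|>",
]


def write_token_file(path) -> None:
    """Write the dialect tokens, one per line (assumed file format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(DIALECT_TOKENS) + "\n", encoding="utf-8")


def write_val_jsonl(path, samples) -> None:
    """Write (audio_path, dialect_token, transcript) tuples as JSONL rows."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_path, token, transcript in samples:
            row = {
                "messages": [
                    {"role": "system", "content": token},
                    {"role": "user", "content": "<audio>"},
                    {"role": "assistant", "content": transcript},
                ],
                "audios": [str(audio_path)],
            }
            # ensure_ascii=False keeps Arabic text readable in the file.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Calling `write_token_file("tokens/dialect_tokens.txt")` and `write_val_jsonl("your_val.jsonl", samples)` would produce files at the paths used in the command.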

Intended use

Benchmarking / comparison against other Arabic dialect ASR systems.
Research on low-resource dialect adaptation, system-prompt conditioning, LoRA over audio-LLMs.
Not a production-ready ASR system: Moroccan WER (65.3%) is still high, so deploy only with fallback logic.

Limitations

Moroccan Darija (ary) WER remains high (65.3%) due to limited training data (41k samples / 2.5% of dataset).
Tiny-dialect tokens (<|aeb|>, <|arq|>) were trained on <30k samples each; expect noisy per-segment behaviour.
System prompt is currently just the raw dialect token. A natural-language prompt (being evaluated in Exp F) may materially improve zero-shot dialect switching and low-resource dialect accuracy.
MSA (arb) regressed 2 pts vs baseline — an artifact of modules_to_save=[embed_tokens, lm_head] at LR 5e-5; a differential LR (5e-6 for embeddings, 5e-5 for adapter) is the canonical fix.
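
The differential learning rate mentioned in the last point could be set up roughly as follows. This is a framework-agnostic sketch, not the training code for this adapter: it splits trainable parameters into two optimizer groups by name, assuming that the fully trained embedding and output layers have `embed_tokens` or `lm_head` in their parameter names. The helper name `build_param_groups` is hypothetical.

```python
def build_param_groups(named_params,
                       adapter_lr: float = 5e-5,
                       embed_lr: float = 5e-6,
                       embed_keys=("embed_tokens", "lm_head")):
    """Split trainable parameters into adapter and embedding groups."""
    adapter_params, embed_params = [], []
    for name, param in named_params:
        # Skip frozen parameters; objects without the flag count as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        if any(key in name for key in embed_keys):
            embed_params.append(param)
        else:
            adapter_params.append(param)
    groups = [
        {"params": adapter_params, "lr": adapter_lr},
        {"params": embed_params, "lr": embed_lr},
    ]
    # Drop empty groups so the result is safe to hand to an optimizer.
    return [g for g in groups if g["params"]]
```

With PyTorch, the result can be passed directly as `torch.optim.AdamW(build_param_groups(model.named_parameters()))`. How to plug custom parameter groups into an ms-swift run is not covered here.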

Citation

If you use this adapter, please cite:

@misc{audar-asr-d-turbo-2026,
    title  = {Audar-ASR-D-Turbo: Multi-dialect Arabic ASR via LoRA on Audar3},
    author = {Solanki, Vikram},
    year   = {2026},
    howpublished = {\url{https://huggingface.co/audarai/audar-asr-d-turbo}},
}
