Audar-ASR-D-Turbo (Exp D · ckpt-10000)
LoRA adapter fine-tuned on audarai/Audar3-ASR-1.7B for multi-dialect Arabic ASR.
Selected checkpoint: checkpoint-10000 from training run 250224. This is the best balanced checkpoint still available; earlier checkpoints were pruned by `save_total_limit=10`.
Held-out WER (n = 1,255 reliable samples, stratified per dialect)
| Metric | Baseline WER | This ckpt WER | Δ (pts) |
|---|---|---|---|
| Overall | 49.5% | 38.6% | −10.9 |
| MSA (`<\|arb\|>`) | 10.0% | ≈12% | ≈ +2 |
| Moroccan (`<\|ary\|>`) | 74.5% | 65.3% | −9.2 |
| Egyptian (`<\|arz\|>`) | 37.0% | | |
Baseline = the frozen audarai/Audar3-ASR-1.7B evaluated on the same held-out set.
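The card reports WER but does not state two things: whether scores are pooled over the whole set or averaged per utterance, and which text normalization is applied before scoring. The sketch below makes three assumptions: whitespace tokenization, no normalization, and corpus-level pooling. Under those assumptions, WER is the word-level Levenshtein distance divided by the number of reference words. The Δ column is then a plain difference in percentage points (38.6 − 49.5 = −10.9). `word_errors` and `corpus_wer` are hypothetical helpers, not the evaluation code used for this card.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words).

    Assumption: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1       # drop ref[i-1]
            insertion = curr[j - 1] + 1  # add hyp[j-1]
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)], len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total word edits divided by total reference words."""
    errors = total = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        total += n
    return errors / total if total else 0.0
```

Pooled WER weights long utterances more heavily than a per-utterance average does, so the two can differ on the same held-out set.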
Training details
- Method: LoRA (PEFT 0.11+) with `modules_to_save=[embed_tokens, lm_head]` for 8 new dialect tokens
- LoRA config: rank 32, alpha 64, dropout 0.05, `target_modules=all-linear`
- Optimizer: AdamW, LR 5e-5, cosine schedule, `warmup_ratio` 0.05
- Epochs: 3 over 1.63M samples (natural distribution, no upsampling)
- Hardware: 8 nodes × 8 H100 (DeepSpeed ZeRO-2)
- Runtime: 3h 26m
- Framework: ms-swift 4.1.2
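The sketch below makes the optimizer settings above concrete by computing the learning rate at a given step. It assumes a Hugging Face-style schedule: linear warmup over `ceil(warmup_ratio × total_steps)` steps, followed by cosine decay to zero. Whether ms-swift 4.1.2 uses exactly this form is an assumption. The card also does not state the total step count, so any `total_steps` value passed in is illustrative. The sketch additionally records the LoRA scaling factor implied by rank 32 and alpha 64, assuming PEFT's default (non-rsLoRA) scaling of alpha / r.

```python
import math

# Values taken from the training details above.
BASE_LR = 5e-5
WARMUP_RATIO = 0.05
LORA_RANK = 32
LORA_ALPHA = 64

# Assumption: PEFT's default scaling (use_rslora=False) multiplies the
# low-rank update B @ A by alpha / r.
LORA_SCALING = LORA_ALPHA / LORA_RANK


def lr_at_step(step: int, total_steps: int,
               base_lr: float = BASE_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup, then cosine decay to 0 (assumed HF-style schedule)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with a hypothetical 1,000 total steps, warmup covers the first 50 steps, the LR peaks at 5e-5 at step 50, and it reaches half of that midway through the decay phase.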
Dialect tokens
Pass the dialect token as the system prompt. The raw token on its own is sufficient. A natural-language instruction (e.g. "Transcribe Moroccan Darija speech. <|ary|>") is being evaluated in a follow-on experiment.
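| Token | Variety |
|---|---|
| `<\|arb\|>` | Modern Standard Arabic (MSA) |
| `<\|ary\|>` | Moroccan Arabic (Darija) |
| `<\|arz\|>` | Egyptian Arabic |
| `<\|ars\|>` | Najdi Arabic |
| `<\|apc\|>` | Levantine Arabic |
| `<\|acm\|>` | Mesopotamian (Iraqi) Arabic |
| `<\|aeb\|>` | Tunisian Arabic |
| `<\|arq\|>` | Algerian Arabic |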
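The system-prompt convention above can be wrapped in a small helper. This is a hypothetical sketch: `DIALECT_CODES`, `system_prompt`, and `build_messages` are illustrative names, not part of the adapter or of ms-swift. The natural-language variant simply places an instruction before the token, mirroring the example prompt given above. It is not necessarily the prompt format being evaluated in the follow-on experiment.

```python
from typing import Optional

# The 8 dialect codes from the token table above.
DIALECT_CODES = ("arb", "ary", "arz", "ars", "apc", "acm", "aeb", "arq")


def system_prompt(code: str, instruction: Optional[str] = None) -> str:
    """Return the raw dialect token, or 'instruction + token' if given."""
    if code not in DIALECT_CODES:
        raise ValueError(f"unknown dialect code: {code!r}")
    token = f"<|{code}|>"
    # The raw token alone is the format this adapter was trained with.
    return f"{instruction} {token}" if instruction else token


def build_messages(code: str, instruction: Optional[str] = None) -> list:
    """Chat messages: dialect hint as system prompt plus an audio placeholder."""
    return [
        {"role": "system", "content": system_prompt(code, instruction)},
        {"role": "user", "content": "<audio>"},
    ]
```

`build_messages("ary")` produces the same `messages` list that the Python usage example in this card builds by hand.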
Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "audarai/Audar3-ASR-1.7B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("audarai/Audar3-ASR-1.7B", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "audarai/audar-asr-d-turbo")

messages = [
    {"role": "system", "content": "<|ary|>"},  # dialect hint
    {"role": "user", "content": "<audio>"},
]
# ... feed audio through your normal ms-swift inference pipeline ...
```
Or via the ms-swift CLI:
```shell
swift infer \
    --model audarai/Audar3-ASR-1.7B \
    --adapters audarai/audar-asr-d-turbo \
    --val_dataset your_val.jsonl \
    --new_special_tokens tokens/dialect_tokens.txt
```
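The CLI call needs a validation JSONL file and a dialect-token file, neither of which is shown. The sketch below builds both under two assumptions; verify them against the ms-swift 4.1.2 documentation.

- The JSONL follows ms-swift's usual multimodal dataset layout: a `messages` list containing an `<audio>` placeholder, plus an `audios` list of file paths.
- The token file holds one special token per line.

`dialect_tokens_text`, `val_record`, and `write_val_jsonl` are hypothetical helpers. The assistant turn holds the reference transcript so that outputs can be compared against it afterwards.

```python
import json

# Tokens from the dialect table, one per line (assumed file format).
DIALECT_TOKENS = [f"<|{c}|>" for c in
                  ("arb", "ary", "arz", "ars", "apc", "acm", "aeb", "arq")]


def dialect_tokens_text() -> str:
    """Contents for a file such as tokens/dialect_tokens.txt."""
    return "\n".join(DIALECT_TOKENS) + "\n"


def val_record(audio_path: str, dialect: str, reference: str) -> dict:
    """One JSONL row: dialect token as system prompt, audio placeholder, reference."""
    return {
        "messages": [
            {"role": "system", "content": f"<|{dialect}|>"},
            {"role": "user", "content": "<audio>"},
            {"role": "assistant", "content": reference},
        ],
        "audios": [audio_path],
    }


def write_val_jsonl(path: str, rows) -> None:
    """Write (audio_path, dialect, reference) tuples as UTF-8 JSONL."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_path, dialect, reference in rows:
            # ensure_ascii=False keeps Arabic text readable in the file.
            record = val_record(audio_path, dialect, reference)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```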
Intended use
- Benchmarking / comparison against other Arabic dialect ASR systems.
- Research on low-resource dialect adaptation, system-prompt conditioning, LoRA over audio-LLMs.
- Not a production-ready ASR system: Moroccan WER (65.3%) is still high, so deploy only with fallback logic.
Limitations
- Moroccan Darija (ary) WER remains high (65.3%) due to limited training data (41k samples, 2.5% of the dataset).
- The tokens for the smallest dialects (`<|aeb|>`, `<|arq|>`) were trained on fewer than 30k samples each; expect noisy per-segment behaviour.
- The system prompt is currently just the raw dialect token. A natural-language prompt (being evaluated in Exp F) may materially improve zero-shot dialect switching and low-resource dialect accuracy.
- MSA (arb) regressed 2 pts vs the baseline. This is an artifact of training `modules_to_save=[embed_tokens, lm_head]` at LR 5e-5. A differential LR (5e-6 for the embeddings, 5e-5 for the adapter) is the canonical fix.
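The differential-LR fix can be expressed as two optimizer parameter groups. This is a minimal sketch for a plain PyTorch training loop; the card does not say how the fix would be configured in ms-swift, so this setup is an assumption. `build_param_groups` is a hypothetical helper that sorts trainable parameters by name:

- Names containing `embed_tokens` or `lm_head` (the `modules_to_save` copies) get 5e-6.
- Everything else (the LoRA matrices) gets 5e-5.

It relies only on parameter names and a `requires_grad` attribute, so the logic can be checked without PyTorch installed.

```python
from typing import Any, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    embed_lr: float = 5e-6,
    adapter_lr: float = 5e-5,
    embed_keys: Tuple[str, ...] = ("embed_tokens", "lm_head"),
) -> List[dict]:
    """Split trainable parameters into an embedding/head group and an adapter group."""
    embed, adapter = [], []
    for name, param in named_params:
        # Skip frozen weights (e.g. the base model and PEFT's original_module copies).
        if not getattr(param, "requires_grad", True):
            continue
        if any(key in name for key in embed_keys):
            embed.append(param)
        else:
            adapter.append(param)
    groups = [
        {"params": embed, "lr": embed_lr},
        {"params": adapter, "lr": adapter_lr},
    ]
    # Drop empty groups so the optimizer never receives an empty parameter list.
    return [g for g in groups if g["params"]]
```

In PyTorch you would pass the result as `torch.optim.AdamW(build_param_groups(model.named_parameters()), lr=5e-5)`. Each group's own `lr` overrides the optimizer's default.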
Citation
If you use this adapter, please cite:
```bibtex
@misc{audar-asr-d-turbo-2026,
  title        = {Audar-ASR-D-Turbo: Multi-dialect Arabic ASR via LoRA on Audar3},
  author       = {Solanki, Vikram},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/audarai/audar-asr-d-turbo}},
}
```