# Whisper Large V3 Turbo - Quranic Arabic

Fine-tuned Whisper Large V3 Turbo model for high-accuracy Arabic Quranic speech recognition.

## Model Description

This model is a fine-tuned version of `openai/whisper-large-v3-turbo` trained on the `tarteel-ai/everyayah` dataset.
## Performance

### Test Set Evaluation
The model was evaluated on the full test split of the EveryAyah dataset, comprising 23,473 samples.
| Metric | Value |
|---|---|
| Word Error Rate (WER) | 1.18% |
| Character Error Rate (CER) | 0.34% |
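Word error rate is the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the number of reference words; character error rate is the same computation at the character level. A minimal illustrative implementation (not the evaluation script used here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution out of four reference words -> 25% WER
print(word_error_rate("a b c d", "a x c d"))  # 0.25
```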
### Validation Set Evaluation

Performance on the validation split (20,976 samples) during training:
| Metric | Value |
|---|---|
| Word Error Rate (WER) | 0.86% |
| Eval Loss | 0.0040 |
## Training Details

### Training Data
- Dataset: tarteel-ai/everyayah
- Train Samples: 167,908
- Validation Samples: 20,976 (used during training)
- Test Samples: 23,473 (used for final evaluation)
### Training Hyperparameters

**LoRA Configuration:**
- LoRA Rank (r): 128
- LoRA Alpha: 128
- LoRA Dropout: 0.05
- Target Modules: q_proj, v_proj, k_proj, out_proj, fc1, fc2
**Training:**
- Max Steps: -1 (train for the configured number of epochs rather than a fixed step count)
- Epochs: 2
- Batch Size: 24
- Gradient Accumulation: 2
- Effective Batch Size: 48
- Learning Rate: 5e-05
- LR Scheduler: cosine
- Warmup Steps: 500
- Weight Decay: 0.01
- Max Grad Norm: 1.0
- Optimizer: AdamW Fused
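The settings above map directly onto the `peft` and `transformers` configuration objects. A hedged sketch of how they might be expressed (the actual training script is not published here, so treat this as illustrative, not the exact code used):

```python
from peft import LoraConfig
from transformers import Seq2SeqTrainingArguments

# LoRA adapter settings, mirroring the hyperparameters listed above
lora_config = LoraConfig(
    r=128,                 # LoRA rank
    lora_alpha=128,        # scaling factor (alpha / r = 1.0 here)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj", "fc1", "fc2"],
)

# Trainer settings; effective batch size = 24 * 2 = 48
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-quran-lora",   # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    weight_decay=0.01,
    max_grad_norm=1.0,
    optim="adamw_torch_fused",
)
```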
## Usage

### Quick Start (Recommended)

For audio of any length, including recordings longer than 30 seconds:
```python
from transformers import pipeline
import torch

# Load the model with automatic chunking for long audio
pipe = pipeline(
    "automatic-speech-recognition",
    model="naazimsnh02/whisper-large-v3-turbo-ar-quran",
    device=0,               # Use GPU (or -1 for CPU)
    torch_dtype=torch.float16,
    chunk_length_s=30,      # Process in 30-second chunks
    stride_length_s=5,      # 5-second overlap between chunks
)

# Transcribe audio of any length
result = pipe("long_quran_recitation.wav")
print(result["text"])
```
### For Short Audio (< 30 seconds)
```python
from transformers import pipeline

# Simple usage for short audio
pipe = pipeline(
    "automatic-speech-recognition",
    model="naazimsnh02/whisper-large-v3-turbo-ar-quran",
    device=0,
)

result = pipe("short_ayah.wav")
print(result["text"])
```
### Data Processing

- Audio filtered to ≤ 30 seconds
- Text filtered to ≤ 448 tokens (Whisper's limit)
- Audio resampled to 16 kHz
- Language: Arabic (Quranic)
- Task: Transcription
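The two length filters above can be sketched as a simple predicate applied to each (audio, transcript) pair before training. The field layout here is illustrative, not the dataset's actual schema:

```python
MAX_AUDIO_SECONDS = 30.0   # Whisper's per-chunk context window
MAX_LABEL_TOKENS = 448     # Whisper's maximum target sequence length

def keep_sample(duration_s: float, n_label_tokens: int) -> bool:
    """True if the sample passes both the audio-length and label-length filters."""
    return duration_s <= MAX_AUDIO_SECONDS and n_label_tokens <= MAX_LABEL_TOKENS

# Illustrative samples: (duration in seconds, tokenized transcript length)
samples = [(12.5, 90), (45.0, 80), (10.0, 600)]
kept = [s for s in samples if keep_sample(*s)]
print(len(kept))  # 1 (only the first sample passes both filters)
```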
## Model Architecture
- Base Model: OpenAI Whisper Large V3 Turbo
- Parameters: 809M total, ~2M trainable (LoRA)
- Training Method: LoRA fine-tuning
- Precision: FP16/BF16
- Context Length: 30 seconds per chunk
## Limitations
- Optimized for Quranic Arabic: Best performance on formal Quranic recitation
- Not suitable for: Conversational Arabic, dialects, or non-Quranic content
- Audio quality: Best results with clear, high-quality recordings
## Training Framework
- Unsloth for 2x faster training
- Modal for cloud GPU infrastructure
- Weights & Biases for experiment tracking
## Citation

```bibtex
@misc{whisper-quran-2025,
  author       = {Syed Naazim Hussain},
  title        = {Whisper Large V3 Turbo - Quranic Arabic},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/naazimsnh02/whisper-large-v3-turbo-ar-quran}}
}
```