Whisper Large V3 Turbo - Quranic Arabic

Fine-tuned Whisper Large V3 Turbo model for high-accuracy Arabic Quranic speech recognition.

Model Description

This model is a fine-tuned version of openai/whisper-large-v3-turbo trained on the tarteel-ai/everyayah dataset.

Performance

Test Set Evaluation

The model was evaluated on the full test split of the EveryAyah dataset, comprising 23,473 samples.

| Metric                     | Value |
|----------------------------|-------|
| Word Error Rate (WER)      | 1.18% |
| Character Error Rate (CER) | 0.34% |

Validation Set Evaluation

Performance on the validation split (20,976 samples) during training:

| Metric                | Value  |
|-----------------------|--------|
| Word Error Rate (WER) | 0.86%  |
| Eval Loss             | 0.0040 |
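Both error metrics are edit-distance based: the Levenshtein distance between the reference and the hypothesis, at the word level (WER) or character level (CER), divided by the reference length. The sketch below illustrates the metric definitions only; it is not the evaluation script used for this card (libraries such as `jiwer` or `evaluate` are typically used in practice).

```python
# Illustrative definition of WER/CER via Levenshtein distance.
# Not the card's evaluation code; shown only to make the metrics concrete.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,            # deletion
                d[j - 1] + 1,        # insertion
                prev + (r != h),     # substitution (free if symbols match)
            )
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```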

Training Details

Training Data

  • Dataset: tarteel-ai/everyayah
  • Train Samples: 167,908
  • Validation Samples: 20,976 (used during training)
  • Test Samples: 23,473 (used for final evaluation)

Training Hyperparameters

LoRA Configuration:

  • LoRA Rank (r): 128
  • LoRA Alpha: 128
  • LoRA Dropout: 0.05
  • Target Modules: q_proj, v_proj, k_proj, out_proj, fc1, fc2

Training:

  • Max Steps: -1 (no step limit; training runs for the configured number of epochs)
  • Epochs: 2
  • Batch Size: 24
  • Gradient Accumulation: 2
  • Effective Batch Size: 48
  • Learning Rate: 5e-05
  • LR Scheduler: cosine
  • Warmup Steps: 500
  • Weight Decay: 0.01
  • Max Grad Norm: 1.0
  • Optimizer: AdamW Fused
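The hyperparameters above map directly onto `peft` and `transformers` configuration objects. The following is a hypothetical reconstruction for readers who want to reproduce a similar setup; the actual Unsloth-based training script is not published on this card, and the `output_dir` name is a placeholder.

```python
# Sketch of the listed hyperparameters as peft/transformers configs.
# This mirrors the table above; it is not the author's training code.
from peft import LoraConfig
from transformers import Seq2SeqTrainingArguments

lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj", "fc1", "fc2"],
)

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-turbo-quran",    # placeholder path
    num_train_epochs=2,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,       # effective batch size: 24 * 2 = 48
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    weight_decay=0.01,
    max_grad_norm=1.0,
    optim="adamw_torch_fused",
    fp16=True,
)
```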

Usage

Quick Start (Recommended)

For audio of any length, including recordings longer than 30 seconds:

```python
from transformers import pipeline
import torch

# Load the model with automatic chunking for long audio
pipe = pipeline(
    "automatic-speech-recognition",
    model="naazimsnh02/whisper-large-v3-turbo-ar-quran",
    device=0,  # Use GPU (or -1 for CPU)
    torch_dtype=torch.float16,
    chunk_length_s=30,      # Process in 30-second chunks
    stride_length_s=5,      # 5-second overlap between chunks
)

# Transcribe audio of any length
result = pipe("long_quran_recitation.wav")
print(result["text"])
```

For Short Audio (< 30 seconds)

```python
from transformers import pipeline

# Simple usage for short audio
pipe = pipeline(
    "automatic-speech-recognition",
    model="naazimsnh02/whisper-large-v3-turbo-ar-quran",
    device=0
)

result = pipe("short_ayah.wav")
print(result["text"])
```

Data Processing

  • Audio filtered to ≤ 30 seconds
  • Text filtered to ≤ 448 tokens (Whisper's limit)
  • Audio resampled to 16kHz
  • Language: Arabic (Quranic)
  • Task: Transcription
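The filtering rules above reduce to a simple per-sample predicate. The sketch below is illustrative only (not the card's actual preprocessing code): a sample is kept when its audio duration is at most 30 seconds and its transcript fits within Whisper's 448-token decoder limit.

```python
# Illustrative sketch of the data-filtering rules listed above.
MAX_AUDIO_SECONDS = 30.0
MAX_TEXT_TOKENS = 448
TARGET_SAMPLE_RATE = 16_000  # Whisper expects 16 kHz input

def keep_sample(num_audio_samples: int, sample_rate: int, num_tokens: int) -> bool:
    """Return True if an (audio, transcript) pair passes both filters."""
    duration_s = num_audio_samples / sample_rate
    return duration_s <= MAX_AUDIO_SECONDS and num_tokens <= MAX_TEXT_TOKENS
```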

Model Architecture

  • Base Model: OpenAI Whisper Large V3 Turbo
  • Parameters: 809M total, ~2M trainable (LoRA)
  • Training Method: LoRA fine-tuning
  • Precision: FP16/BF16
  • Context Length: 30 seconds per chunk

Limitations

  • Optimized for Quranic Arabic: Best performance on formal Quranic recitation
  • Not suitable for: Conversational Arabic, dialects, or non-Quranic content
  • Audio quality: Best results with clear, high-quality recordings

Training Framework

  • Unsloth for 2x faster training
  • Modal for cloud GPU infrastructure
  • Weights & Biases for experiment tracking

Citation

```bibtex
@misc{whisper-quran-2025,
  author = {Syed Naazim Hussain},
  title = {Whisper Large V3 Turbo - Quranic Arabic},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/naazimsnh02/whisper-large-v3-turbo-ar-quran}}
}
```