Whisper Large V3 Turbo - Quranic Arabic

Fine-tuned Whisper Large V3 Turbo model for high-accuracy Arabic Quranic speech recognition.

Model Description

This model is a fine-tuned version of openai/whisper-large-v3-turbo trained on the tarteel-ai/everyayah dataset.

Performance

Test Set Evaluation

The model was evaluated on the full test split of the EveryAyah dataset, comprising 23,473 samples.

| Metric                     | Value |
|----------------------------|-------|
| Word Error Rate (WER)      | 1.18% |
| Character Error Rate (CER) | 0.34% |

Validation Set Evaluation

Performance on the validation split (20,976 samples) during training:

| Metric                | Value  |
|-----------------------|--------|
| Word Error Rate (WER) | 0.86%  |
| Eval Loss             | 0.0040 |
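Both error metrics are edit-distance based: the Levenshtein distance between the reference and the hypothesis, at the word level (WER) or character level (CER), divided by the reference length. The sketch below illustrates the metric definitions only; it is not the evaluation script used for this card (libraries such as `jiwer` or `evaluate` are typically used in practice).

```python
# Illustrative definition of WER/CER via Levenshtein distance.
# Not the card's evaluation code; shown only to make the metrics concrete.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,            # deletion
                d[j - 1] + 1,        # insertion
                prev + (r != h),     # substitution (free if symbols match)
            )
    return d[len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```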

Training Details

Training Data

  • Dataset: tarteel-ai/everyayah
  • Train Samples: 167,908
  • Validation Samples: 20,976 (used during training)
  • Test Samples: 23,473 (used for final evaluation)

Training Hyperparameters

LoRA Configuration:

  • LoRA Rank (r): 128
  • LoRA Alpha: 128
  • LoRA Dropout: 0.05
  • Target Modules: q_proj, v_proj, k_proj, out_proj, fc1, fc2

Training:

  • Max Steps: -1 (no step limit; training runs for the configured number of epochs)
  • Epochs: 2
  • Batch Size: 24
  • Gradient Accumulation: 2
  • Effective Batch Size: 48
  • Learning Rate: 5e-05
  • LR Scheduler: cosine
  • Warmup Steps: 500
  • Weight Decay: 0.01
  • Max Grad Norm: 1.0
  • Optimizer: AdamW Fused
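The hyperparameters above map directly onto `peft` and `transformers` configuration objects. The following is a hypothetical reconstruction for readers who want to reproduce a similar setup; the actual Unsloth-based training script is not published on this card, and the `output_dir` name is a placeholder.

```python
# Sketch of the listed hyperparameters as peft/transformers configs.
# This mirrors the table above; it is not the author's training code.
from peft import LoraConfig
from transformers import Seq2SeqTrainingArguments

lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj", "fc1", "fc2"],
)

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-turbo-quran",    # placeholder path
    num_train_epochs=2,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,       # effective batch size: 24 * 2 = 48
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    weight_decay=0.01,
    max_grad_norm=1.0,
    optim="adamw_torch_fused",
    fp16=True,
)
```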

Usage

Quick Start (Recommended)

For audio of any length, including recordings longer than 30 seconds:

```python
from transformers import pipeline
import torch

# Load the model with automatic chunking for long audio
pipe = pipeline(
    "automatic-speech-recognition",
    model="naazimsnh02/whisper-large-v3-turbo-ar-quran",
    device=0,  # Use GPU (or -1 for CPU)
    torch_dtype=torch.float16,
    chunk_length_s=30,      # Process in 30-second chunks
    stride_length_s=5,      # 5-second overlap between chunks
)

# Transcribe audio of any length
result = pipe("long_quran_recitation.wav")
print(result["text"])
```

For Short Audio (< 30 seconds)

```python
from transformers import pipeline

# Simple usage for short audio
pipe = pipeline(
    "automatic-speech-recognition",
    model="naazimsnh02/whisper-large-v3-turbo-ar-quran",
    device=0
)

result = pipe("short_ayah.wav")
print(result["text"])
```

Data Processing

  • Audio filtered to ≤ 30 seconds
  • Text filtered to ≤ 448 tokens (Whisper's limit)
  • Audio resampled to 16kHz
  • Language: Arabic (Quranic)
  • Task: Transcription
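The filtering rules above reduce to a simple per-sample predicate. The sketch below is illustrative only (not the card's actual preprocessing code): a sample is kept when its audio duration is at most 30 seconds and its transcript fits within Whisper's 448-token decoder limit.

```python
# Illustrative sketch of the data-filtering rules listed above.
MAX_AUDIO_SECONDS = 30.0
MAX_TEXT_TOKENS = 448
TARGET_SAMPLE_RATE = 16_000  # Whisper expects 16 kHz input

def keep_sample(num_audio_samples: int, sample_rate: int, num_tokens: int) -> bool:
    """Return True if an (audio, transcript) pair passes both filters."""
    duration_s = num_audio_samples / sample_rate
    return duration_s <= MAX_AUDIO_SECONDS and num_tokens <= MAX_TEXT_TOKENS
```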

Model Architecture

  • Base Model: OpenAI Whisper Large V3 Turbo
  • Parameters: 809M total, ~2M trainable (LoRA)
  • Training Method: LoRA fine-tuning
  • Precision: FP16/BF16
  • Context Length: 30 seconds per chunk

Limitations

  • Optimized for Quranic Arabic: Best performance on formal Quranic recitation
  • Not suitable for: Conversational Arabic, dialects, or non-Quranic content
  • Audio quality: Best results with clear, high-quality recordings

Training Framework

  • Unsloth for 2x faster training
  • Modal for cloud GPU infrastructure
  • Weights & Biases for experiment tracking

Citation

```bibtex
@misc{whisper-quran-2025,
  author = {Syed Naazim Hussain},
  title = {Whisper Large V3 Turbo - Quranic Arabic},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/naazimsnh02/whisper-large-v3-turbo-ar-quran}}
}
```