
Qwen3-ASR-1.7B Albanian

This is a fine-tuned variant of Qwen/Qwen3-ASR-1.7B for Albanian (sq / Shqip) automatic speech recognition.

It was trained on ~550 hours of carefully curated Albanian audio with matching transcripts.

Intended use

  • ASR / transcription for Albanian speech (general-purpose).
  • Works best on clean speech audio (mono, 16 kHz) with transcripts matching the dataset conventions.
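Since the model expects mono 16 kHz input, audio at other sample rates or with multiple channels should be converted first. Below is a minimal NumPy-only sketch of that preprocessing (the function name is illustrative; for production use, prefer a proper resampler such as torchaudio or ffmpeg, since linear interpolation is a rough approximation):

```python
import numpy as np

def to_mono_16k(samples: np.ndarray, sample_rate: int, target_rate: int = 16_000) -> np.ndarray:
    """Downmix to mono and resample via linear interpolation (illustrative only)."""
    # Average channels if the input is multi-channel (shape: [num_samples, channels]).
    if samples.ndim == 2:
        samples = samples.mean(axis=1)
    if sample_rate == target_rate:
        return samples.astype(np.float32)
    duration = samples.shape[0] / sample_rate
    n_out = int(round(duration * target_rate))
    old_t = np.linspace(0.0, duration, num=samples.shape[0], endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, samples).astype(np.float32)
```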

Limitations

  • Performance depends heavily on domain/accent/noise conditions; evaluate on your target audio before deployment.
  • Streaming / timestamps support is provided by the upstream toolkit; quality of timestamps depends on the forced-aligner setup and audio conditions.
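Evaluating on your target audio usually means computing word error rate (WER) between reference transcripts and model output. A minimal pure-Python WER helper (hypothetical, not part of the qwen-asr toolkit; libraries like jiwer offer a more complete implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float(len(hyp))
    # Levenshtein distance over words, two-row dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```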

Base model (Qwen3-ASR)

From the upstream model card, Qwen3-ASR provides:

  • Language identification + ASR in one model (supports many languages/dialects).
  • Offline + streaming inference modes via the qwen-asr toolkit.
  • Optional timestamps via the Qwen forced-aligner.

See the upstream Qwen3-ASR model card and documentation for full details.

How to use (Transformers backend via qwen-asr)

Install:

pip install -U qwen-asr

Load this model from a local folder (this directory) and transcribe:

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "./",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    max_inference_batch_size=32,
    max_new_tokens=256,
)

results = model.transcribe(
    audio="path/to/audio.wav",
    language="Albanian",  # or None for auto language ID
)

print(results[0].language)
print(results[0].text)

Training data

  • Language: Albanian (sq)
  • Duration: ~550 hours
  • Type: supervised ASR fine-tuning (audio → transcript)

Training recipe (high level)

Extracted from the Transformers training artifacts saved in this folder:

  • Epochs: 2
  • Learning rate: 2e-5
  • Batch size (per device): 8
  • Grad accumulation: 1
  • Precision: BF16
  • Optimizer: adamw_torch_fused
  • Seed: 42
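For reference, the settings above map onto transformers.TrainingArguments fields as follows (an illustrative mapping only; the original training script is not included in this repository):

```python
# Hyperparameters from the recipe above, keyed by the corresponding
# transformers.TrainingArguments field names.
training_config = {
    "num_train_epochs": 2,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 1,  # effective batch = 8 per device
    "bf16": True,
    "optim": "adamw_torch_fused",
    "seed": 42,
}
```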

Notes

  • Qwen3-ASR fine-tuning data typically uses transcripts with a language prefix like language Albanian<asr_text>... (see upstream docs); this fine-tune follows that convention.
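The prefix convention above can be sketched as a small formatting helper (a hypothetical helper for illustration; the tag layout follows the format quoted in the note above):

```python
def format_transcript(text: str, language: str = "Albanian") -> str:
    """Build a training target string with the Qwen3-ASR language prefix."""
    return f"language {language}<asr_text>{text}"
```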