HindiSTT
A fine-tuned Whisper model for Hindi speech-to-text transcription, outputting Hinglish (Hindi written in Roman script).
Model Description
This model transcribes Hindi audio into romanized text (Hinglish), making it easier to read and process Hindi speech without requiring Devanagari script support.
Example Output:
- Audio: [Hindi speech saying "नमस्ते, आप कैसे हैं?" ("Hello, how are you?")]
- Output:
namaste, aap kaise hain?
Key Features
- Hinglish Output: Transcribes spoken Hindi directly into romanized (Hinglish) text
- Whisper Architecture: Based on Whisper Large V3, compatible with transformers
- Noise Resistant: Handles noisy audio environments well
- Low Hallucination: Minimizes transcription hallucinations
Usage
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Run on GPU in half precision when available, otherwise CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "Svetozar1993/HindiSTT"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    # "en" matches the model's Roman-script (Hinglish) output.
    generate_kwargs={"task": "transcribe", "language": "en"},
)

result = pipe("audio.wav")
print(result["text"])
```
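Besides a file path, the pipeline also accepts a raw NumPy array, provided it is mono `float32` at Whisper's expected 16 kHz sampling rate. A minimal preprocessing sketch (the function name and the naive linear-interpolation resampler are illustrative, not part of this card; a proper resampler such as the one in `librosa` or `torchaudio` is preferable for quality):

```python
import numpy as np

def to_whisper_input(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    """Convert audio to the mono float32 16 kHz array Whisper expects."""
    audio = samples.astype(np.float32)
    # Scale int16 PCM into [-1, 1].
    if samples.dtype == np.int16:
        audio = audio / 32768.0
    # Average channels to mono (shape: [num_samples, channels]).
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    # Naive linear-interpolation resample to 16 kHz (illustrative only).
    target_rate = 16_000
    if sample_rate != target_rate:
        duration = audio.shape[0] / sample_rate
        n_out = int(duration * target_rate)
        old_t = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
        new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(new_t, old_t, audio).astype(np.float32)
    return audio
```

The resulting array can then be passed to the pipeline in place of the file path, e.g. `pipe(to_whisper_input(samples, 44100))`.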
Flash Attention 2
For faster inference with Flash Attention:
```shell
pip install flash-attn --no-build-isolation
```
```python
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",
)
```
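Loading with `attn_implementation="flash_attention_2"` raises an error if the `flash-attn` package is not installed. One way to degrade gracefully (a sketch, not part of the original card) is to detect the package and fall back to PyTorch's built-in scaled-dot-product attention:

```python
import importlib.util

# Use Flash Attention 2 when the flash-attn package is importable,
# otherwise fall back to PyTorch's SDPA implementation.
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"

# Then pass attn_implementation=attn_impl to from_pretrained(...).
```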
Model Details
- Base Model: Whisper Large V3
- Language: Hindi (Romanized/Hinglish output)
- Parameters: 1.5B
- License: Apache 2.0
Author
Svetozar1993