NVIDIA Conformer-Transducer Large (LoS)
Summary
The "stt_los_conformer_transducer_large_punctuated" model is an acoustic model fine-tuned from "nvidia/stt_es_conformer_transducer_large" for multilingual Automatic Speech Recognition in the languages of Spain (LoS): Catalan, Spanish, Galician, and Euskera.
Model Description
This model transcribes speech in Catalan, Spanish, Galician, and Euskera into text, including punctuation. It was fine-tuned on a 2,700-hour multilingual LoS dataset. It is the "large" variant of the Conformer-Transducer architecture, with around 120 million parameters. See the NeMo documentation for complete architecture details.
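As a rough illustration of what "around 120 million parameters" means in practice, the arithmetic below estimates the memory taken by the weights alone. It assumes 4 bytes per parameter in fp32 and 2 bytes in fp16/bf16, and it ignores activations, decoding state, and framework overhead, so real usage will be higher. `weight_memory_mib` is a hypothetical helper, not part of NeMo.

```python
# Hypothetical helper: rough, weight-only memory estimate.
# Assumption: activations, decoder state and framework overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_mib(num_params: int, dtype: str = "fp32") -> float:
    """Approximate size of the model weights in MiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


if __name__ == "__main__":
    approx_params = 120_000_000  # "around 120 million parameters" (approximate)
    for dtype in ("fp32", "fp16"):
        print(f"{dtype}: ~{weight_memory_mib(approx_params, dtype):.0f} MiB")
```

Under these assumptions the weights come to roughly 458 MiB in fp32 and 229 MiB in fp16.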
Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in Catalan, Spanish, Galician, and Euskera. It is intended to transcribe audio files in those languages to text with punctuation.
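NVIDIA's card for the base model lists 16 kHz, mono-channel WAV audio as its expected input. Assuming this fine-tuned model keeps the same input format, a quick pre-flight check with Python's standard `wave` module can catch mismatched files before transcription. `check_wav` and the `EXPECTED_*` constants are hypothetical names used only for illustration.

```python
import wave

# Assumption: same input format as the NVIDIA base model (16 kHz, mono WAV).
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1


def check_wav(path: str) -> list:
    """Hypothetical helper: return a list of format problems (empty means OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

Files that fail the check can be converted beforehand, for example with `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`.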
Installation
To use this model, install NVIDIA NeMo. We recommend installing it after installing the latest version of PyTorch.
pip install "nemo_toolkit[all]"
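After installation, one way to confirm that PyTorch and NeMo are visible to the current Python interpreter is to look up their import names (`torch` and `nemo`; the `nemo_toolkit` package provides the `nemo` module). This sketch uses only the standard library's `importlib.util.find_spec`, which locates a package without importing it; `check_install` is a hypothetical helper name.

```python
import importlib.util


def check_install(packages=("torch", "nemo")) -> dict:
    """Hypothetical helper: map each top-level package name to True if it is installed.

    find_spec() only locates the package on the import path; nothing is imported,
    so this is fast but does not prove that the package imports without errors.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```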
For Inference
To transcribe audio using this model, you can follow this example:
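import nemo.collections.asr as nemo_asr
# Placeholder paths: the downloaded .nemo checkpoint and a WAV file to transcribe
model_path = "stt_los_conformer_transducer_large_punctuated.nemo"
audio_path = "sample.wav"
nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(restore_path=model_path)
# Recent NeMo versions return hypothesis objects with a .text attribute
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)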
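To transcribe many files, the model can be given a list of paths in one call. The sketch below assumes a directory of WAV files and a model loaded as in the example above; `collect_wavs` and `transcribe_dir` are hypothetical helpers. Because the return type of `transcribe()` has changed across NeMo releases (some older versions return a tuple of best and all hypotheses for transducer models, newer ones a list of hypothesis objects with `.text`), the helper tolerates both; check the behavior of your installed version.

```python
from pathlib import Path


def collect_wavs(directory: str) -> list:
    """Hypothetical helper: sorted .wav paths in a directory, for a stable output order."""
    return sorted(str(p) for p in Path(directory).glob("*.wav"))


def transcribe_dir(model, directory: str, batch_size: int = 8) -> dict:
    """Hypothetical helper: transcribe every .wav file in `directory` with a loaded NeMo model."""
    paths = collect_wavs(directory)
    if not paths:
        return {}
    outputs = model.transcribe(paths, batch_size=batch_size)
    # Some older NeMo versions return (best_hypotheses, all_hypotheses) for RNN-T models.
    if isinstance(outputs, tuple):
        outputs = outputs[0]
    # Hypothesis objects carry the string in .text; plain strings are used as-is.
    texts = [getattr(out, "text", out) for out in outputs]
    return dict(zip(paths, texts))
```

For example, `transcribe_dir(nemo_asr_model, "recordings/", batch_size=4)` would return a dictionary mapping each file path to its punctuated transcription.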
Training Details
Training data
The specific datasets used to create the model are listed below.
In Catalan:
- "3CatParla" (To be published soon)
- Parlament-Parla-v3 (only the anonymized version of this dataset is public; the model was trained on the non-anonymized version)
- Corts Valencianes (only the anonymized version of this dataset is public; the model was trained on the non-anonymized version)
- IB3 (To be published soon)
- Common Voice ca 17 Benchmark
In Spanish:
- ciempiess light
- ciempiess fem
- ciempiess complementary
- ciempiess balance
- CHM150
- TEDx Spanish
- LibriVox Spanish
- Wikipedia Spanish
- VoxForge Spanish
- Tele con ciencia
- Argentinian Spanish Speech Dataset
- Dimex100 light
- Glissando Spanish
- Heroico
- Latino40
- Common Voice 17 es
In Galician:
- fleurs-galician
- google_crowdsourced
- Nos_ParlaSpeech-GL (clean)
- Nos_ParlaSpeech-GL (other)
- Nos_TranscriSpeech-GL
- Nos_RG-Podcast-GL
- FalAI (validated split)
- Common Voice 22.0
In Euskera:
Training procedure
This model is the result of fine-tuning the base model "nvidia/stt_es_conformer_transducer_large" by following this tutorial.
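NeMo's ASR fine-tuning scripts and tutorials usually read data from JSON-lines manifests, one object per utterance with `audio_filepath`, `duration` (in seconds), and `text` fields. The card does not describe how its training data was formatted, so the sketch below only illustrates that standard layout; `wav_duration` and `write_manifest` are hypothetical helpers. `ensure_ascii=False` keeps characters such as ç, ñ, and à readable in the file, and the transcripts are assumed to keep their casing and punctuation, since the target output is punctuated text.

```python
import json
import wave


def wav_duration(path: str) -> float:
    """Hypothetical helper: duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs, manifest_path: str) -> None:
    """Hypothetical helper: write (audio_path, transcript) pairs as a JSON-lines manifest."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in pairs:
            entry = {
                "audio_filepath": audio_path,
                "duration": round(wav_duration(audio_path), 3),
                "text": text,  # assumption: punctuation and casing are kept
            }
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```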
Citation
If this model contributes to your research, please cite the work:
@misc{LoS-conformer-transducer-punctuated-BSC-2026,
title={Languages of Spain ASR Model: stt\_los\_conformer\_transducer\_large\_punctuated},
author={Messaoudi, Abir and Solito, Sarah and Hernandez Mena, Carlos and España-Bonet, Cristina},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/projecte-aina/stt_los_conformer_transducer_large_punctuated},
year={2025}
}
Additional Information
Author
The fine-tuning was carried out in 2025 at the Language Technologies Laboratory of the Barcelona Supercomputing Center.
For the Valencian Catalan data, we collaborated with CENID within the framework of the ILENIA project.
Contact
For further information, please send an email to langtech@bsc.es.
Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.
License
Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU through NextGenerationEU, within the framework of the ILENIA project (reference 2022/TL22/00215337).
The training of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.