NVIDIA Conformer-Transducer Large (LoS)


Summary

The "stt_los_conformer_transducer_large_punctuated" model is an acoustic model based on "NVIDIA/stt_es_conformer_transducer_large", suitable for multilingual Automatic Speech Recognition in the languages of Spain (LoS): Catalan, Spanish, Galician, and Basque (Euskera).

Model Description

This model transcribes speech in Catalan, Spanish, Galician, and Basque (Euskera), including punctuation. It was fine-tuned on a multilingual LoS dataset comprising 2,700 hours of speech. It is a "large" variant of Conformer-Transducer, with around 120 million parameters. See the model architecture section and the NeMo documentation for complete architecture details.

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan, Spanish, Galician, and Euskera. It is intended to transcribe audio files in those languages to text with punctuation.

Installation

To use this model, install NVIDIA NeMo. We recommend installing it after you have installed the latest PyTorch version. Quoting the extras specifier avoids shell globbing issues:

pip install "nemo_toolkit[all]"

For Inference

To transcribe audio using this model, you can follow this example:

import nemo.collections.asr as nemo_asr

# Paths to the downloaded .nemo checkpoint and to the audio file to transcribe
model_path = "stt_los_conformer_transducer_large_punctuated.nemo"
audio_path = "audio.wav"

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)
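Since the model emits punctuation, you may want a light post-processing step. The following is a generic sketch (not part of the model's pipeline; the model may already emit clean text) that collapses repeated whitespace and removes stray spaces before punctuation marks:

```python
import re

def clean_transcript(text: str) -> str:
    """Collapse repeated whitespace and strip spaces before punctuation.

    A generic post-processing sketch; adjust to your downstream needs.
    """
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    return text

print(clean_transcript("Hola ,  món !"))  # → "Hola, món!"
```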

Training Details

Training data

The specific datasets used to create the model are:

In Catalan:

In Spanish:

In Galician:

In Euskera:

Training procedure

This model is the result of fine-tuning the base model "NVIDIA/stt_es_conformer_transducer_large" by following this tutorial.
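NeMo training and fine-tuning read data from JSON-lines manifest files, where each line describes one utterance with `audio_filepath`, `duration`, and `text` fields. A minimal sketch of building such a manifest (the audio paths and transcripts below are hypothetical placeholders):

```python
import json

# Hypothetical samples; in practice these come from the datasets listed above
samples = [
    {"audio_filepath": "clips/0001.wav", "duration": 3.2, "text": "Bon dia a tothom."},
    {"audio_filepath": "clips/0002.wav", "duration": 2.7, "text": "¿Cómo estás?"},
]

# Write one JSON object per line, keeping non-ASCII characters readable
with open("train_manifest.json", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```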

Citation

If this model contributes to your research, please cite the work:

@misc{LoS-conformer-transducer-punctuated-BSC-2025,
      title={Languages of Spain ASR Model: stt_los_conformer_transducer_large_punctuated},
      author={Messaoudi, Abir and Solito, Sarah and Hernandez Mena, Carlos and España-Bonet, Cristina},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/stt_los_conformer_transducer_large_punctuated},
      year={2025}
}

Additional Information

Author

The fine-tuning process was performed during 2025 in the Language Technologies Laboratory of the Barcelona Supercomputing Center.

For the Catalan Valencian data, we had the collaboration of CENID within the framework of the ILENIA project.

Contact

For further information, please send an email to langtech@bsc.es.

Copyright

Copyright (c) 2025 by the Language Technologies Laboratory, Barcelona Supercomputing Center.

License

Apache-2.0

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.

The training of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.
