NVIDIA Conformer-Transducer Large (LoS)

Model Description
Intended Uses and Limitations
How to Get Started with the Model
Training Details
Citation
Additional Information

Summary

"stt_los_conformer_transducer_large_punctuated" is an acoustic model based on "nvidia/stt_es_conformer_transducer_large", suitable for multilingual Automatic Speech Recognition in the Languages of Spain (LoS): Catalan, Spanish, Galician, and Euskera.

Model Description

This model transcribes speech in Catalan, Spanish, Galician, and Euskera, including punctuation. It was fine-tuned on a multilingual LoS dataset comprising 2700 hours of audio. It is a "large" variant of Conformer-Transducer, with around 120 million parameters. See the model architecture section and NeMo documentation for complete architecture details.

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan, Spanish, Galician, and Euskera. It is intended to transcribe audio files in those languages to text with punctuation.

Installation

To use this model, install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version.

pip install "nemo_toolkit[all]"

For Inference

To transcribe audio using this model, you can follow this example:

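import nemo.collections.asr as nemo_asr

# Placeholder paths: point these at your local .nemo checkpoint and audio file.
model = "path/to/model.nemo"
audio_path = "path/to/audio.wav"

# Load the fine-tuned checkpoint and transcribe a single file.
nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)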
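
To transcribe several files at once, the single-file example can be extended roughly as follows. This is a minimal sketch, not part of the original model card: the helper names (collect_audio_files, hypothesis_text, transcribe_folder), the folder layout, and the accepted file suffixes are all hypothetical. It assumes a NeMo version in which transcribe() returns a list with one result per input file, either plain strings or objects exposing a .text attribute.

```python
from pathlib import Path


def collect_audio_files(folder, suffixes=(".wav", ".flac")):
    """Return sorted paths (as strings) of audio files directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def hypothesis_text(result):
    """Return the transcript whether `result` is a string or has a `.text` attribute."""
    return result if isinstance(result, str) else result.text


def transcribe_folder(checkpoint_path, folder, batch_size=4):
    """Transcribe every audio file in `folder`; returns {path: transcript}."""
    # Imported lazily so the helpers above can be used without NeMo installed.
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(checkpoint_path)
    paths = collect_audio_files(folder)
    # Assumption: one result per input path, returned in the same order.
    results = asr_model.transcribe(paths, batch_size=batch_size)
    return {path: hypothesis_text(r) for path, r in zip(paths, results)}
```

Calling transcribe_folder("path/to/model.nemo", "path/to/audio_dir") would then return a dictionary mapping each audio path to its punctuated transcript. Both paths are placeholders.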

Training Details

Training data

The specific datasets used to create the model are:

In Catalan:

"3CatParla" (To be published soon)
Parlament-Parla-v3 (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version)
Corts Valencianes (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version)
IB3 (To be published soon)
Common Voice ca 17 Benchmark

In Spanish:

In Galician:

In Euskera:

composite_corpus_eu_v2.1

Training procedure

This model is the result of fine-tuning the base model "nvidia/stt_es_conformer_transducer_large" by following this tutorial.

Citation

If this model contributes to your research, please cite the work:

@misc{LoS-conformer-transducer-punctuated-BSC-2026,
      title={Languages of Spain ASR Model: stt_los_conformer_transducer_large_punctuated.}, 
      author={Messaoudi, Abir and Solito, Sarah and Hernandez Mena, Carlos and España-Bonet, Cristina},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/stt_los_conformer_transducer_large_punctuated},
      year={2025}
}

Additional Information

Author

The fine-tuning process was performed during 2025 in the Language Technologies Laboratory of the Barcelona Supercomputing Center.

For the Valencian Catalan data, we collaborated with CENID within the framework of the ILENIA project.

Contact

For further information, please send an email to langtech@bsc.es.

Copyright

License

Apache-2.0

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.

Training the model was made possible by the computing time provided by the Barcelona Supercomputing Center through MareNostrum 5.
