---
license: apache-2.0
language:
- ca
- es
- gl
- eu
base_model:
- nvidia/stt_es_conformer_transducer_large
tags:
- automatic-speech-recognition
- NeMo
---
# NVIDIA Conformer-Transducer Large (LoS)
## Table of Contents
- [Model Description](#model-description)
- [Intended Uses and Limitations](#intended-uses-and-limitations)
- [How to Get Started with the Model](#how-to-get-started-with-the-model)
- [Training Details](#training-details)
- [Citation](#citation)
- [Additional Information](#additional-information)
## Summary
The "stt_los_conformer_transducer_large_punctuated" model is an acoustic model based on ["nvidia/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large/), suitable for multilingual Automatic Speech Recognition in the languages of Spain (LoS): Catalan, Spanish, Galician, and Euskera (Basque).
## Model Description
This model transcribes speech in Catalan, Spanish, Galician, and Euskera, including punctuation. It was fine-tuned on a multilingual LoS dataset comprising around 2,700 hours of speech. It is a "large" variant of the Conformer-Transducer architecture, with around 120 million parameters.
See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
## Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in Catalan, Spanish, Galician, and Euskera. It is intended to transcribe audio files in those languages to text with punctuation.
### Installation
To use this model, install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
```bash
pip install "nemo_toolkit[all]"
```
### For Inference
To transcribe audio with this model, you can follow this example:
```python
import nemo.collections.asr as nemo_asr

# Path to the downloaded .nemo checkpoint
model_path = "stt_los_conformer_transducer_large_punctuated.nemo"
asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)

# Transcribe one or more 16 kHz mono audio files
audio_path = "path/to/audio.wav"
transcription = asr_model.transcribe([audio_path])[0].text
print(transcription)
```
## Training Details
### Training data
The specific datasets used to create the model are:
In Catalan:
- ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr) (To be published soon)
- [Parlament-Parla-v3](https://huggingface.co/datasets/projecte-aina/parlament_parla_v3) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version)
- [Corts Valencianes](https://huggingface.co/datasets/projecte-aina/corts_valencianes_asr_a) (Only the anonymized version of the dataset is public. We trained the model with the non-anonymized version)
- [IB3](https://huggingface.co/datasets/projecte-aina/ib3_ca_asr) (To be published soon)
- [Common Voice ca 17 Benchmark](https://huggingface.co/datasets/projecte-aina/commonvoice_benchmark_catalan_accents)
In Spanish:
- [ciempiess light](https://huggingface.co/datasets/ciempiess/ciempiess_light)
- [ciempiess fem](https://huggingface.co/datasets/ciempiess/ciempiess_fem)
- [ciempiess complementary](https://huggingface.co/datasets/ciempiess/ciempiess_complementary)
- [ciempiess balance](https://huggingface.co/datasets/ciempiess/ciempiess_balance)
- [CHM150](https://huggingface.co/datasets/carlosdanielhernandezmena/chm150_asr)
- [Tedx spanish](https://huggingface.co/datasets/ciempiess/tedx_spanish)
- [librivox spanish](https://huggingface.co/datasets/ciempiess/librivox_spanish)
- [Wikipedia spanish](https://huggingface.co/datasets/ciempiess/wikipedia_spanish)
- [voxforge spanish](https://huggingface.co/datasets/ciempiess/voxforge_spanish)
- [Tele con ciencia](https://huggingface.co/datasets/ciempiess/tele_con_ciencia)
- [Argentinian Spanish Speech Dataset](https://openslr.org/61/)
- [Dimex100 light](https://huggingface.co/datasets/carlosdanielhernandezmena/dimex100_light)
- [Glissando Spanish](https://glissando.labfon.uned.es/es)
- [Heroico](https://openslr.org/39/)
- [Latino40](https://catalog.ldc.upenn.edu/LDC95S28)
- [Common voice 17 es](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
In Galician:
- [fleurs-galician](https://huggingface.co/datasets/google/fleurs)
- [google_crowdsourced](http://openslr.magicdatatech.com/77/)
- [Nos_ParlaSpeech-GL (clean)](https://huggingface.co/datasets/proxectonos/Nos_Parlaspeech-GL)
- [Nos_ParlaSpeech-GL (other)](https://huggingface.co/datasets/proxectonos/Nos_Parlaspeech-GL)
- [Nos_TranscriSpeech-GL](https://huggingface.co/datasets/proxectonos/Nos_Transcrispeech-GL)
- [Nos_RG-Podcast-GL](https://huggingface.co/datasets/proxectonos/Nos_RG-Podcast-GL)
- [FalAI (validated split)](https://huggingface.co/datasets/GTM-UVigo/FalAI)
- [Common Voice 22.0](https://huggingface.co/datasets/fsicoli/common_voice_22_0)
In Euskera:
- [composite_corpus_eu_v2.1](https://huggingface.co/datasets/HiTZ/composite_corpus_eu_v2.1)
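Before training, corpora like the ones above are converted to NeMo's manifest format: one JSON object per line with `audio_filepath`, `duration`, and `text` fields. A minimal sketch (the file names, durations, and transcripts below are illustrative, not taken from the actual datasets):

```python
import json

# Hypothetical entries; paths, durations, and texts are illustrative only.
entries = [
    {"audio_filepath": "clips/sample_ca_000001.wav",
     "duration": 3.42,
     "text": "Bon dia, com estàs?"},
    {"audio_filepath": "clips/sample_es_000017.wav",
     "duration": 5.10,
     "text": "Buenos días, ¿cómo estás?"},
]

# NeMo expects one JSON object per line (JSON Lines).
with open("train_manifest.json", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```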
### Training procedure
This model is the result of fine-tuning the base model ["nvidia/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large) by following this [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Transducers_with_HF_Datasets.ipynb).
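NeMo fine-tuning runs are driven by a YAML/Hydra configuration. A trimmed, illustrative fragment is shown below; the paths and hyperparameters are placeholders, not the settings actually used for this model:

```yaml
# Illustrative fragment only; actual training settings were not published with this card.
init_from_pretrained_model: nvidia/stt_es_conformer_transducer_large
model:
  train_ds:
    manifest_filepath: /data/los/train_manifest.json
    sample_rate: 16000
    batch_size: 16
  validation_ds:
    manifest_filepath: /data/los/dev_manifest.json
    sample_rate: 16000
trainer:
  devices: -1
  accelerator: gpu
  max_epochs: 100
```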
## Citation
If this model contributes to your research, please cite it as follows:
```bibtex
@misc{LoS-conformer-transducer-punctuated-BSC-2025,
  title={Languages of Spain ASR Model: stt_los_conformer_transducer_large_punctuated},
  author={Messaoudi, Abir and Solito, Sarah and Hernandez Mena, Carlos and España-Bonet, Cristina},
  organization={Barcelona Supercomputing Center},
  url={https://huggingface.co/projecte-aina/stt_los_conformer_transducer_large_punctuated},
  year={2025}
}
```
## Additional Information
### Author
The fine-tuning process was performed during 2025 in the [Language Technologies Laboratory](https://huggingface.co/BSC-LT) of the [Barcelona Supercomputing Center](https://www.bsc.es/).
For the Catalan Valencian data, we had the collaboration of [CENID](https://cenid.es/) within the framework of the [ILENIA](https://proyectoilenia.es) project.
### Contact
For further information, please send an email to .
### Copyright
Copyright (c) 2025 by the Language Technologies Laboratory, Barcelona Supercomputing Center.
### License
[Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
### Funding
This work has been funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU – NextGenerationEU, within the framework of the ILENIA project, reference 2022/TL22/00215337.
The training of the model was possible thanks to the computing time provided by [Barcelona Supercomputing Center](https://www.bsc.es/) through MareNostrum 5.