UMUTeam/MarIA-emotion-es

Model description

UMUTeam/MarIA-emotion-es is a Spanish text-based emotion recognition model developed as part of speech-emotion, an open-source multilingual and multimodal toolkit for emotion recognition from speech, text, and multimodal inputs.

The model is a fine-tuned version of MarIA, a Spanish Transformer language model, and classifies the emotion expressed in Spanish text.

It is designed to be used either as a standalone text-only classifier or as part of the broader speech-emotion framework, where textual representations can be combined with acoustic representations for multimodal emotion recognition.
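
As a rough illustration of what combining textual and acoustic representations can mean, the sketch below shows two simple fusion strategies, concatenation and element-wise mean, which share their names with the "Concat" and "Mean" configurations in the evaluation table. It is a toy example under stated assumptions: the function names and the embedding values are hypothetical, and the toolkit's actual fusion code may differ.

```python
from typing import List, Sequence


def fuse_concat(text_vec: Sequence[float], audio_vec: Sequence[float]) -> List[float]:
    """Concatenation fusion: join both embeddings into one longer vector."""
    return list(text_vec) + list(audio_vec)


def fuse_mean(text_vec: Sequence[float], audio_vec: Sequence[float]) -> List[float]:
    """Mean fusion: element-wise average; both embeddings must have equal length."""
    if len(text_vec) != len(audio_vec):
        raise ValueError("mean fusion needs embeddings of equal length")
    return [(t + a) / 2.0 for t, a in zip(text_vec, audio_vec)]


# Toy 3-dimensional embeddings (made-up values, not real model outputs).
text_embedding = [0.2, -0.4, 1.0]
audio_embedding = [0.6, 0.0, -1.0]

concat_features = fuse_concat(text_embedding, audio_embedding)  # 6 values
mean_features = fuse_mean(text_embedding, audio_embedding)      # 3 values
print(len(concat_features), len(mean_features))
```

Concatenation keeps every dimension of both modalities and leaves the weighting to the downstream classifier, whereas the mean requires both encoders to produce vectors of the same size.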

The model predicts one of the following emotion labels:

  • anger
  • disgust
  • fear
  • joy
  • neutral
  • sadness

Intended use

This model is intended for research and applied scenarios involving Spanish emotion recognition from text, such as:

  • emotion analysis in transcribed speech
  • conversational analysis
  • affective computing research
  • human-computer interaction
  • educational or exploratory emotion analysis tools
  • integration into multimodal speech emotion recognition pipelines

It can be used directly with the Hugging Face transformers library or through the speech-emotion toolkit.

Out-of-scope use

This model should not be used as the sole basis for high-stakes decisions, including but not limited to:

  • clinical diagnosis
  • mental health assessment
  • employment, legal, or educational decisions
  • biometric profiling or surveillance
  • automated decisions affecting individuals without human oversight

Emotion recognition is inherently uncertain and context-dependent. Predictions should be interpreted as model estimates, not as definitive assessments of a person's emotional state.
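
One way to treat predictions as estimates rather than verdicts is to abstain when the top score is low and pass the text to a human reviewer. The sketch below assumes scores shaped like the `label`/`score` dictionaries returned by a text-classification pipeline; the helper name `label_or_abstain` and the 0.6 threshold are hypothetical choices for illustration, not values recommended by the authors.

```python
from typing import List, Optional


def label_or_abstain(scores: List[dict], min_score: float = 0.6) -> Optional[str]:
    """Return the top label, or None when the top score is below min_score."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None


# Made-up score lists standing in for real model output.
confident = [
    {"label": "joy", "score": 0.91},
    {"label": "neutral", "score": 0.06},
    {"label": "sadness", "score": 0.03},
]
ambiguous = [
    {"label": "sadness", "score": 0.41},
    {"label": "fear", "score": 0.38},
    {"label": "neutral", "score": 0.21},
]

print(label_or_abstain(confident))  # joy
print(label_or_abstain(ambiguous))  # None, so defer to a human reviewer
```

A suitable threshold depends on the application and should be tuned on held-out data.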

Training data

The model was trained on the Spanish portion of the data used in the speech-emotion project, which is based primarily on the Spanish MEACorpus 2023 dataset.

Spanish MEACorpus 2023 is a multimodal speech-text emotion corpus for Spanish emotion analysis collected from natural environments. The dataset contains aligned speech and textual information for emotion recognition tasks.

The emotion labels were harmonized into the following six-class taxonomy:

  • anger
  • disgust
  • fear
  • joy
  • neutral
  • sadness

The Spanish text-based emotion recognition setup uses the following data splits:

  • Training samples: 3,692
  • Validation samples: 410
  • Test samples: 1,027
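
The split sizes above add up to 5,129 samples, which works out to roughly a 72/8/20 split. The short calculation below reproduces that arithmetic from the three counts listed in this card.

```python
# Split sizes as reported in this model card.
splits = {"train": 3692, "validation": 410, "test": 1027}

total = sum(splits.values())
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)   # 5129
print(shares)  # {'train': 72.0, 'validation': 8.0, 'test': 20.0}
```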

More details about the dataset and preprocessing pipeline are available in the project repository:

https://github.com/NLP-UMUTeam/umuteam-speech-emotion

Evaluation

The model was evaluated on the Spanish held-out test set used in the speech-emotion toolkit.

Performance comparison on Spanish emotion recognition (all metrics in %)

| Configuration          | Accuracy | Weighted Precision | Weighted F1 | Macro F1 |
|------------------------|----------|--------------------|-------------|----------|
| Speech-only            | 88.1207  | 88.3244            | 88.1357     | 84.4829  |
| Text-only              | 77.0204  | 77.0449            | 76.8367     | 69.3886  |
| Multimodal (Concat)    | 90.0682  | 90.2048            | 90.0642     | 87.7455  |
| Multimodal (Mean)      | 88.5102  | 88.6163            | 88.5011     | 84.1653  |
| Multimodal (Multihead) | 82.6680  | 82.3820            | 82.4600     | 75.5606  |

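The text-only configuration, which is the setting this model covers, reaches 77.02% accuracy and a macro F1 of 69.39. Text alone is therefore a usable signal for Spanish emotion analysis, but every configuration that includes acoustic information scores higher. Concatenating acoustic and linguistic representations performs best, with 90.07% accuracy and a macro F1 of 87.75.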
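
The table reports both weighted and macro F1, and for the text-only configuration the macro score (69.39) is clearly lower than the weighted one (76.84). A common reason for such a gap is weaker performance on less frequent classes, although this card does not report per-class results. The sketch below shows how the two averages differ on a small, made-up imbalanced example. It is a plain-Python illustration, not the evaluation code used by the authors, and it averages only over classes present in the gold labels.

```python
from collections import Counter
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 for every class that appears in the gold labels."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    f1 = per_class_f1(y_true, y_pred)
    return sum(f1.values()) / len(f1)


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Per-class F1 averaged by class frequency in the gold labels."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(f1[label] * support[label] for label in f1) / len(y_true)


# Toy data: "joy" is frequent and always right, "fear" is rare and half wrong.
gold = ["joy"] * 8 + ["fear"] * 2
pred = ["joy"] * 8 + ["fear", "joy"]

print(round(macro_f1(gold, pred), 4))     # 0.8039
print(round(weighted_f1(gold, pred), 4))  # 0.8863
```

Because macro F1 gives the rare class the same weight as the frequent one, a single error on "fear" pulls it down much more than it pulls down the weighted average.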

How to use

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="UMUTeam/MarIA-emotion-es",
    top_k=None
)

text = "Estoy muy feliz de verte de nuevo."

predictions = classifier(text)
print(predictions)
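
With `top_k=None` the pipeline returns a score for every label instead of only the best one. Depending on the transformers version and the arguments used, the result for a single string may be a flat list of `label`/`score` dictionaries or the same list wrapped in an outer list. The helper below is a minimal sketch that accepts both shapes and picks the highest-scoring entry. The name `top_emotion` and the example scores are hypothetical, and the label strings are assumed to match the six labels listed in this card.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def top_emotion(predictions: Union[List[Score], List[List[Score]]]) -> Score:
    """Return the highest-scoring entry from flat or nested pipeline output."""
    # Unwrap the outer list if the scores arrive nested.
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    return max(predictions, key=lambda item: item["score"])


# Stand-in for `classifier(text)`; the scores are made up for illustration.
example_output = [[
    {"label": "joy", "score": 0.93},
    {"label": "neutral", "score": 0.04},
    {"label": "sadness", "score": 0.01},
    {"label": "anger", "score": 0.01},
    {"label": "fear", "score": 0.005},
    {"label": "disgust", "score": 0.005},
]]

best = top_emotion(example_output)
print(best["label"], best["score"])
```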

You can also use this model through the speech-emotion toolkit. Install the package first:

pip install speech-emotion

Then call it from Python:

from speech_emotion import predict_emotion

emotion = predict_emotion(
    text="Estoy muy feliz de verte de nuevo.",
    language="es",
    mode="text",
    model_config_path="model.json"
)

print("Detected emotion:", emotion)

Repository: https://github.com/NLP-UMUTeam/umuteam-speech-emotion

Limitations

  • The model is designed for Spanish text and may not perform reliably on other languages.
  • It predicts a single label from a fixed set of six emotions.
  • Emotion expression is subjective and highly context-dependent.
  • Text-only emotion recognition may miss relevant acoustic or visual cues such as tone of voice, pauses, intensity, facial expressions, or interaction context.
  • Performance may decrease on noisy transcriptions, informal language, code-switching, domain-specific language, or texts that differ substantially from the training data.

Bias and ethical considerations

Emotion recognition systems may reflect biases present in their training data, including differences related to language variety, register, demographics, topic, or annotation subjectivity.

Users should avoid interpreting predictions as objective truths about a person's internal emotional state. The model should be used with transparency, appropriate consent, and human oversight, especially in sensitive contexts.
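
A simple first check for such biases is to compare accuracy across groups of texts, for example by register or language variety. The sketch below assumes you already have gold labels, model predictions, and a group tag for each text. The function name, the group names, and the records are hypothetical and purely illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy per group from (group, gold_label, predicted_label) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold_label, predicted_label in records:
        total[group] += 1
        correct[group] += int(gold_label == predicted_label)
    return {group: correct[group] / total[group] for group in total}


# Made-up evaluation records tagged by register.
records = [
    ("formal", "joy", "joy"),
    ("formal", "anger", "anger"),
    ("formal", "fear", "fear"),
    ("formal", "neutral", "neutral"),
    ("informal", "joy", "joy"),
    ("informal", "anger", "neutral"),
    ("informal", "sadness", "sadness"),
    ("informal", "disgust", "anger"),
]

print(accuracy_by_group(records))  # {'formal': 1.0, 'informal': 0.5}
```

A large gap between groups suggests that predictions for the weaker group deserve extra caution or additional evaluation data.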

Citation

If you use this model in your research, please cite the following works:

speech-emotion toolkit

@article{PAN2026102677,
title = {speech-emotion: A multilingual and multimodal toolkit for emotion recognition from speech},
journal = {SoftwareX},
volume = {34},
pages = {102677},
year = {2026},
issn = {2352-7110},
doi = {10.1016/j.softx.2026.102677},
url = {https://www.sciencedirect.com/science/article/pii/S235271102600169X},
author = {Ronghao Pan and Tomás Bernal-Beltrán and José Antonio García-Díaz and Rafael Valencia-García},
}

Spanish MEACorpus 2023

@article{PAN2024103856,
title = {Spanish MEACorpus 2023: A multimodal speech–text corpus for emotion analysis in Spanish from natural environments},
journal = {Computer Standards \& Interfaces},
volume = {90},
pages = {103856},
year = {2024},
issn = {0920-5489},
doi = {10.1016/j.csi.2024.103856},
url = {https://www.sciencedirect.com/science/article/pii/S0920548924000254},
author = {Ronghao Pan and José Antonio García-Díaz and Miguel Ángel Rodríguez-García and Rafael Valencia-García},
}

Acknowledgments

This work is part of the research project LaTe4PoliticES (PID2022-138099OB-I00), funded by MICIU/AEI/10.13039/501100011033 and the European Regional Development Fund (ERDF/EU - FEDER/UE), “A way of making Europe”.

Mr. Tomás Bernal-Beltrán is supported by the University of Murcia through the predoctoral programme.
