Portuguese Emergency Room Clinical NER
This is a Named Entity Recognition (NER) model for European Portuguese clinical text, specifically Emergency Room admission notes. It is fine-tuned from MediAlbertina PT-PT 900M and extracts three types of clinical entities: the patient's usual medications, medication allergies (with polarity), and principal diagnosis.
The model was developed on 300 Portuguese clinical notes, most of them synthetic. Full details (dataset, training scripts, evaluation code, and results) are in the companion repository: 👉 https://github.com/LIAAD/ER_NER
⚠️ This model is for research only. It was not validated for clinical use. Do not use it in any patient-facing workflow without proper institutional review and clinical validation.
What it predicts
The model uses BIO tagging with four entity classes:
| Label | What it marks | Polarity? |
|---|---|---|
| Medicação Habitual | Medication name + dosage | No |
| Diagnóstico | Principal diagnosis | No |
| Alergias medicamentosas__Positiva | Confirmed drug allergy | Yes |
| Alergias medicamentosas__Negativa | Denied / no known allergy | Yes |
Polarity is encoded directly in the tag — so the model can tell the difference between "alergia à penicilina" (Positiva) and "sem alergias medicamentosas conhecidas" (Negativa) without any extra steps.
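Since polarity lives inside the tag string, downstream code can recover it with plain string handling. The sketch below is illustrative only: `split_label` is a hypothetical helper, and the `B-`/`I-` prefix format is an assumption based on common BIO conventions (check `model.config.id2label` for the exact strings). The `__` separator is taken from the label names in the table above.

```python
from typing import Optional, Tuple


def split_label(tag: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Split a tag into (bio_prefix, entity_type, polarity)."""
    if tag == "O":
        return None, None, None
    # Strip an optional "B-" / "I-" prefix (assumed format).
    if tag[:2] in ("B-", "I-"):
        prefix, body = tag[0], tag[2:]
    else:
        prefix, body = None, tag
    # Polarity, when present, follows a double underscore.
    entity_type, sep, polarity = body.partition("__")
    return prefix, entity_type, (polarity if sep else None)


print(split_label("B-Alergias medicamentosas__Negativa"))
print(split_label("Diagnóstico"))
```

Labels without a polarity suffix, such as Diagnóstico, come back with `None` in the last position.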
How to use
Quickstart — pipeline
The simplest way to get started:
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="liaad/hfpt-medialbertina-er_ner",
    aggregation_strategy="simple",
)

text = (
    "Doente com hipertensão arterial. "
    "Medicação habitual: Enalapril 20mg id. "
    "Não refere alergias medicamentosas conhecidas."
)

for entity in ner(text):
    print(
        f"{entity['word']:30s} "
        f"{entity['entity_group']:45s} "
        f"score={entity['score']:.3f}"
    )
```
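The pipeline returns a flat list of entity dictionaries. A possible post-processing step is to group them by type and, for Diagnóstico, keep only the highest-scoring span, mirroring the one-diagnosis-per-document selection described in the Performance section. This is a sketch, not the authors' evaluation code: `summarise` is a hypothetical helper, and the example list is a hand-written stand-in for `ner(text)`, not real model output.

```python
from collections import defaultdict


def summarise(entities, single_best=("Diagnóstico",)):
    """Group pipeline output by entity_group.

    For groups listed in `single_best`, keep only the top-scoring span.
    """
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent)

    summary = {}
    for group, ents in grouped.items():
        if group in single_best:
            ents = [max(ents, key=lambda e: e["score"])]
        summary[group] = [e["word"] for e in ents]
    return summary


# Illustrative stand-in for `ner(text)`; scores are made up.
example = [
    {"entity_group": "Diagnóstico", "word": "hipertensão arterial", "score": 0.91},
    {"entity_group": "Diagnóstico", "word": "diabetes", "score": 0.62},
    {"entity_group": "Medicação Habitual", "word": "Enalapril 20mg", "score": 0.97},
]
print(summarise(example))
```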
Manual inference
If you want more control over the output:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "liaad/hfpt-medialbertina-er_ner", use_fast=True
)
model = AutoModelForTokenClassification.from_pretrained(
    "liaad/hfpt-medialbertina-er_ner"
)
model.eval()

text = (
    "Doente com hipertensão arterial e diabetes mellitus tipo 2. "
    "Medicação habitual: Enalapril 20mg id; Metformina 850mg 2id. "
    "Sem alergias medicamentosas conhecidas."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# One predicted label id per sub-word token
pred_ids = outputs.logits.argmax(dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze().tolist())
labels = [model.config.id2label[i] for i in pred_ids]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token:25s} {label}")
```
Long documents (recommended)
Clinical notes are often longer than 512 tokens. The model was trained with a sliding-window approach, so you should also use one at inference time — otherwise spans deep in the document will be missed.
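To make the overlap concrete, the sketch below computes where the window boundaries fall for a document of a given token length. It is an illustration under stated assumptions, not the training code: `window_bounds` is a hypothetical helper, `stride` is read as the number of tokens shared by consecutive windows (as in Hugging Face tokenizers), and two special tokens per window are assumed.

```python
def window_bounds(n_tokens, max_len=512, stride=128, n_special=2):
    """Return (start, end) token ranges for overlapping windows."""
    content = max_len - n_special  # room left after special tokens (assumed 2)
    step = content - stride        # consecutive windows share `stride` tokens
    if step <= 0:
        raise ValueError("stride must be smaller than the usable window length")

    bounds, start = [], 0
    while True:
        end = min(start + content, n_tokens)
        bounds.append((start, end))
        if end >= n_tokens:
            return bounds
        start += step


print(window_bounds(1000))
```

For a 1000-token note this yields three windows, each overlapping its predecessor by 128 tokens. The `predict` function that follows relies on the tokenizer's own overflow handling and averages logits wherever windows overlap.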
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import numpy as np
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "liaad/hfpt-medialbertina-er_ner", use_fast=True
)
model = AutoModelForTokenClassification.from_pretrained(
    "liaad/hfpt-medialbertina-er_ner"
)
model.eval()


def predict(text, max_len=512, stride=128):
    # Tokenize the full document to get canonical offsets
    full_enc = tokenizer(text, return_offsets_mapping=True,
                         add_special_tokens=False)
    full_offsets = full_enc["offset_mapping"]
    n_tokens = len(full_offsets)
    num_labels = model.config.num_labels

    # Accumulate logits over overlapping chunks
    agg_logits = np.zeros((n_tokens, num_labels), dtype=np.float64)
    agg_count = np.zeros(n_tokens, dtype=np.int32)

    enc = tokenizer(
        text,
        return_offsets_mapping=True,
        return_overflowing_tokens=True,
        truncation=True,
        max_length=max_len,
        stride=stride,
        padding=False,
    )

    with torch.no_grad():
        for ch in range(len(enc["input_ids"])):
            ipt = {
                k: torch.tensor(enc[k][ch]).unsqueeze(0)
                for k in enc.keys()
                if k not in ("offset_mapping", "overflow_to_sample_mapping")
            }
            logits = model(**ipt).logits.squeeze(0).numpy()

            # Align chunk tokens to full-document tokens via character offsets
            ptr = 0
            for i, (s, e) in enumerate(enc["offset_mapping"][ch]):
                if s == 0 and e == 0:
                    continue  # special token
                while ptr < n_tokens and full_offsets[ptr] != (s, e):
                    ptr += 1
                if ptr < n_tokens:
                    agg_logits[ptr] += logits[i]
                    agg_count[ptr] += 1
                    ptr += 1

    # Average logits in overlap regions and decode
    covered = agg_count > 0
    agg_logits[covered] /= agg_count[covered, None]
    pred_ids = agg_logits.argmax(-1).tolist()

    tokens = [text[s:e] for s, e in full_offsets]
    labels = [model.config.id2label[pid] for pid in pred_ids]
    return list(zip(tokens, labels, full_offsets))


results = predict(
    "Doente admitido por insuficiência cardíaca descompensada. "
    "Medicação habitual: Furosemida 40mg id; Bisoprolol 5mg id. "
    "Sem alergias medicamentosas conhecidas."
)
for token, label, (start, end) in results:
    if label != "O":
        print(f"[{start}:{end}] {token:25s} {label}")
```
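The output above is still one label per token. To obtain entity spans, consecutive tags can be merged using their character offsets. The following is a minimal sketch under the assumption that tags follow the usual `B-`/`I-` prefix convention; `merge_bio` is a hypothetical helper, and the toy `results` list is hand-written to mimic the shape of what `predict` returns.

```python
def merge_bio(results):
    """Merge (token, label, (start, end)) triples into entity spans."""
    spans, current = [], None
    for _token, label, (start, end) in results:
        if label == "O":
            current = None
            continue
        prefix, _, entity = label.partition("-")
        if prefix not in ("B", "I"):
            # Tag without a BIO prefix: treat it as the start of a span.
            prefix, entity = "B", label
        if prefix == "I" and current is not None and current["label"] == entity:
            current["end"] = end  # extend the open span
        else:
            current = {"label": entity, "start": start, "end": end}
            spans.append(current)
    return spans


# Hand-written example in the same shape as the output of `predict`.
text = "Medicação habitual: Enalapril 20mg id."
results = [
    ("Medicação", "O", (0, 9)),
    ("habitual", "O", (10, 18)),
    (":", "O", (18, 19)),
    ("Enalapril", "B-Medicação Habitual", (20, 29)),
    ("20mg", "I-Medicação Habitual", (30, 34)),
    ("id", "O", (35, 37)),
]
for span in merge_bio(results):
    print(span["label"], repr(text[span["start"]:span["end"]]))
```

Slicing the original text with the merged offsets preserves whitespace and avoids having to re-join sub-word tokens.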
Performance
Evaluated on 15 physician-authored held-out documents.
Diagnóstico F1 reflects highest-confidence selection (one span per document), matching the dataset design.
Exact match:
| Entity | P | R | F1 | Confidence |
|---|---|---|---|---|
| Alergias medicamentosas – Negativa | 0.75 | 0.75 | 0.75 | 0.97 |
| Alergias medicamentosas – Positiva | 0.88 | 0.88 | 0.88 | 0.99 |
| Diagnóstico | 0.30 | 0.33 | 0.32 | 0.95 |
| Medicação Habitual | 0.87 | 0.89 | 0.88 | 0.97 |
| Macro | 0.70 | 0.71 | 0.70 | |
| Micro | 0.80 | 0.83 | 0.81 | |
Relaxed IoU ≥ 0.50:
| Entity | P | R | F1 | Confidence |
|---|---|---|---|---|
| Alergias medicamentosas – Negativa | 0.75 | 0.75 | 0.75 | 0.97 |
| Alergias medicamentosas – Positiva | 1.00 | 1.00 | 1.00 | 0.99 |
| Diagnóstico | 0.50 | 0.56 | 0.53 | 0.95 |
| Medicação Habitual | 0.90 | 0.92 | 0.91 | 0.97 |
| Macro | 0.79 | 0.81 | 0.80 | |
| Micro | 0.85 | 0.88 | 0.87 | |
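The difference between the two tables comes down to the matching criterion. The sketch below shows one plausible reading: a prediction counts as correct when its character-level intersection over union (IoU) with a gold span reaches a threshold, with 1.0 for exact match and 0.5 for the relaxed setting. This is not the repository's evaluation code; `iou` and `prf` are hypothetical helpers, the greedy one-to-one matching is an assumption, and spans are assumed to belong to a single entity type already.

```python
def iou(a, b):
    """Character-level intersection over union of two (start, end) spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def prf(gold, pred, threshold=1.0):
    """Precision, recall, F1 with greedy one-to-one matching at IoU >= threshold."""
    unmatched = list(gold)
    tp = 0
    for p in pred:
        hit = next((g for g in unmatched if iou(g, p) >= threshold), None)
        if hit is not None:
            unmatched.remove(hit)  # each gold span can be matched once
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy spans: the first prediction misses the dosage, the second is exact.
gold = [(20, 34), (50, 60)]
pred = [(20, 29), (50, 60)]
print(prf(gold, pred, threshold=1.0))  # exact match
print(prf(gold, pred, threshold=0.5))  # relaxed
```

In this toy case the truncated span fails exact match but passes the relaxed criterion, which is the kind of boundary error that separates the two tables.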
Training setup
| Parameter | Value |
|---|---|
| Base model | portugueseNLP/medialbertina_pt-pt_900m |
| Architecture | DeBERTa-v2 |
| Training documents | 257 |
| Validation documents | 28 |
| Test documents | 15 (fixed, physician-authored) |
| Optimiser | AdamW |
| Learning rate | 2 × 10⁻⁵ |
| Effective batch size | 16 |
| Max sequence length | 512 tokens |
| Stride | 128 tokens |
| Max epochs | 20 (early stopping, patience 3) |
| Precision | bfloat16 |
| Class imbalance | Inverse-frequency weighted loss + ×4 oversampling for Alergias medicamentosas__Negativa |
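As a rough illustration of inverse-frequency weighting, the sketch below derives per-class weights from label counts. The exact formula used in training is not stated here, so the normalisation `total / (n_classes * count)` is an assumption, as are the helper name `inverse_frequency_weights` and the toy label counts.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count); rarer classes weigh more."""
    counts = Counter(labels)
    total, n_classes = sum(counts.values()), len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Toy label distribution, not taken from the real dataset.
toy = (
    ["O"] * 90
    + ["B-Medicação Habitual"] * 8
    + ["B-Alergias medicamentosas__Negativa"] * 2
)
weights = inverse_frequency_weights(toy)
for label, weight in weights.items():
    print(f"{label:40s} {weight:.3f}")
```

Weights of this kind are typically passed, ordered by label id, as the `weight` tensor of `torch.nn.CrossEntropyLoss`.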
About the dataset
The dataset comprises 300 Portuguese ER admission notes: 15 physician-authored and 285 generated by Llama 3.3 using the validated notes as few-shot examples (consistent with the 257 / 28 / 15 split in the training setup). The notes cover eight medical specialties and were annotated by a linguist with a pharmaceutical background.
Annotations include entity spans plus standard terminology mappings: ICD-10 for diagnoses, ATC for medication allergies, SNOMED CT for usual medications.
Two independent physicians evaluated 60 synthetic notes using a Likert protocol. The notes scored well on medication clarity and allergy identification, with lower scores on diagnosis specificity — consistent with the model's per-class results.
Citation
If you use this model in your work, please cite:
```bibtex
@inproceedings{ernermodels2026,
  title     = {NER Models for Portuguese Emergency Room Notes:
               Extracting Diagnoses, Medication Allergies,
               and Usual Medications},
  author    = {Anonymous},
  booktitle = {Anonymous Submission},
  year      = {2026}
}
```
Will be updated with the full citation after publication.
License
MIT