Portuguese Emergency Room Clinical NER

This is a Named Entity Recognition model for European Portuguese clinical text, specifically Emergency Room admission notes. It is fine-tuned from BioBERTpt-all and can extract three types of clinical entities: the patient's usual medications, medication allergies (with polarity), and principal diagnosis.

The model was developed on 300 Portuguese clinical notes, most of them synthetic. Full details (dataset, training scripts, evaluation code, and results) are in the companion repository: 👉 https://github.com/LIAAD/ER_NER

⚠️ This model is for research use only. It has not been validated for clinical use. Do not use it in any patient-facing workflow without proper institutional review and clinical validation.

What it predicts

The model uses BIO tagging with four entity classes:

| Label | What it marks | Polarity encoded? |
|---|---|---|
| `Medicação Habitual` | Medication name + dosage | No |
| `Diagnóstico` | Principal diagnosis | No |
| `Alergias medicamentosas__Positiva` | Confirmed drug allergy | Yes |
| `Alergias medicamentosas__Negativa` | Denied / no known allergy | Yes |

Polarity is encoded directly in the tag, so the model distinguishes "alergia à penicilina" (Positiva) from "sem alergias medicamentosas conhecidas" (Negativa) without a separate negation-detection step.
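
Because polarity lives inside the tag string, downstream code usually needs to split it back out. The helper below is a minimal sketch: `split_label` is a hypothetical name, and it assumes tags carry a `B-` / `I-` prefix (standard BIO) and use `__` as the polarity separator, as in the label table above.

```python
def split_label(tag):
    """Split a tag into (bio_prefix, entity_type, polarity).

    Assumes BIO prefixes of the form "B-" / "I-" and "__" before the polarity.
    """
    if tag == "O":
        return ("O", None, None)
    prefix = None
    if tag[:2] in ("B-", "I-"):
        prefix, tag = tag[0], tag[2:]
    # partition() leaves `sep` empty when the tag has no polarity suffix
    entity_type, sep, polarity = tag.partition("__")
    return (prefix, entity_type, polarity if sep else None)


print(split_label("B-Alergias medicamentosas__Negativa"))
print(split_label("I-Medicação Habitual"))
```

The same function also accepts the prefix-free group names returned by the pipeline with `aggregation_strategy="simple"`.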

How to use

Quickstart — pipeline

The simplest way to get started:

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="liaad/hfpt-biobertpt-all-er_ner",
    aggregation_strategy="simple",
)

text = (
    "Doente com hipertensão arterial. "
    "Medicação habitual: Enalapril 20mg id. "
    "Não refere alergias medicamentosas conhecidas."
)

for entity in ner(text):
    print(
        f"{entity['word']:30s} "
        f"{entity['entity_group']:45s} "
        f"score={entity['score']:.3f}"
    )
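
To turn the pipeline output into something easier to consume, the entities can be grouped by type. This is a sketch only: `group_entities` and the `min_score` threshold of 0.5 are illustrative assumptions, and the `example` list is a hand-written stand-in for pipeline output, not real model predictions.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group pipeline-style entity dicts by 'entity_group', dropping low scores."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for `ner(text)` output; the scores are made up.
example = [
    {"entity_group": "Medicação Habitual", "word": "Enalapril 20mg", "score": 0.97},
    {"entity_group": "Diagnóstico", "word": "hipertensão arterial", "score": 0.42},
]
print(group_entities(example))
```

In practice you would pass `ner(text)` directly and tune `min_score` on your own validation data.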

Manual inference

If you want more control over the output:

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner", use_fast=True
)
model = AutoModelForTokenClassification.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner"
)
model.eval()

text = (
    "Doente com hipertensão arterial e diabetes mellitus tipo 2. "
    "Medicação habitual: Enalapril 20mg id; Metformina 850mg 2id. "
    "Sem alergias medicamentosas conhecidas."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

pred_ids = outputs.logits.argmax(dim=-1).squeeze().tolist()
tokens   = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze().tolist())
labels   = [model.config.id2label[i] for i in pred_ids]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token:25s}  {label}")
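
The loop above prints one WordPiece token per line, so a single medication can be spread over several rows. The sketch below merges pieces back into entity spans. `merge_wordpieces` is a hypothetical helper; it assumes `##` continuation pieces and `B-` / `I-` tag prefixes, and it rejoins words with single spaces, which only approximates the original text.

```python
def merge_wordpieces(tokens, labels):
    """Merge WordPiece tokens and BIO labels into (text, entity_type) spans."""
    spans = []
    current_words, current_type = [], None

    def flush():
        nonlocal current_words, current_type
        if current_words:
            spans.append((" ".join(current_words), current_type))
        current_words, current_type = [], None

    for token, label in zip(tokens, labels):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if token.startswith("##"):
            # Continuation piece: glue it onto the last word of the open span
            if current_words:
                current_words[-1] += token[2:]
            continue
        if label == "O":
            flush()
            continue
        if label[:2] in ("B-", "I-"):
            prefix, ent_type = label[0], label[2:]
        else:
            prefix, ent_type = "I", label
        # A "B-" tag or a change of entity type starts a new span
        if prefix == "B" or ent_type != current_type:
            flush()
            current_type = ent_type
        current_words.append(token)
    flush()
    return spans


# Hand-written tokens and labels for illustration, not real model output.
demo_tokens = ["[CLS]", "En", "##ala", "##pril", "20", "##mg", "id", "[SEP]"]
demo_labels = ["O"] + ["B-Medicação Habitual"] + ["I-Medicação Habitual"] * 4 + ["O", "O"]
print(merge_wordpieces(demo_tokens, demo_labels))
```

If you need exact character positions rather than rejoined text, work from offset mappings instead, as the long-document example does.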

Long documents (recommended)

Clinical notes are often longer than 512 tokens. The model was trained with a sliding-window approach (512-token windows, stride 128), so use one at inference time as well. With plain truncation, everything past the first 512 tokens is dropped and any entities there are never predicted.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import numpy as np
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner", use_fast=True
)
model = AutoModelForTokenClassification.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner"
)
model.eval()


def predict(text, max_len=512, stride=128):
    # Tokenize the full document to get canonical offsets
    full_enc     = tokenizer(text, return_offsets_mapping=True,
                             add_special_tokens=False)
    full_offsets = full_enc["offset_mapping"]
    n_tokens     = len(full_offsets)
    num_labels   = model.config.num_labels

    # Accumulate logits over overlapping chunks
    agg_logits = np.zeros((n_tokens, num_labels), dtype=np.float64)
    agg_count  = np.zeros(n_tokens, dtype=np.int32)

    enc = tokenizer(
        text,
        return_offsets_mapping=True,
        return_overflowing_tokens=True,
        truncation=True,
        max_length=max_len,
        stride=stride,
        padding=False,
    )

    with torch.no_grad():
        for ch in range(len(enc["input_ids"])):
            # Build a batch of one from this chunk, dropping bookkeeping keys
            ipt = {
                k: torch.tensor(enc[k][ch]).unsqueeze(0)
                for k in enc.keys()
                if k not in ("offset_mapping", "overflow_to_sample_mapping")
            }
            logits = model(**ipt).logits.squeeze(0).numpy()
            # Align each chunk token to its index in the full document
            # by matching character offsets
            ptr = 0
            for i, (s, e) in enumerate(enc["offset_mapping"][ch]):
                if s == 0 and e == 0:
                    continue  # special tokens have empty (0, 0) offsets
                while ptr < n_tokens and full_offsets[ptr] != (s, e):
                    ptr += 1
                if ptr < n_tokens:
                    agg_logits[ptr] += logits[i]
                    agg_count[ptr]  += 1
                    ptr += 1

    # Average logits in overlap regions and decode
    covered = agg_count > 0
    agg_logits[covered] /= agg_count[covered, None]
    pred_ids = agg_logits.argmax(-1).tolist()

    tokens = [text[s:e] for s, e in full_offsets]
    labels = [model.config.id2label[pid] for pid in pred_ids]
    return list(zip(tokens, labels, full_offsets))


results = predict(
    "Doente admitido por insuficiência cardíaca descompensada. "
    "Medicação habitual: Furosemida 40mg id; Bisoprolol 5mg id. "
    "Sem alergias medicamentosas conhecidas."
)

for token, label, (start, end) in results:
    if label != "O":
        print(f"[{start}:{end}]  {token:25s}  {label}")
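
`predict` returns one row per token with its character offsets. The sketch below collapses those rows into character-level entity spans. `to_char_spans` is a hypothetical helper that assumes `B-` / `I-` prefixed labels and joins adjacent tokens of the same type when only whitespace separates them; the `example_rows` list is hand-written, not real model output.

```python
def to_char_spans(results, text):
    """Collapse (token, label, (start, end)) rows into character-level spans."""
    spans = []
    for _token, label, (start, end) in results:
        if label == "O":
            continue
        if label[:2] in ("B-", "I-"):
            prefix, ent_type = label[0], label[2:]
        else:
            prefix, ent_type = "I", label
        last = spans[-1] if spans else None
        # Extend the previous span when the type matches, this is not a new
        # "B-" tag, and only whitespace lies between the two tokens
        if (last and prefix != "B" and last["type"] == ent_type
                and text[last["end"]:start].strip() == ""):
            last["end"] = end
        else:
            spans.append({"type": ent_type, "start": start, "end": end})
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


note = "Medicação habitual: Furosemida 40mg id."
d, m = note.find("Furosemida"), note.find("40mg")
example_rows = [
    ("Furosemida", "B-Medicação Habitual", (d, d + 10)),
    ("40", "I-Medicação Habitual", (m, m + 2)),
    ("mg", "I-Medicação Habitual", (m + 2, m + 4)),
    ("id", "O", (m + 5, m + 7)),
]
print(to_char_spans(example_rows, note))
```

With real output you would call `to_char_spans(predict(text), text)`.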

Performance

Evaluated on the 15 physician-authored held-out documents. For Diagnóstico, only the highest-confidence predicted span per document is scored (one span per document), matching the dataset design.
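
The one-span-per-document rule for Diagnóstico can be reproduced at inference time roughly as follows. This is a sketch under assumptions: `select_top_diagnosis` is a hypothetical helper operating on pipeline-style dicts with `entity_group` and `score` keys, and the companion repository's evaluation code may implement the selection differently.

```python
def select_top_diagnosis(entities, label="Diagnóstico"):
    """Keep all entities, but only the highest-scoring one of type `label`."""
    candidates = [e for e in entities if e["entity_group"] == label]
    best = max(candidates, key=lambda e: e["score"], default=None)
    # Preserve the original order; drop every non-best diagnosis candidate
    return [e for e in entities if e["entity_group"] != label or e is best]


# Hand-written entities for illustration; the scores are made up.
demo_entities = [
    {"entity_group": "Diagnóstico", "word": "dispneia", "score": 0.60},
    {"entity_group": "Medicação Habitual", "word": "Furosemida 40mg", "score": 0.90},
    {"entity_group": "Diagnóstico", "word": "insuficiência cardíaca", "score": 0.95},
]
print([e["word"] for e in select_top_diagnosis(demo_entities)])
```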

Exact match:

| Entity | P | R | F1 | Confidence |
|---|---|---|---|---|
| Alergias medicamentosas – Negativa | 0.75 | 0.75 | 0.75 | 0.98 |
| Alergias medicamentosas – Positiva | 0.64 | 0.88 | 0.74 | 0.91 |
| Diagnóstico | 0.71 | 0.56 | 0.63 | 0.99 |
| Medicação Habitual | 0.88 | 0.88 | 0.88 | 0.97 |
| Macro | 0.74 | 0.76 | 0.75 | |
| Micro | 0.83 | 0.84 | 0.83 | |

Relaxed match (IoU ≥ 0.50):

| Entity | P | R | F1 | Confidence |
|---|---|---|---|---|
| Alergias medicamentosas – Negativa | 0.75 | 0.75 | 0.75 | 0.98 |
| Alergias medicamentosas – Positiva | 0.73 | 1.00 | 0.84 | 0.91 |
| Diagnóstico | 0.86 | 0.67 | 0.75 | 0.99 |
| Medicação Habitual | 0.92 | 0.92 | 0.92 | 0.97 |
| Macro | 0.81 | 0.83 | 0.82 | |
| Micro | 0.89 | 0.90 | 0.89 | |
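
For readers unfamiliar with the two matching regimes, here is a minimal sketch of how exact and relaxed matches could be decided and how F1 follows from precision and recall. It assumes spans are half-open character ranges with a `type` field; the function names are hypothetical, and the companion repository's evaluation code may differ in details such as one-to-one assignment of predictions to gold spans.

```python
def span_iou(a, b):
    """Intersection-over-union of two half-open character spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def is_match(pred, gold, threshold=None):
    """Exact match when threshold is None, else relaxed match at IoU >= threshold."""
    if pred["type"] != gold["type"]:
        return False
    p = (pred["start"], pred["end"])
    g = (gold["start"], gold["end"])
    return p == g if threshold is None else span_iou(p, g) >= threshold


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


gold = {"type": "Diagnóstico", "start": 20, "end": 40}
pred = {"type": "Diagnóstico", "start": 20, "end": 34}
print(is_match(pred, gold), is_match(pred, gold, threshold=0.5))
print(round(f1(0.64, 0.88), 2))
```

The last line recomputes the exact-match F1 for Alergias medicamentosas – Positiva from the precision and recall reported in the table.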

Training setup

| Parameter | Value |
|---|---|
| Base model | `pucpr/biobertpt-all` |
| Architecture | BERT |
| Training documents | 257 |
| Validation documents | 28 |
| Test documents | 15 (fixed, physician-authored) |
| Optimiser | AdamW |
| Learning rate | 2 × 10⁻⁵ |
| Effective batch size | 16 |
| Max sequence length | 512 tokens |
| Stride | 128 tokens |
| Max epochs | 20 (early stopping, patience 3) |
| Precision | bfloat16 |
| Class imbalance | Inverse-frequency weighted loss + ×4 oversampling for `Alergias medicamentosas__Negativa` |
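
To illustrate the inverse-frequency weighting mentioned in the last row, the sketch below derives per-class loss weights from token label counts. The normalisation used here (`total / (n_classes * count)`) is an assumption, chosen because it is a common "balanced" heuristic; the exact formula used in training is documented in the companion repository, not here. `inverse_frequency_weights` is a hypothetical name.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Toy label sequence: 8 "O" tokens and 2 entity tokens
toy_labels = ["O"] * 8 + ["B-Diagnóstico"] * 2
print(inverse_frequency_weights(toy_labels))
```

Weights of this kind are typically passed to a weighted cross-entropy loss so that errors on rare tags cost more than errors on `O`.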

About the dataset

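The dataset contains 300 Portuguese ER admission notes: 15 physician-authored and 285 generated by Llama 3.3 using the validated notes as few-shot examples. The 285 synthetic notes correspond to the 257 training and 28 validation documents listed under Training setup; the 15 physician-authored notes form the test set. The notes cover eight medical specialties and were annotated by a linguist with a pharmaceutical background.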

Annotations include entity spans plus standard terminology mappings: ICD-10 for diagnoses, ATC for medication allergies, SNOMED CT for usual medications.

Two independent physicians evaluated 60 synthetic notes using a Likert protocol. The notes scored well on medication clarity and allergy identification, with lower scores on diagnosis specificity, which is consistent with the model's weaker per-class results for Diagnóstico.

Citation

If you use this model in your work, please cite:

@inproceedings{ernermodels2026,
  title     = {NER Models for Portuguese Emergency Room Notes:
               Extracting Diagnoses, Medication Allergies,
               and Usual Medications},
  author    = {Anonymous},
  booktitle = {Anonymous Submission},
  year      = {2026}
}

This entry will be updated with the full citation after publication.

License

MIT
