Portuguese Emergency Room Clinical NER

This is a Named Entity Recognition model for European Portuguese clinical text, specifically Emergency Room admission notes. It is fine-tuned from BioBERTpt-all and can extract three types of clinical entities: the patient's usual medications, medication allergies (with polarity), and principal diagnosis.

The model was developed on a dataset of 300 Portuguese clinical notes, most of them synthetic. Full details (dataset, training scripts, evaluation code, and results) are in the companion repository: 👉 https://github.com/LIAAD/ER_NER

⚠️ This model is for research only. It was not validated for clinical use. Do not use it in any patient-facing workflow without proper institutional review and clinical validation.


What it predicts

The model uses BIO tagging with four labels: the three entity types, with drug allergies split into two labels by polarity.

| Label | What it marks | Polarity? |
| --- | --- | --- |
| Medicação Habitual | Medication name + dosage | No |
| Diagnóstico | Principal diagnosis | No |
| Alergias medicamentosas__Positiva | Confirmed drug allergy | Yes |
| Alergias medicamentosas__Negativa | Denied / no known allergy | Yes |

Polarity is encoded directly in the tag — so the model can tell the difference between "alergia à penicilina" (Positiva) and "sem alergias medicamentosas conhecidas" (Negativa) without any extra steps.
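
Because polarity is part of the tag, it can be recovered with plain string handling. The sketch below assumes tags of the form `B-Alergias medicamentosas__Positiva` (a BIO prefix, the label name from the table, and a `__` separator); the helper name `split_label` is illustrative, and the exact tag strings should be checked against `model.config.id2label`.

```python
def split_label(label):
    """Split a BIO tag into (bio_prefix, entity_type, polarity).

    Assumes tags look like "B-Alergias medicamentosas__Positiva";
    verify the real strings in model.config.id2label.
    """
    if label == "O":
        return ("O", None, None)
    # Split off the "B"/"I" prefix at the first hyphen
    prefix, _, name = label.partition("-")
    # Labels without "__" carry no polarity
    entity_type, sep, polarity = name.partition("__")
    return (prefix, entity_type, polarity if sep else None)


print(split_label("B-Alergias medicamentosas__Negativa"))
print(split_label("I-Medicação Habitual"))
```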


How to use

Quickstart — pipeline

The simplest way to get started:

from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="liaad/hfpt-biobertpt-all-er_ner",
    aggregation_strategy="simple",
)

text = (
    "Doente com hipertensão arterial. "
    "Medicação habitual: Enalapril 20mg id. "
    "Não refere alergias medicamentosas conhecidas."
)

for entity in ner(text):
    print(
        f"{entity['word']:30s} "
        f"{entity['entity_group']:45s} "
        f"score={entity['score']:.3f}"
    )
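
With `aggregation_strategy="simple"`, the pipeline returns one dictionary per merged span, with `entity_group`, `word`, and `score` among its keys. A possible way to organise that output by label is sketched below. The helper `group_entities` and the `min_score` threshold are illustrative, and the sample list is a hand-written stand-in, not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group pipeline spans by label, dropping low-confidence ones.

    min_score is an arbitrary illustrative threshold, not a tuned value.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for `ner(text)`; only the keys used above are shown
sample = [
    {"entity_group": "Medicação Habitual", "word": "Enalapril 20mg", "score": 0.97},
    {"entity_group": "Alergias medicamentosas__Negativa",
     "word": "Não refere alergias medicamentosas conhecidas", "score": 0.95},
    {"entity_group": "Diagnóstico", "word": "hipertensão", "score": 0.30},
]

for label, words in group_entities(sample).items():
    print(label, "->", words)
```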

Manual inference

If you want more control over the output:

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner", use_fast=True
)
model = AutoModelForTokenClassification.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner"
)
model.eval()

text = (
    "Doente com hipertensão arterial e diabetes mellitus tipo 2. "
    "Medicação habitual: Enalapril 20mg id; Metformina 850mg 2id. "
    "Sem alergias medicamentosas conhecidas."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

pred_ids = outputs.logits.argmax(dim=-1).squeeze().tolist()
tokens   = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze().tolist())
labels   = [model.config.id2label[i] for i in pred_ids]

for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token:25s}  {label}")
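
The loop above prints WordPiece tokens, so a single word may appear as several `##`-prefixed pieces. A minimal sketch for gluing pieces back into words is shown below. It keeps the label of the first piece of each word, which is one common convention and not necessarily the one used in the companion repository. The helper name and the sample tokenisation are illustrative.

```python
def merge_wordpieces(tokens, labels, special=("[CLS]", "[SEP]", "[PAD]")):
    """Merge BERT WordPiece tokens into words, keeping the first piece's label."""
    words = []
    for token, label in zip(tokens, labels):
        if token in special:
            continue  # skip special tokens added by the tokenizer
        if token.startswith("##") and words:
            words[-1][0] += token[2:]  # continuation piece: append to previous word
        else:
            words.append([token, label])  # new word starts here
    return [tuple(w) for w in words]


# Hypothetical tokenisation; the real tokenizer may split differently
tokens = ["[CLS]", "Met", "##form", "##ina", "850", "##mg", "[SEP]"]
labels = ["O", "B-Medicação Habitual", "I-Medicação Habitual",
          "I-Medicação Habitual", "I-Medicação Habitual",
          "I-Medicação Habitual", "O"]

for word, label in merge_wordpieces(tokens, labels):
    print(f"{word:15s} {label}")
```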

Long documents (recommended)

Clinical notes are often longer than 512 tokens. The model was trained with a sliding-window approach, so you should also use one at inference time — otherwise spans deep in the document will be missed.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import numpy as np
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner", use_fast=True
)
model = AutoModelForTokenClassification.from_pretrained(
    "liaad/hfpt-biobertpt-all-er_ner"
)
model.eval()


def predict(text, max_len=512, stride=128):
    # Tokenize the full document to get canonical offsets
    full_enc     = tokenizer(text, return_offsets_mapping=True,
                             add_special_tokens=False)
    full_offsets = full_enc["offset_mapping"]
    n_tokens     = len(full_offsets)
    num_labels   = model.config.num_labels

    # Accumulate logits over overlapping chunks
    agg_logits = np.zeros((n_tokens, num_labels), dtype=np.float64)
    agg_count  = np.zeros(n_tokens, dtype=np.int32)

    enc = tokenizer(
        text,
        return_offsets_mapping=True,
        return_overflowing_tokens=True,
        truncation=True,
        max_length=max_len,
        stride=stride,
        padding=False,
    )

    with torch.no_grad():
        for ch in range(len(enc["input_ids"])):
            ipt = {
                k: torch.tensor(enc[k][ch]).unsqueeze(0)
                for k in enc.keys()
                if k not in ("offset_mapping", "overflow_to_sample_mapping")
            }
            logits = model(**ipt).logits.squeeze(0).numpy()
            ptr = 0
            for i, (s, e) in enumerate(enc["offset_mapping"][ch]):
                if s == 0 and e == 0:
                    continue
                while ptr < n_tokens and full_offsets[ptr] != (s, e):
                    ptr += 1
                if ptr < n_tokens:
                    agg_logits[ptr] += logits[i]
                    agg_count[ptr]  += 1
                    ptr += 1

    # Average logits in overlap regions and decode
    covered = agg_count > 0
    agg_logits[covered] /= agg_count[covered, None]
    pred_ids = agg_logits.argmax(-1).tolist()

    tokens = [text[s:e] for s, e in full_offsets]
    labels = [model.config.id2label[pid] for pid in pred_ids]
    return list(zip(tokens, labels, full_offsets))


results = predict(
    "Doente admitido por insuficiência cardíaca descompensada. "
    "Medicação habitual: Furosemida 40mg id; Bisoprolol 5mg id. "
    "Sem alergias medicamentosas conhecidas."
)

for token, label, (start, end) in results:
    if label != "O":
        print(f"[{start}:{end}]  {token:25s}  {label}")
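
`predict` returns one `(token, label, offsets)` triple per token. To obtain entity-level spans, the BIO tags can be decoded into character ranges, as in the sketch below. It assumes `B-`/`I-` prefixed tags and treats an `I-` tag without a matching open span as the start of a new span. The helper `bio_to_spans` and the labelled sample are illustrative.

```python
def bio_to_spans(results, text):
    """Decode (token, label, (start, end)) triples into entity spans."""
    spans = []
    current = None  # [entity_type, start, end] of the span being built
    for _token, label, (start, end) in results:
        if label == "O":
            if current:
                spans.append(current)
                current = None
            continue
        prefix, _, etype = label.partition("-")
        if prefix == "I" and current and current[0] == etype:
            current[2] = end  # extend the open span
        else:
            if current:
                spans.append(current)
            current = [etype, start, end]  # open a new span
    if current:
        spans.append(current)
    return [(etype, s, e, text[s:e]) for etype, s, e in spans]


# Hand-labelled stand-in for the output of predict(); not real predictions
text = "Medicação habitual: Furosemida 40mg id."
words = ["Furosemida", "40mg", "id"]
tags = ["B-Medicação Habitual", "I-Medicação Habitual", "O"]
results = [(w, t, (text.index(w), text.index(w) + len(w)))
           for w, t in zip(words, tags)]

for etype, start, end, surface in bio_to_spans(results, text):
    print(f"[{start}:{end}] {surface!r} -> {etype}")
```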

Performance

The model was evaluated on 15 held-out, physician-authored documents. For Diagnóstico, only the highest-confidence predicted span in each document is scored, matching the dataset design of one principal diagnosis per note.
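
The one-span-per-document rule for Diagnóstico can be sketched as a post-processing step over pipeline-style output. This is an assumption about how such a selection might look, not the evaluation code from the companion repository. The helper name is illustrative, and the `entity_group` value is assumed to be `Diagnóstico`.

```python
def keep_top_diagnosis(entities, label="Diagnóstico"):
    """Keep all non-diagnosis spans plus the single best-scoring diagnosis."""
    diagnoses = [e for e in entities if e["entity_group"] == label]
    kept = [e for e in entities if e["entity_group"] != label]
    if diagnoses:
        # Retain only the diagnosis span with the highest confidence
        kept.append(max(diagnoses, key=lambda e: e["score"]))
    return kept


# Hand-written stand-in for one document's predictions
doc_entities = [
    {"entity_group": "Diagnóstico", "word": "dispneia", "score": 0.61},
    {"entity_group": "Diagnóstico", "word": "insuficiência cardíaca", "score": 0.93},
    {"entity_group": "Medicação Habitual", "word": "Furosemida 40mg", "score": 0.97},
]

for ent in keep_top_diagnosis(doc_entities):
    print(ent["entity_group"], "->", ent["word"])
```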

Exact match:

| Entity | P | R | F1 | Confidence |
| --- | --- | --- | --- | --- |
| Alergias medicamentosas – Negativa | 0.75 | 0.75 | 0.75 | 0.98 |
| Alergias medicamentosas – Positiva | 0.64 | 0.88 | 0.74 | 0.91 |
| Diagnóstico | 0.71 | 0.56 | 0.63 | 0.99 |
| Medicação Habitual | 0.88 | 0.88 | 0.88 | 0.97 |
| Macro | 0.74 | 0.76 | 0.75 | |
| Micro | 0.83 | 0.84 | 0.83 | |
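
As a sanity check on the exact-match table, F1 is the harmonic mean of precision and recall, and macro F1 is the unweighted mean of the per-class F1 values. The short sketch below recomputes both from the rounded P and R values in the table. Because the inputs are already rounded, small deviations from values computed on raw counts are possible.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (P, R) pairs copied from the exact-match table
exact = {
    "Alergias medicamentosas - Negativa": (0.75, 0.75),
    "Alergias medicamentosas - Positiva": (0.64, 0.88),
    "Diagnóstico": (0.71, 0.56),
    "Medicação Habitual": (0.88, 0.88),
}

per_class = {name: f1(p, r) for name, (p, r) in exact.items()}
macro_f1 = sum(per_class.values()) / len(per_class)

for name, score in per_class.items():
    print(f"{name:40s} F1={score:.2f}")
print(f"{'Macro':40s} F1={macro_f1:.2f}")
```

Micro-averaged scores cannot be recomputed this way, since they depend on per-class span counts that the table does not list.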

Relaxed IoU ≥ 0.50:

| Entity | P | R | F1 | Confidence |
| --- | --- | --- | --- | --- |
| Alergias medicamentosas – Negativa | 0.75 | 0.75 | 0.75 | 0.98 |
| Alergias medicamentosas – Positiva | 0.73 | 1.00 | 0.84 | 0.91 |
| Diagnóstico | 0.86 | 0.67 | 0.75 | 0.99 |
| Medicação Habitual | 0.92 | 0.92 | 0.92 | 0.97 |
| Macro | 0.81 | 0.83 | 0.82 | |
| Micro | 0.89 | 0.90 | 0.89 | |
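
Relaxed matching counts a prediction as correct when its overlap with a gold span, measured as intersection over union (IoU), reaches 0.50. A minimal sketch follows, assuming spans are half-open character ranges `(start, end)`. Whether the evaluation code measures overlap in characters or tokens is not stated here, so treat the unit as an assumption. Function names are illustrative.

```python
def span_iou(a, b):
    """Intersection over union of two half-open (start, end) spans."""
    (a_start, a_end), (b_start, b_end) = a, b
    inter = max(0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union else 0.0


def is_relaxed_match(pred, gold, threshold=0.5):
    """True when the predicted span overlaps the gold span enough."""
    return span_iou(pred, gold) >= threshold


# Gold span covers "Furosemida 40mg"; the prediction covers only "Furosemida"
gold = (20, 35)
pred = (20, 30)
print(f"IoU = {span_iou(pred, gold):.3f}")
print("exact match:  ", pred == gold)
print("relaxed match:", is_relaxed_match(pred, gold))
```

Here the prediction fails exact matching but passes the relaxed criterion, which is the kind of boundary error that explains the gap between the two tables.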

Training setup

| Parameter | Value |
| --- | --- |
| Base model | pucpr/biobertpt-all |
| Architecture | BERT |
| Training documents | 257 |
| Validation documents | 28 |
| Test documents | 15 (fixed, physician-authored) |
| Optimiser | AdamW |
| Learning rate | 2 × 10⁻⁵ |
| Effective batch size | 16 |
| Max sequence length | 512 tokens |
| Stride | 128 tokens |
| Max epochs | 20 (early stopping, patience 3) |
| Precision | bfloat16 |
| Class imbalance | Inverse-frequency weighted loss + ×4 oversampling for Alergias__Negativa |
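
To illustrate the inverse-frequency weighting listed in the table, the sketch below derives one weight per class from token-level label counts. The normalisation `total / (num_labels * count)` is an assumption (the widely used "balanced" scheme); the exact formula used in training is documented in the companion repository, not here. The helper name and toy label ids are illustrative.

```python
from collections import Counter


def inverse_frequency_weights(label_ids, num_labels):
    """Per-class weights proportional to 1 / class frequency.

    Uses total / (num_labels * count); classes never seen get weight 0.0.
    """
    counts = Counter(label_ids)
    total = len(label_ids)
    return [
        total / (num_labels * counts[c]) if counts[c] else 0.0
        for c in range(num_labels)
    ]


# Toy data: class 0 (e.g. "O") is four times as frequent as class 1
toy_label_ids = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(toy_label_ids, num_labels=2)
print(weights)
```

Weights of this kind are typically passed to the loss function, for example as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that errors on rare classes cost more than errors on frequent ones.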

About the dataset

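The dataset contains 300 Portuguese ER admission notes: 15 physician-authored and 285 generated by Llama 3.3 using the validated notes as few-shot examples. The 285 generated notes make up the training and validation splits (257 + 28), and the 15 physician-authored notes form the fixed test set. The notes cover eight medical specialties and were annotated by a linguist with a pharmaceutical background.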

Annotations include entity spans plus standard terminology mappings: ICD-10 for diagnoses, ATC for medication allergies, SNOMED CT for usual medications.

Two independent physicians evaluated 60 synthetic notes using a Likert protocol. The notes scored well on medication clarity and allergy identification, with lower scores on diagnosis specificity — consistent with the model's per-class results.


Citation

If you use this model in your work, please cite:

@inproceedings{ernermodels2026,
  title     = {NER Models for Portuguese Emergency Room Notes:
               Extracting Diagnoses, Medication Allergies,
               and Usual Medications},
  author    = {Anonymous},
  booktitle = {Anonymous Submission},
  year      = {2026}
}

This entry will be updated with the full citation after publication.


License

MIT
