Fine-tuned IndoBERT for Indonesian Person and Address Extraction

Model summary

This model is a fine-tuned version of indobenchmark/indobert-base-p1 for token classification on short Indonesian transactional text.

The model predicts two target entity types:

PER: person name
ADDR: address or address detail

The token labels follow the BIO scheme:

O
B-PER
I-PER
B-ADDR
I-ADDR
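
As an illustration of how these five labels encode entity spans, the following minimal sketch converts a tag sequence into spans. The function name `bio_to_spans` and the lenient handling of a stray I- tag are assumptions made for this example, and the tags shown are hand-written, not model output.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (entity_type, start, end) spans.

    `end` is exclusive. An I- tag that does not continue a span of the
    same type is treated as the start of a new span (a lenient choice).
    """
    spans = []
    current_type = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == current_type:
            continue  # still inside the open span
        # Close any open span before handling the new tag.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        if prefix in ("B", "I"):
            current_type, start = ent_type, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans

# Hand-written tags for illustration only.
tokens = ["Transfer", "ke", "Asep", "Nainggolan", "di", "Jalan", "Cideng", "Timur"]
tags = ["O", "O", "B-PER", "I-PER", "O", "B-ADDR", "I-ADDR", "I-ADDR"]
for ent_type, s, e in bio_to_spans(tags):
    print(ent_type, " ".join(tokens[s:e]))
```

With the hand-written tags above, this prints one PER span ("Asep Nainggolan") and one ADDR span ("Jalan Cideng Timur").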

Intended use

This model is intended for Indonesian short transactional utterances such as:

transfer instructions
account-owner replies
short recipient-name replies
electricity-payment requests
short address replies
noisy or informal chat-style transaction text

It is intended for research and experimental use in named entity recognition and sensitive-entity extraction, covering:

person names
address-like spans

Training data

The model was fine-tuned on a synthetic token-classification corpus with:

10,678 unique records in total
8,522 training records
1,048 validation records
1,108 internal synthetic test records
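
As a quick arithmetic check, the three split sizes sum to the stated total and correspond to roughly an 80/10/10 split. The sketch below only restates the record counts listed above.

```python
# Record counts as listed in the model card.
splits = {"train": 8522, "validation": 1048, "test": 1108}
total = sum(splits.values())

# The three splits should account for every unique record.
assert total == 10678

for name, n in splits.items():
    print(f"{name}: {n} ({n / total:.1%})")
```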

The synthetic generator introduces variation at three levels:

sentence-level variation
entity-level variation
noise-level variation
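
To make the three levels concrete, here is a toy sketch of a template-based generator. It is not the generator used for this model: the templates, name and address pools, the 50% lowercasing rule, and the function name `generate` are all hypothetical and chosen only to show where each level of variation could enter.

```python
import random
from typing import List, Tuple

# All templates, names, and addresses below are hypothetical placeholders.
TEMPLATES = [
    ["transfer", "ke", "{PER}"],
    ["bayar", "listrik", "di", "{ADDR}"],
]
NAMES = [["Asep", "Nainggolan"], ["Siti"]]
ADDRESSES = [["Jalan", "Cideng", "Timur", "No.", "28"], ["Jl.", "Melati", "5"]]

def generate(rng: random.Random) -> Tuple[List[str], List[str]]:
    """Return one (tokens, BIO labels) pair from a random template."""
    template = rng.choice(TEMPLATES)  # sentence-level variation
    tokens, labels = [], []
    for slot in template:
        if slot in ("{PER}", "{ADDR}"):
            ent_type = slot.strip("{}")
            pool = NAMES if ent_type == "PER" else ADDRESSES
            entity = rng.choice(pool)  # entity-level variation
            tokens.extend(entity)
            labels.extend([f"B-{ent_type}"] + [f"I-{ent_type}"] * (len(entity) - 1))
        else:
            tokens.append(slot)
            labels.append("O")
    if rng.random() < 0.5:  # noise-level variation (here: lowercasing only)
        tokens = [t.lower() for t in tokens]
    return tokens, labels

rng = random.Random(0)
for _ in range(3):
    print(generate(rng))
```

Because noise is applied after labeling, the labels stay aligned with the tokens regardless of how the surface form is perturbed.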

The corpus includes transfer, electricity-payment, ambiguity, and short-reply cases, including:

formal instructions
answer-style replies
noisy or abbreviated chat forms
person/bank ambiguity
person/road-name ambiguity
abbreviated and full-form addresses
block/unit/apartment/ruko-style addresses
control cases with no target entities

External benchmark

The model was also evaluated on a separate, frozen, reviewed benchmark with:

320 cases
16 categories

This benchmark is distinct from the synthetic train/validation/test split.

Overall results on that benchmark:

Metric	Value
Person Precision	0.9296
Person Recall	0.9429
Person F1	0.9362
Address Precision	0.8730
Address Recall	0.9167
Address F1	0.8943
Full exact match	0.9406

These external benchmark values are the most realistic summary of model behavior, because the 320 benchmark cases are separate from the synthetic fine-tuning corpus.
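
The F1 values in the benchmark table are the harmonic mean of the corresponding precision and recall. The short sketch below recomputes them from the reported figures as a consistency check; the helper name `f1_score` is an assumption for this example, and the inputs are the rounded values from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) taken from the benchmark table.
reported = {
    "Person": (0.9296, 0.9429, 0.9362),
    "Address": (0.8730, 0.9167, 0.8943),
}
for name, (p, r, f1) in reported.items():
    print(name, round(f1_score(p, r), 4), "reported:", f1)
```

Both recomputed values agree with the reported F1 to four decimal places.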

Internal synthetic metrics

The following values are internal synthetic metrics, computed on the synthetic validation and synthetic test splits derived from the same generated corpus family used for fine-tuning:

Split	Overall F1	PER F1	ADDR F1
Validation	1.0000	1.0000	1.0000
Test	1.0000	1.0000	1.0000

These values should therefore be read as a measure of fit to the synthetic distribution, not of external generalization. For external performance, refer to the 320-case benchmark reported above.

Data and benchmark availability

The fine-tuning dataset splits, the frozen benchmark dataset, and the benchmark comparison results are available in the project repository:

https://github.com/ericodarmawanh/research/tree/master/llm-banking-pii-pipeline

Readers looking for replication materials should refer in particular to:

the fine-tuning dataset (training, validation, test)
the reviewed benchmark dataset
the benchmark comparison outputs across the published models

Label mapping

The model configuration stores the following mapping:

0 -> O
1 -> B-PER
2 -> I-PER
3 -> B-ADDR
4 -> I-ADDR
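
For readers working with raw model outputs rather than the pipeline, the sketch below shows how this mapping can turn per-token scores into label strings. The JSON fragment is an assumed rendering of the mapping as it would appear in a config file (JSON object keys are always strings), and the logits are made up for illustration.

```python
import json

# Assumed shape of the relevant config fragment; JSON keys are strings.
config_fragment = json.loads("""
{"id2label": {"0": "O", "1": "B-PER", "2": "I-PER", "3": "B-ADDR", "4": "I-ADDR"}}
""")

# Convert string keys to integer class ids and build the inverse mapping.
id2label = {int(k): v for k, v in config_fragment["id2label"].items()}
label2id = {v: k for k, v in id2label.items()}

def decode(logits):
    """Pick the highest-scoring class id per token and map it to its label."""
    return [id2label[max(range(len(row)), key=row.__getitem__)] for row in logits]

# Made-up scores for three tokens, five classes each.
example_logits = [
    [4.0, 0.1, 0.1, 0.2, 0.1],
    [0.2, 3.5, 0.3, 0.1, 0.1],
    [0.1, 0.4, 3.9, 0.2, 0.1],
]
print(decode(example_logits))  # ['O', 'B-PER', 'I-PER']
```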

Example usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "ericodh/indobert-id-bankchat-name-address-ner"

# Load the fine-tuned tokenizer and token-classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="simple" merges B-/I- tokens into one span per entity.
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Each result is a dict with entity_group, score, word, start, and end.
text = "Transfer ke Asep Nainggolan dan bayarkan listrik rumah yang di Jalan Cideng Timur No. 28."
print(ner(text))
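
For sensitive-entity extraction, the pipeline output can be used to mask detected spans. The following is a minimal sketch, assuming the aggregated output format of the token-classification pipeline (dicts with `entity_group`, `score`, `start`, `end`). The `redact` and `mock_span` helpers, the `min_score` threshold, and the mock entities are hypothetical; the mock list is hand-written and is not actual model output.

```python
from typing import Dict, List

def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a [PER] or [ADDR] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from right to left so earlier character offsets stay valid.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

def mock_span(text: str, phrase: str, group: str, score: float) -> Dict:
    """Build a pipeline-style entity dict for a phrase found in text."""
    start = text.index(phrase)
    return {"entity_group": group, "score": score, "word": phrase,
            "start": start, "end": start + len(phrase)}

text = "Transfer ke Asep Nainggolan dan bayarkan listrik rumah yang di Jalan Cideng Timur No. 28."
# Hand-written stand-in for ner(text); scores are invented.
mock_output = [
    mock_span(text, "Asep Nainggolan", "PER", 0.99),
    mock_span(text, "Jalan Cideng Timur No. 28", "ADDR", 0.97),
]
print(redact(text, mock_output))
# Transfer ke [PER] dan bayarkan listrik rumah yang di [ADDR].
```

Using the `start` and `end` character offsets rather than the `word` field avoids depending on how the tokenizer normalizes casing or subwords.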

Repository contents

For Hugging Face upload, the core files are:

config.json
model.safetensors
tokenizer.json
tokenizer_config.json
special_tokens_map.json
vocab.txt
README.md

Optional supporting files already present in the artifact directory:

metrics.json
trainer_state.json
training_args.bin
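
Before uploading, it can help to confirm that the artifact directory actually contains every core file. The sketch below does this with the standard library only; the function name `missing_core_files` and the use of the current directory as the artifact path are assumptions for illustration.

```python
from pathlib import Path
from typing import List

# Core files as listed in this model card.
CORE_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.txt",
    "README.md",
]

def missing_core_files(artifact_dir: str) -> List[str]:
    """Return the core files that are not present in artifact_dir."""
    root = Path(artifact_dir)
    return [name for name in CORE_FILES if not (root / name).is_file()]

# Hypothetical path: replace "." with the real artifact directory.
print(missing_core_files("."))
```

An empty list means all core files are present.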

Limitations

The model is specialized for short Indonesian transactional text rather than general long-form NER.
It focuses on PER and ADDR, not a broad general-purpose entity taxonomy.
Synthetic training data reduces coverage gaps but does not remove all risk of template memorization.
Expect weaker performance on unrelated language domains or document styles than on the intended domain.
