Fine-tuned IndoBERT for Indonesian Person and Address Extraction

Model summary

This model is a fine-tuned version of indobenchmark/indobert-base-p1 for token classification on short Indonesian transactional text.

The model predicts two target entity types:

  • PER: person name
  • ADDR: address or address detail

The token-label space is:

  • O
  • B-PER
  • I-PER
  • B-ADDR
  • I-ADDR
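
The label space above follows the usual BIO convention, where a B- tag opens a span and I- tags continue it. As a minimal sketch of how such tags can be collapsed into entity spans, the helper below (bio_to_spans is an illustrative name, not part of the model or of any library) walks over aligned tokens and tags. The tokens and tags in the example are hand-written for illustration and are not actual model output.

```python
def bio_to_spans(tokens, tags):
    """Collapse aligned BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        # A stray I- tag whose type differs from the open span is treated
        # leniently as the start of a new span.
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            current_type = entity_type
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-labeled illustration, not model output.
tokens = ["Transfer", "ke", "Asep", "Nainggolan", "di",
          "Jalan", "Cideng", "Timur", "No.", "28"]
tags = ["O", "O", "B-PER", "I-PER", "O",
        "B-ADDR", "I-ADDR", "I-ADDR", "I-ADDR", "I-ADDR"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Joining tokens with a single space is a simplification; character offsets are a safer way to recover the original surface form when spacing matters.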

Intended use

This model is intended for Indonesian short transactional utterances such as:

  • transfer instructions
  • account-owner replies
  • short recipient-name replies
  • electricity-payment requests
  • short address replies
  • noisy or informal chat-style transaction text

It is intended for research and experimental use in named entity recognition / sensitive entity extraction for:

  • person names
  • address-like spans

Training data

The model was fine-tuned on a synthetic token-classification corpus with:

  • 10,678 unique records in total
  • 8,522 training records
  • 1,048 validation records
  • 1,108 internal synthetic test records
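
As a quick consistency check on the counts above, the short sketch below sums the three splits and computes each split's share of the corpus. The dictionary keys are illustrative names only.

```python
# Record counts as reported in this card.
splits = {"train": 8522, "validation": 1048, "test": 1108}

total = sum(splits.values())
shares = {name: round(count / total, 4) for name, count in splits.items()}

print(total)
print(shares)
```

The three splits add up to the reported 10,678 records, which corresponds to roughly an 80/10/10 division.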

The synthetic generator introduces variation at three levels:

  1. sentence-level variation
  2. entity-level variation
  3. noise-level variation
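
The generator itself is not described in this card, so the following is only a hypothetical sketch of how three levels of variation could be combined while keeping tokens and BIO tags aligned. The templates, name and address lists, and the noise rule are all invented for illustration and are not the actual generator's contents.

```python
import random

LABELS = {"O", "B-PER", "I-PER", "B-ADDR", "I-ADDR"}

# Hypothetical resources; the real generator's templates and lexicons are unknown.
TEMPLATES = [
    ["Transfer", "ke", "{PER}"],
    ["Bayar", "listrik", "rumah", "di", "{ADDR}"],
    ["Atas", "nama", "{PER}"],
]
NAMES = [["Asep", "Nainggolan"], ["Siti", "Rahayu"]]
ADDRESSES = [
    ["Jalan", "Cideng", "Timur", "No.", "28"],
    ["Jalan", "Melati", "Blok", "C", "No.", "5"],
]


def generate(rng):
    """Return one synthetic (tokens, tags) pair."""
    # 1. Sentence-level variation: pick a template.
    template = rng.choice(TEMPLATES)
    # 2. Entity-level variation: pick a filler for each slot type.
    fillers = {
        "{PER}": ("PER", rng.choice(NAMES)),
        "{ADDR}": ("ADDR", rng.choice(ADDRESSES)),
    }
    tokens, tags = [], []
    for slot in template:
        if slot in fillers:
            label, words = fillers[slot]
            tokens.extend(words)
            tags.extend([f"B-{label}"] + [f"I-{label}"] * (len(words) - 1))
        else:
            tokens.append(slot)
            tags.append("O")
    # 3. Noise-level variation: token-wise edits keep tags aligned.
    if rng.random() < 0.5:
        tokens = ["Jl." if t == "Jalan" else t.lower() for t in tokens]
    return tokens, tags


example_tokens, example_tags = generate(random.Random(0))
print(list(zip(example_tokens, example_tags)))
```

Applying noise token by token, rather than on the joined string, is one simple way to guarantee that the tag sequence never drifts out of alignment.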

The corpus includes transfer, electricity-payment, ambiguity, and short-reply cases, including:

  • formal instructions
  • answer-style replies
  • noisy or abbreviated chat forms
  • person/bank ambiguity
  • person/road-name ambiguity
  • abbreviated and full-form addresses
  • block/unit/apartment/ruko-style addresses
  • control cases with no target entities

External benchmark

The model was externally evaluated on a separate frozen reviewed benchmark with:

  • 320 cases
  • 16 categories

This benchmark is distinct from the synthetic train/validation/test split.

Reported overall benchmark results on that reviewed benchmark:

| Metric | Value |
| --- | --- |
| Person Precision | 0.9296 |
| Person Recall | 0.9429 |
| Person F1 | 0.9362 |
| Address Precision | 0.8730 |
| Address Recall | 0.9167 |
| Address F1 | 0.8943 |
| Full exact match | 0.9406 |

These external benchmark values are the most realistic summary of model behavior, because the 320 benchmark cases are separate from the synthetic fine-tuning corpus.
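
The reported F1 values can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below does only that arithmetic; it does not reproduce the evaluation, and the card does not state whether matches were counted at span or token level. The figure of 301 matching cases is an inference from 0.9406 × 320, not a number given in this card.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


person_f1 = f1_score(0.9296, 0.9429)
address_f1 = f1_score(0.8730, 0.9167)

# Assumption: 0.9406 is consistent with 301 of the 320 cases matching exactly.
exact_match = 301 / 320

print(round(person_f1, 4))    # 0.9362
print(round(address_f1, 4))   # 0.8943
print(round(exact_match, 4))  # 0.9406
```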

Internal synthetic metrics

The following values are internal synthetic metrics, computed on the synthetic validation and synthetic test splits derived from the same generated corpus family used for fine-tuning:

| Split | Overall F1 | PER F1 | ADDR F1 |
| --- | --- | --- | --- |
| Validation | 1.0000 | 1.0000 | 1.0000 |
| Test | 1.0000 | 1.0000 | 1.0000 |

These values should therefore be interpreted as internal synthetic-fit results, not as the main external generalization result. For external performance, refer to the 320-case benchmark reported above.

Data and benchmark availability

The fine-tuning dataset splits, the frozen benchmark dataset, and the benchmark comparison results are available in the project repository.

Readers looking for replication materials should refer in particular to:

  • the fine-tuning dataset (training, validation, test)
  • the reviewed benchmark dataset
  • the benchmark comparison outputs across the published models

Label mapping

The model configuration stores the following mapping:

  • 0 -> O
  • 1 -> B-PER
  • 2 -> I-PER
  • 3 -> B-ADDR
  • 4 -> I-ADDR
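
For code that works with raw class indices, for example after taking an argmax over the model's logits, the mapping above can be written as a plain dictionary. The names ID2LABEL, LABEL2ID, and ids_to_labels are illustrative; once the model is loaded with transformers, the same mapping should be readable from model.config.id2label.

```python
# Mapping as listed in this card.
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ADDR", 4: "I-ADDR"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def ids_to_labels(label_ids):
    """Translate a sequence of class indices into BIO label strings."""
    return [ID2LABEL[i] for i in label_ids]


labels = ids_to_labels([0, 1, 2, 0, 3, 4])
print(labels)
```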

Example usage

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "ericodh/indobert-id-bankchat-name-address-ner"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "Transfer ke Asep Nainggolan dan bayarkan listrik rumah yang di Jalan Cideng Timur No. 28."
print(ner(text))
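
With aggregation_strategy="simple", the pipeline returns a list of dictionaries with the keys entity_group, score, word, start, and end. A minimal sketch of post-processing that output is shown below: it groups spans by entity type and slices the original text by character offsets, since the word field may differ from the input in casing or spacing depending on the tokenizer. The helper name group_entities, the min_score threshold, and the mock prediction are illustrative assumptions, not actual model output.

```python
from collections import defaultdict


def group_entities(text, predictions, min_score=0.5):
    """Group aggregated pipeline predictions by entity type."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] < min_score:
            continue  # drop low-confidence spans; the threshold is arbitrary
        # Slice the original text so casing and spacing are preserved.
        grouped[pred["entity_group"]].append(text[pred["start"]:pred["end"]])
    return dict(grouped)


sample_text = "Transfer ke Asep Nainggolan"
name = "Asep Nainggolan"
start = sample_text.index(name)

# Mock prediction in the pipeline's output shape (not real model output).
mock_predictions = [
    {"entity_group": "PER", "score": 0.99, "word": "asep nainggolan",
     "start": start, "end": start + len(name)},
]

result = group_entities(sample_text, mock_predictions)
print(result)
```

In real use, the list returned by ner(text) would be passed in place of mock_predictions.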

Repository contents

For Hugging Face upload, the core files are:

  • config.json
  • model.safetensors
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • vocab.txt
  • README.md

Optional supporting files already present in the artifact directory:

  • metrics.json
  • trainer_state.json
  • training_args.bin
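
Before uploading, it can be useful to confirm that the core files listed above are all present in the artifact directory. The following is a small sketch of such a check using only the standard library; missing_core_files is an illustrative helper, and the temporary directory merely stands in for a real artifact directory.

```python
import tempfile
from pathlib import Path

# Core files as listed in this card.
CORE_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.txt",
    "README.md",
]


def missing_core_files(directory):
    """Return the core file names that are absent from the directory."""
    directory = Path(directory)
    return [name for name in CORE_FILES if not (directory / name).is_file()]


# Demonstration on a throwaway directory that lacks README.md.
with tempfile.TemporaryDirectory() as tmp:
    for name in CORE_FILES[:-1]:
        (Path(tmp) / name).touch()
    missing = missing_core_files(tmp)

print(missing)
```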

Limitations

  • The model is specialized for short Indonesian transactional text rather than general long-form NER.
  • It focuses on PER and ADDR, not a broad general-purpose entity taxonomy.
  • Synthetic training data reduces coverage gaps but does not remove all risk of template memorization.
  • Performance is stronger on the intended domain than on unrelated language domains or document styles.