sosa-pii-ner-ae-v1.0.0

Regional PII named entity recognition model for UAE documents. Part of the SOSA DevOps Privacy Filter, a local-first, privacy-preserving AI runtime for developers. Weights are Apache 2.0. No cloud required.

🔗 Source code: SOSA DevOps on GitHub
🌐 Product: sovereignsystems.cc


Model summary

Fine-tuned from urchade/gliner_large-v2.1 on synthetic UAE PII data. Detects Emirates IDs, Tax Registration Numbers (TRN), Tax Identification Numbers (TIN), and UAE mobile and office phone numbers in Arabic and mixed Arabic-English documents.

Intended use: Local PII detection within the SOSA DevOps Privacy Filter sidecar. Text never leaves the user's machine.
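
A minimal usage sketch, assuming the checkpoint loads through the `gliner` Python package the same way its base model urchade/gliner_large-v2.1 does. The repository ID, the 0.5 threshold, and the helper names `load_model` and `detect_pii` are illustrative assumptions, not part of the SOSA DevOps Privacy Filter API.

```python
from typing import Any, Dict, List

# Regional labels listed in the Labels section of this card.
AE_LABELS = [
    "ae_emirates_id",
    "ae_trn",
    "ae_tin",
    "ae_phone_mobile",
    "ae_phone_office",
]

# Assumed Hugging Face repository ID for this checkpoint.
REPO_ID = "SovereignSystems-cc/sosa-pii-ner-ae-v1.0.0"


def load_model(repo_id: str = REPO_ID) -> Any:
    """Load the checkpoint with the gliner package (pip install gliner)."""
    # Imported lazily so detect_pii stays importable without gliner installed.
    from gliner import GLiNER

    return GLiNER.from_pretrained(repo_id)


def detect_pii(model: Any, text: str, threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Return detected spans as dicts with start, end, text, label, and score."""
    return model.predict_entities(text, AE_LABELS, threshold=threshold)
```

Under these assumptions, `detect_pii(load_model(), text)` runs inference on the local machine once the weights are on disk.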


Labels

| Label | Description | Format | Validator |
| --- | --- | --- | --- |
| ae_emirates_id | UAE Emirates ID (رقم الهوية) | 784-YYYY-NNNNNNN-C (15 digits) | Prefix 784 + birth year |
| ae_trn | Tax Registration Number | 15 digits | Length validation |
| ae_tin | Tax Identification Number | 10 digits | Length validation |
| ae_phone_mobile | UAE mobile phone | 05X-XXX-XXXX (10 digits) | Operator prefix (050/052/054/055/056/058) |
| ae_phone_office | UAE office/fixed phone | 0X-XXX-XXXX (9 digits) | Area code validation |

Global labels also carried (defence-in-depth): email, phone_e164, credit_card, passport_generic, ipv4_public
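
The validator rules for the regional labels can be sketched with the standard library alone. This is an illustration of the checks named in the Validator column, not the filter's actual implementation. The birth-year range and the set of fixed-line area codes are assumptions, since the card only names the kind of check.

```python
import re
from datetime import date

# Operator prefixes listed for ae_phone_mobile.
MOBILE_PREFIXES = {"050", "052", "054", "055", "056", "058"}
# Assumption: fixed-line area codes; the card only says "area code validation".
OFFICE_AREA_CODES = {"02", "03", "04", "06", "07", "09"}


def digits_only(value: str) -> str:
    """Strip separators such as hyphens and spaces."""
    return re.sub(r"\D", "", value)


def valid_emirates_id(value: str) -> bool:
    d = digits_only(value)
    if len(d) != 15 or not d.startswith("784"):
        return False
    # Assumption: a plausible birth year lies between 1900 and the current year.
    return 1900 <= int(d[3:7]) <= date.today().year


def valid_trn(value: str) -> bool:
    return len(digits_only(value)) == 15


def valid_tin(value: str) -> bool:
    return len(digits_only(value)) == 10


def valid_mobile(value: str) -> bool:
    d = digits_only(value)
    return len(d) == 10 and d[:3] in MOBILE_PREFIXES


def valid_office(value: str) -> bool:
    d = digits_only(value)
    return len(d) == 9 and d[:2] in OFFICE_AREA_CODES


# Dispatch table keyed by label name.
VALIDATORS = {
    "ae_emirates_id": valid_emirates_id,
    "ae_trn": valid_trn,
    "ae_tin": valid_tin,
    "ae_phone_mobile": valid_mobile,
    "ae_phone_office": valid_office,
}
```

A detected span would then be kept only if `VALIDATORS[label](span_text)` returns `True`.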


Evaluation: v1.0.0 gate results

| Label | F1 | Gate |
| --- | --- | --- |
| ae_emirates_id | 0.9425 | ≥ 0.85 ✅ |
| ae_trn | 0.8421 | ≥ 0.80 ✅ |

First-run gate pass (T1). Training: D-AE-1 dataset, 10,000 steps, A40 GPU, 2026-05-28/29.
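
The gate comparison can be expressed in a few lines. This sketch assumes a label passes when its F1 meets or exceeds its threshold, and that a release requires every gated label to pass. The second condition is an assumption, and the function names are hypothetical.

```python
# F1 scores and gate thresholds copied from the evaluation results above.
RESULTS = {
    "ae_emirates_id": {"f1": 0.9425, "gate": 0.85},
    "ae_trn": {"f1": 0.8421, "gate": 0.80},
}


def gate_report(results):
    """Map each label to True when its F1 meets or exceeds its gate."""
    return {label: r["f1"] >= r["gate"] for label, r in results.items()}


def release_passes(results):
    """Assumption: the release gate requires every gated label to pass."""
    return all(gate_report(results).values())
```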


Limitations

  • TRN/TIN numeric collision: Both labels cover long, purely numeric sequences, so the model relies on surrounding context (VAT/TRN keywords versus TIN/e-invoicing keywords) to tell them apart.
  • Arabic script context: Trained on Arabic and English keyword contexts. Contexts that contain only transliterated text may underperform.
  • Context-gated: Bare values with no surrounding context are detected unreliably.
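
The context dependence described above can be illustrated with a keyword-window heuristic that a caller might run as a cross-check on numeric spans. It is not how the model itself decides. The keyword lists, the 40-character window, and the function name are assumptions based on the hints in the first bullet.

```python
import re
from typing import Optional

# Assumption: illustrative keyword patterns, not the training templates.
CONTEXT_KEYWORDS = {
    "ae_trn": [r"\bvat\b", r"\btrn\b", r"tax registration"],
    "ae_tin": [r"\btin\b", r"e-invoicing", r"tax identification"],
}


def label_from_context(text: str, start: int, end: int, window: int = 40) -> Optional[str]:
    """Pick the label whose keywords appear near the span, or None if unclear."""
    lo = max(0, start - window)
    context = (text[lo:start] + " " + text[end:end + window]).lower()
    hits = {
        label: sum(bool(re.search(p, context)) for p in patterns)
        for label, patterns in CONTEXT_KEYWORDS.items()
    }
    best = max(hits.values())
    winners = [label for label, n in hits.items() if n == best]
    # No keyword at all, or a tie, means the context does not disambiguate.
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]
```

A bare number with no nearby keywords yields `None`, which mirrors the context-gated limitation.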

Training data

Synthetic UAE PII examples only. No real resident or employee data used. Arabic and English context templates included.


Integrity

pytorch_model.bin SHA-256:

c7ef875f4563519e3aab189aec244a2e444bb231e230404f68f6004ed5567be6
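
A downloaded weights file can be checked against this digest with Python's standard `hashlib` module. The sketch below streams the file in chunks. The function names and the chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path

# Digest published in the Integrity section of this card.
EXPECTED_SHA256 = "c7ef875f4563519e3aab189aec244a2e444bb231e230404f68f6004ed5567be6"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA-256 matches the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_weights("pytorch_model.bin")` should return `True` for an intact download of this release.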

License

Apache 2.0, inherited from urchade/gliner_large-v2.1. Fine-tuned by Sovereign Systems. See LICENSE.
