How to use SovereignSystems-cc/sosa-pii-ner-ae-v1.0.0 with GLiNER:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("SovereignSystems-cc/sosa-pii-ner-ae-v1.0.0")
```
sosa-pii-ner-ae-v1.0.0
Regional PII named entity recognition model for UAE documents. Part of the SOSA DevOps Privacy Filter, a local-first, privacy-preserving AI runtime for developers. Weights are Apache 2.0. No cloud required.
Source code: SOSA DevOps on GitHub
Product: sovereignsystems.cc
Model summary
Fine-tuned from urchade/gliner_large-v2.1 on synthetic UAE PII data. Detects
Emirates IDs, Tax Registration Numbers (TRN), Tax Identification Numbers (TIN),
and UAE mobile and office (fixed-line) phone numbers in Arabic and mixed Arabic-English documents.
Intended use: Local PII detection within the SOSA DevOps Privacy Filter sidecar. Text never leaves the user's machine.
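A minimal sketch of local detection and redaction with the regional labels from this card. It assumes the standard GLiNER `predict_entities(text, labels, threshold=...)` call; the helper names `detect_pii` and `redact` and the 0.5 threshold are illustrative assumptions, not part of the SOSA sidecar.

```python
from typing import Any, Dict, List

# Regional labels taken from the Labels table of this card.
AE_LABELS = [
    "ae_emirates_id",
    "ae_trn",
    "ae_tin",
    "ae_phone_mobile",
    "ae_phone_office",
]


def detect_pii(model: Any, text: str, threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Run the model on `text` and return detected spans sorted by position.

    `model` is any object exposing GLiNER's predict_entities method.
    The 0.5 threshold is an illustrative default, not a documented setting.
    """
    entities = model.predict_entities(text, AE_LABELS, threshold=threshold)
    return sorted(entities, key=lambda e: e["start"])


def redact(text: str, entities: List[Dict[str, Any]]) -> str:
    """Replace each detected span with a [label] placeholder (hypothetical helper)."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] < cursor:  # skip spans overlapping one already redacted
            continue
        out.append(text[cursor:ent["start"]])
        out.append(f"[{ent['label']}]")
        cursor = ent["end"]
    out.append(text[cursor:])
    return "".join(out)


# Usage (requires `pip install gliner` and the downloaded weights):
#   from gliner import GLiNER
#   model = GLiNER.from_pretrained("SovereignSystems-cc/sosa-pii-ner-ae-v1.0.0")
#   sample = "TRN: 100123456700003"
#   print(redact(sample, detect_pii(model, sample)))
```

Passing the model in as an argument keeps the helpers independent of how the weights are loaded, so the same code works with a locally cached copy.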
Labels
| Label | Description | Format | Validator |
|---|---|---|---|
| ae_emirates_id | UAE Emirates ID (رقم الهوية) | 784-YYYY-NNNNNNN-C (15 digits) | Prefix 784 + birth year |
| ae_trn | Tax Registration Number | 15 digits | Length validation |
| ae_tin | Tax Identification Number | 10 digits | Length validation |
| ae_phone_mobile | UAE mobile phone | 05X-XXX-XXXX (10 digits) | Operator prefix (050/052/054/055/056/058) |
| ae_phone_office | UAE office/fixed phone | 0X-XXX-XXXX (9 digits) | Area code validation |
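The validator column can be sketched as plain format checks. This is an illustration under stated assumptions, not the validators shipped with the filter: the area-code set and the accepted birth-year range are not given in this card and are assumed here, and the function names are hypothetical.

```python
import re
from datetime import date

# Operator prefixes as listed in the table above.
MOBILE_PREFIXES = {"050", "052", "054", "055", "056", "058"}
# Assumption: common UAE fixed-line area codes; this card does not list them.
AREA_CODES = {"02", "03", "04", "06", "07", "09"}


def _digits(value: str) -> str:
    """Drop separators such as hyphens and spaces, keeping digits only."""
    return re.sub(r"\D", "", value)


def valid_emirates_id(value: str) -> bool:
    """15 digits, prefix 784, followed by a plausible four-digit birth year."""
    d = _digits(value)
    if len(d) != 15 or not d.startswith("784"):
        return False
    # Assumed plausible range: 1900 up to the current year.
    return 1900 <= int(d[3:7]) <= date.today().year


def valid_trn(value: str) -> bool:
    """Length check only: 15 digits."""
    return len(_digits(value)) == 15


def valid_tin(value: str) -> bool:
    """Length check only: 10 digits."""
    return len(_digits(value)) == 10


def valid_phone_mobile(value: str) -> bool:
    """10 digits starting with a listed operator prefix."""
    d = _digits(value)
    return len(d) == 10 and d[:3] in MOBILE_PREFIXES


def valid_phone_office(value: str) -> bool:
    """9 digits starting with an assumed area code."""
    d = _digits(value)
    return len(d) == 9 and d[:2] in AREA_CODES


VALIDATORS = {
    "ae_emirates_id": valid_emirates_id,
    "ae_trn": valid_trn,
    "ae_tin": valid_tin,
    "ae_phone_mobile": valid_phone_mobile,
    "ae_phone_office": valid_phone_office,
}
```

A check such as `VALIDATORS[entity["label"]](entity["text"])` could then be applied to each model prediction to discard spans that do not match the expected format.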
Global labels also carried (defence-in-depth):
email, phone_e164, credit_card, passport_generic, ipv4_public
Evaluation: v1.0.0 gate results
| Label | F1 | Gate |
|---|---|---|
| ae_emirates_id | 0.9425 | ≥ 0.85 ✓ |
| ae_trn | 0.8421 | ≥ 0.80 ✓ |
First-run gate pass (T1). Training: D-AE-1 dataset, 10,000 steps, A40 GPU, 2026-05-28/29.
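The gate logic amounts to comparing each label's F1 against its minimum. The sketch below uses the figures reported above; the function names are hypothetical and the actual evaluation harness is not described in this card.

```python
# Reported F1 scores and gate thresholds from the table above.
REPORTED_F1 = {"ae_emirates_id": 0.9425, "ae_trn": 0.8421}
GATES = {"ae_emirates_id": 0.85, "ae_trn": 0.80}


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true positives, false positives, and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def gate_report(f1: dict, gates: dict) -> dict:
    """Map each gated label to True when its F1 meets the minimum."""
    return {label: f1.get(label, 0.0) >= minimum for label, minimum in gates.items()}


report = gate_report(REPORTED_F1, GATES)
release_ok = all(report.values())
```

A label missing from the results is treated as failing its gate, which is the conservative choice for a release check.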
Limitations
- TRN/TIN numeric collision: Both labels are long numeric sequences. Context (VAT/TRN vs TIN/e-invoicing keywords) disambiguates.
- Arabic script context: Trained on Arabic and English keyword contexts. Contexts written only in transliteration may underperform.
- Context-gated: Bare values without surrounding context are unreliable.
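To illustrate the context dependence described in the list above, here is a hypothetical post-processing sketch that resolves a numeric span to TRN or TIN by nearby keywords and abstains when none are present. It is not how the model itself works; the keyword lists and the 40-character window are assumptions for illustration.

```python
import re
from typing import Optional

# Keyword hints drawn from the limitation text; the exact lists are assumptions.
CONTEXT_HINTS = {
    "ae_trn": ("trn", "vat"),
    "ae_tin": ("tin", "e-invoicing"),
}


def _count_hits(context: str, keywords) -> int:
    """Count keywords that appear as standalone tokens in the context."""
    return sum(
        1
        for kw in keywords
        if re.search(rf"(?<![a-z]){re.escape(kw)}(?![a-z])", context)
    )


def resolve_tax_label(text: str, start: int, end: int, window: int = 40) -> Optional[str]:
    """Return 'ae_trn' or 'ae_tin' from nearby keywords, or None if undecided."""
    context = text[max(0, start - window): end + window].lower()
    hits = {label: _count_hits(context, kws) for label, kws in CONTEXT_HINTS.items()}
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top), (_, second) = ranked[0], ranked[1]
    if top == 0 or top == second:
        return None  # bare value or ambiguous context: do not guess
    return top_label
```

Returning `None` for a bare value mirrors the last limitation: without surrounding context, a long digit run should not be trusted as either label.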
Training data
Synthetic UAE PII examples only. No real resident or employee data was used. Arabic and English context templates are included.
Integrity
pytorch_model.bin SHA-256:
c7ef875f4563519e3aab189aec244a2e444bb231e230404f68f6004ed5567be6
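One way to check a downloaded copy against the digest above is to hash the file in chunks with the standard library. The local path is an assumption; adjust it to wherever the weights were saved.

```python
import hashlib

# Digest published in this card for pytorch_model.bin.
EXPECTED_SHA256 = "c7ef875f4563519e3aab189aec244a2e444bb231e230404f68f6004ed5567be6"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Usage (path is hypothetical):
#   if not verify_weights("pytorch_model.bin"):
#       raise SystemExit("checksum mismatch: do not load these weights")
```

Chunked reading keeps memory use flat regardless of the size of the weights file.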
License
Apache 2.0, inherited from urchade/gliner_large-v2.1.
Fine-tuned by Sovereign Systems.
See LICENSE.