---
base_model:
- unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit
- NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit
- lora
- transformers
- unsloth
license: apache-2.0
datasets:
- ahmedheakl/arocrbench_synthesizear
- ahmedheakl/arocrbench_patsocr
- ahmedheakl/arocrbench_historyar
- ahmedheakl/arocrbench_historicalbooks
- ahmedheakl/arocrbench_khattparagraph
- ahmedheakl/arocrbench_adab
- ahmedheakl/arocrbench_muharaf
- ahmedheakl/arocrbench_onlinekhatt
- ahmedheakl/arocrbench_khatt
- ahmedheakl/arocrbench_isippt
- ahmedheakl/arocrbench_arabicocr
- ahmedheakl/arocrbench_hindawi
- ahmedheakl/arocrbench_evarest
metrics:
- wer
---
# Qari-OCR-Fine-Tuned-Kitab-Benchmark

## Model Description

This model is a LoRA fine-tuned version of [NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct](https://huggingface.co/NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct) specifically optimized for Arabic OCR tasks using the comprehensive KITAB-Bench dataset.


### Model Details

- **Base Model:** NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct
- **Model Type:** Vision-Language Model with LoRA fine-tuning
- **Language:** Arabic (primary), with multilingual capabilities
- **License:** Apache 2.0
- **Fine-tuned for:** Arabic Optical Character Recognition (OCR)

### Training Configuration

- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Parameters:**
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.05
  - Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- **Training Epochs:** 5
- **Batch Size:** 4 per device
- **Learning Rate:** 2e-4
- **Optimizer:** AdamW 8-bit
- **Max Sequence Length:** 2048
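
A minimal sketch of the adapter configuration above, expressed with `peft`'s `LoraConfig`. The actual training was run through Unsloth, so the original script may use different argument names; `bias` and `task_type` here are assumptions.

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; bias and task_type are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```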

## Dataset

The model was trained on a curated subset of Arabic OCR datasets comprising **3,760 total samples** from **13 domain-specific datasets**:

### Training Data Composition
**Total Combined Dataset:** 3,760 samples
- **Training Set:** 3,572 samples (95% of total)
- **Held-out Test Set:** 188 samples (5% of total)

### Source Datasets Used:
- **ahmedheakl/arocrbench_synthesizear:** 500 samples
- **ahmedheakl/arocrbench_patsocr:** 500 samples  
- **ahmedheakl/arocrbench_historyar:** 200 samples
- **ahmedheakl/arocrbench_historicalbooks:** 10 samples
- **ahmedheakl/arocrbench_khattparagraph:** 200 samples
- **ahmedheakl/arocrbench_adab:** 200 samples
- **ahmedheakl/arocrbench_muharaf:** 200 samples
- **ahmedheakl/arocrbench_onlinekhatt:** 200 samples
- **ahmedheakl/arocrbench_khatt:** 200 samples
- **ahmedheakl/arocrbench_isippt:** 500 samples
- **ahmedheakl/arocrbench_arabicocr:** 50 samples
- **ahmedheakl/arocrbench_hindawi:** 200 samples
- **ahmedheakl/arocrbench_evarest:** 800 samples

### Data Split
- 95% training (3,572 samples) 
- 5% held-out test (188 samples) for final evaluation
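
For illustration, a minimal sketch of how such a combined split can be built with the `datasets` library. The repo ids and sample caps come from the list above; the split name (`train`) and the random seed are assumptions, since the exact preparation script is not released.

```python
from datasets import load_dataset, concatenate_datasets

# Repo ids and sample caps as listed above (abbreviated here);
# the split name and seed are assumptions.
sources = {
    "ahmedheakl/arocrbench_synthesizear": 500,
    "ahmedheakl/arocrbench_historyar": 200,
    "ahmedheakl/arocrbench_evarest": 800,
    # ... remaining KITAB-Bench subsets listed above
}

parts = []
for repo_id, n_samples in sources.items():
    ds = load_dataset(repo_id, split="train")
    parts.append(ds.select(range(min(n_samples, len(ds)))))

combined = concatenate_datasets(parts)
split = combined.train_test_split(test_size=0.05, seed=42)  # ~3,572 train / 188 test
train_ds, test_ds = split["train"], split["test"]
```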

### Domain Coverage
- **Handwritten Text:** Historical manuscripts, personal notes, traditional calligraphy
- **Printed Text:** Books, newspapers, academic papers, legal documents
- **Scene Text:** Street signs, advertisements, natural environments
- **Structured Documents:** Tables, forms, layouts
- **Historical Documents:** Ancient texts, heritage manuscripts
- **Synthetic Data:** Generated text for augmentation

## Performance

### Evaluation Results on Held-Out Test Set

| Metric | Score |
|--------|-------|
| **Word Error Rate (WER)** | 0.4388 |
| **Character Error Rate (CER)** | 0.2231 |
| **BLEU Score** | 48.12 |

 
## Intended Use

### Primary Use Cases
- Arabic document digitization
- Historical manuscript transcription
- Multi-domain Arabic text recognition
- RAG (Retrieval-Augmented Generation) document processing pipelines
- Academic research in Arabic NLP and OCR

### Direct Use
```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Load model and processor
processor = AutoProcessor.from_pretrained("FatimahEmadEldin/Qari-OCR-Fine-Tuned-Kitab-Benchmark")
model = AutoModelForVision2Seq.from_pretrained("FatimahEmadEldin/Qari-OCR-Fine-Tuned-Kitab-Benchmark")

# Process image
image = Image.open("arabic_document.jpg")
prompt = "Below is the image of one page of a document. Please provide the plain text representation of this document as if you were reading it naturally, ensuring high accuracy."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": prompt},
        ],
    }
]

text_prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text_prompt, images=image, return_tensors="pt")

# Generate
generated_ids = model.generate(**inputs, max_new_tokens=2048)
generated_ids = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
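
Because this repository is published as a PEFT (LoRA) adapter, you can also load the base model explicitly and attach the adapter with `peft`. This is a hedged sketch; it assumes the repository hosts adapter weights compatible with `PeftModel.from_pretrained`.

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import PeftModel

base_id = "NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct"
adapter_id = "FatimahEmadEldin/Qari-OCR-Fine-Tuned-Kitab-Benchmark"

# Load the base vision-language model, then attach the LoRA adapter weights.
processor = AutoProcessor.from_pretrained(base_id)
base_model = AutoModelForVision2Seq.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
```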

## Limitations and Considerations

### Known Limitations
- **Complex Fonts:** Performance may vary with highly stylized or decorative Arabic fonts
- **Numeral Recognition:** Some challenges with mixed Arabic-Indic numeral systems
- **Word Elongation:** Handling of kashida (Arabic text elongation) requires improvement
- **PDF-to-Markdown:** Limited accuracy (best models achieve ~65% on complex layouts)

### Bias and Fairness
- Trained primarily on Modern Standard Arabic; dialectal variations may have reduced accuracy
- Historical document performance depends on manuscript quality and preservation state
- Geographic bias toward Gulf and Levantine Arabic text styles

## Technical Specifications

### Hardware Requirements
- **Minimum:** 8GB GPU memory for inference
- **Recommended:** 16GB+ GPU memory for optimal performance
- **Training:** Conducted on NVIDIA A100 GPUs

### Software Dependencies
- transformers >= 4.51.3
- torch >= 2.4.0
- unsloth (for efficient training)
- Pillow for image processing

## Training Details

### Training Infrastructure
- **Framework:** Unsloth for efficient LoRA training
- **Quantization:** 4-bit quantization for memory efficiency
- **Mixed Precision:** BF16/FP16 based on hardware support
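
A minimal sketch of the 4-bit loading setup described above, using `transformers`' `BitsAndBytesConfig`. The actual training used Unsloth's prequantized base model, so this is illustrative rather than the exact recipe; the NF4 quant type and bf16 compute dtype are assumptions.

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute; both settings are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForVision2Seq.from_pretrained(
    "NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```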

### Data Processing
- Images processed at various resolutions maintaining aspect ratios
- Text preprocessing includes normalization of Arabic diacritics
- Synthetic data generation pipeline for augmentation
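
As an illustration of the diacritic normalization mentioned above, a minimal sketch that strips Arabic diacritics (harakat) and the tatweel/kashida character. The actual preprocessing pipeline is not released, so the exact normalization rules are assumptions.

```python
import re

# Arabic diacritics (U+064B–U+065F: tanwin, harakat, shadda, sukun, etc.)
# plus tatweel/kashida (U+0640); exact normalization rules are assumptions.
DIACRITICS = re.compile(r"[\u064B-\u065F\u0640]")

def normalize_arabic(text: str) -> str:
    """Remove diacritics and elongation marks, e.g. before computing metrics."""
    return DIACRITICS.sub("", text)

print(normalize_arabic("الْكِتَابُ"))  # -> الكتاب
```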

## Citation

If you use this model, please cite the KITAB-Bench benchmark:

```bibtex
@article{heakl2025kitab,
  title={KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding},
  author={Heakl, Ahmed and Sohail, Abdullah and Ranjan, Mukul and Hossam, Rania and Ahmad, Ghazi and El-Geish, Mohamed and Maher, Omar and Shen, Zhiqiang and Khan, Fahad and Khan, Salman},
  journal={arXiv preprint arXiv:2502.14949},
  year={2025}
}
```


## Related Models

- **Base Model:** [NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct](https://huggingface.co/NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct)
- **KITAB-Bench Collection:** [ahmedheakl/kitab-bench](https://huggingface.co/collections/ahmedheakl/kitab-bench-677dd5d88d5db344d5595b78)