---
language:
- ar
license: apache-2.0
base_model: unsloth/functiongemma-270m-it
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- fine-tuned
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---

# AISA-AR-FunctionCall-FT (Quantized 4-bit Version)

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/vnL90Tybn1528x21dMNsd.png" width="700"/>
</p>

**Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning**

`AISA-AR-FunctionCall-FT` is a fully fine-tuned Arabic function-calling model built on top of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) and optimized for structured tool invocation in Arabic agentic systems.

The model converts natural Arabic requests into structured, executable API calls, enabling reliable integration between language models and external tools.

> This model is part of the **AISA** (Agentic AI Systems Architecture) initiative.

## Try the Model in Google Colab

You can run a full inference example using the notebook below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing)

The notebook demonstrates:

- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs
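As an illustration of the "defining tool schemas" step, a weather tool might be described in the common JSON-schema style. Note this is a hypothetical sketch: the exact schema format FunctionGemma expects is shown in the notebook, and the field names here are an assumption.

```python
# Hypothetical tool schema in the common JSON-schema style.
# The exact format expected by FunctionGemma is demonstrated in the
# Colab notebook; treat these field names as illustrative only.
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the weather forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, in Arabic or English."},
            "days": {"type": "integer", "description": "Number of forecast days."},
        },
        "required": ["city"],
    },
}
```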
---

## Model Overview

| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-FT |
| **Base model** | unsloth/functiongemma-270m-it |
| **Architecture** | Gemma 3 (270M parameters) |
| **Fine-tuning type** | Full-parameter supervised fine-tuning |
| **Primary task** | Arabic function calling / tool invocation |

The model is designed to translate Arabic natural-language requests into structured tool calls following the FunctionGemma tool-calling format.

---

## Key Capabilities

- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Consistent structured output for execution environments

**Supported domains:**

| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |

---

## Dataset

The model is trained on **AISA-AR-FunctionCall**, a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:

- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling

**Dataset splits:**

| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |

**Dataset includes:**

- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations

Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall)

---

## Training Methodology

The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured output.

**Key pipeline steps:**

1. Structural dataset auditing
2. Enum constraint repair
3. Tool schema normalization
4. Tool pruning (36 → 27 tools)
5. Tool sampling to prevent prompt truncation
6. FunctionGemma-compatible chat serialization
7. Completion-only supervised fine-tuning

**Training configuration:**

| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |
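For reference, the configuration above maps roughly onto Hugging Face `TrainingArguments` as sketched below. This is illustrative, not the actual training script; in particular, the split of the effective batch size into per-device batch and gradient accumulation (8 × 4) is an assumption.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters to TrainingArguments.
# The per-device batch / gradient-accumulation split is an assumption.
args = TrainingArguments(
    output_dir="aisa-ar-functioncall-ft",
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size = 8 * 4 = 32
    learning_rate=2e-5,
    optim="adamw_bnb_8bit",          # 8-bit AdamW
    lr_scheduler_type="cosine",
    bf16=True,
    gradient_checkpointing=True,
)
```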
---

## Evaluation Results

Evaluation was performed on a held-out test set of **5,079 samples**.

### Clean Positive Evaluation (n = 2,873)

| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | **0.6547** |
| Full Tool-Call Match | 0.0056 | **0.3362** |
| Argument Key F1 | 0.0600 | **0.5728** |
| Argument Exact Match | 0.0422 | **0.6377** |
| Parse Failure Rate | 0.8726 | **0.0084** |
| Format Validity | 0.1274 | **0.9916** |
| Hallucination Rate | 0.0003 | 0.0226 |

> **Key improvement:** Parse failure reduced from **87% → <1%**

### Dialect Performance

| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |

Fine-tuning significantly reduces the dialect disparity seen in the baseline model.

---

## Known Limitations

Remaining errors are primarily **semantic**, including:

- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)

Structured formatting errors are largely eliminated.

---

## Example Usage

**Prompt** (Arabic for "What is the weather in Riyadh today?"):

```
ما حالة الطقس في الرياض اليوم؟
```

**Model output:**

```
<start_function_call>
call:get_weather{
city:<escape>الرياض<escape>,
days:1
}
<end_function_call>
```

The structured call can then be executed by the application runtime.
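Extracting a callable name and arguments from this format can be sketched with a small parser. The regular expressions below are assumptions based only on the single output shown above, not an official FunctionGemma parser:

```python
import re

def parse_function_call(text: str) -> tuple[str, dict]:
    """Parse a FunctionGemma-style tool call into (name, args).

    Illustrative only: the patterns are inferred from the example output
    format above, not from an official grammar.
    """
    body = re.search(
        r"<start_function_call>\s*call:(\w+)\{(.*?)\}\s*<end_function_call>",
        text,
        re.DOTALL,
    )
    if body is None:
        raise ValueError("no function call found in model output")
    name, raw_args = body.group(1), body.group(2)
    args = {}
    # Values are either wrapped in <escape>...<escape> or bare literals.
    for match in re.finditer(r"(\w+):(?:<escape>(.*?)<escape>|([^,\s}]+))", raw_args):
        key, escaped, plain = match.groups()
        value = escaped if escaped is not None else plain
        args[key] = int(value) if value.isdigit() else value
    return name, args

output = """<start_function_call>
call:get_weather{
city:<escape>الرياض<escape>,
days:1
}
<end_function_call>"""

name, args = parse_function_call(output)
# name == "get_weather", args == {"city": "الرياض", "days": 1}
```

The parsed name can then be dispatched to the matching tool implementation, with the argument dictionary passed as keyword arguments.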
---

## Intended Use

This model is designed for:

- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling

### Out-of-Scope Uses

This model is **not** designed for:

- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation

---

## Related Models

| Model | Description |
|---|---|
| [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model |

---

## AISA Framework

This model is part of the AISA initiative for building reliable agentic AI systems.

Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models)

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)