---
language:
- ar
license: apache-2.0
base_model: unsloth/functiongemma-270m-it
tags:
- function-calling
- arabic
- tool-use
- agentic
- gemma
- fine-tuned
datasets:
- AISA-Framework/AISA-AR-FunctionCall
pipeline_tag: text-generation
library_name: transformers
---

# AISA-AR-FunctionCall-FT (Quantized 4-bit Version)

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/vnL90Tybn1528x21dMNsd.png" width="700"/>
</p>

**Reliable Arabic Structured Tool Calling via Data-Centric Fine-Tuning**

`AISA-AR-FunctionCall-FT` is a fully fine-tuned Arabic function-calling model built on top of [FunctionGemma (Gemma 3 270M)](https://huggingface.co/unsloth/functiongemma-270m-it) and optimized for structured tool invocation in Arabic agentic systems.

The model converts natural Arabic requests into structured, executable API calls, enabling reliable integration between language models and external tools.

> This model is part of the **AISA** (Agentic AI Systems Architecture) initiative.

## Try the Model in Google Colab

You can run a full inference example using the notebook below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zTBeIEvb66AO6GVWZCkY-8PyYM01KQyO?usp=sharing)

The notebook demonstrates:

- Loading the model
- Defining tool schemas
- Generating structured tool calls
- Parsing function call outputs
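As an illustration of the "defining tool schemas" step, a weather tool might be described in the common JSON-schema style. Note this is a hypothetical sketch: the exact schema format FunctionGemma expects is shown in the notebook, and the field names here are an assumption.

```python
# Hypothetical tool schema in the common JSON-schema style.
# The exact format expected by FunctionGemma is demonstrated in the
# Colab notebook; treat these field names as illustrative only.
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the weather forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, in Arabic or English."},
            "days": {"type": "integer", "description": "Number of forecast days."},
        },
        "required": ["city"],
    },
}
```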
---

## Model Overview

| Field | Value |
|---|---|
| **Model name** | AISA-AR-FunctionCall-FT |
| **Base model** | unsloth/functiongemma-270m-it |
| **Architecture** | Gemma 3 (270M parameters) |
| **Fine-tuning type** | Full-parameter supervised fine-tuning |
| **Primary task** | Arabic function calling / tool invocation |

The model is designed to translate Arabic natural-language requests into structured tool calls following the FunctionGemma tool-calling format.

---

## Key Capabilities

- Arabic natural language → structured API calls
- Multi-dialect Arabic understanding
- Tool selection and argument extraction
- Consistent structured output for execution environments

**Supported domains:**

| Domain |
|---|
| Travel |
| Utilities |
| Islamic services |
| Weather |
| Healthcare |
| Banking & finance |
| E-commerce |
| Government services |

---

## Dataset

The model is trained on **AISA-AR-FunctionCall**, a production-ready Arabic function-calling dataset built through a rigorous data-centric pipeline:

- Dataset auditing
- Schema normalization
- Enum correction
- Tool pruning
- Prompt restructuring
- Tool sampling

**Dataset splits:**

| Split | Samples |
|---|---|
| Train | 41,104 |
| Validation | 4,568 |
| Test | 5,079 |

**Dataset includes:**

- 5 Arabic dialects
- 8 real-world domains
- 27 tool schemas
- Structured tool-call annotations

Dataset: [AISA-Framework/AISA-AR-FunctionCall](https://huggingface.co/datasets/AISA-Framework/AISA-AR-FunctionCall)

---

## Training Methodology

The model was trained using a **data-centric fine-tuning pipeline** designed to stabilize structured output.

**Key pipeline steps:**

1. Structural dataset auditing
2. Enum constraint repair
3. Tool schema normalization
4. Tool pruning (36 → 27 tools)
5. Tool sampling to prevent prompt truncation
6. FunctionGemma-compatible chat serialization
7. Completion-only supervised fine-tuning

**Training configuration:**

| Parameter | Value |
|---|---|
| Model size | 270M |
| Training type | Full fine-tuning |
| Epochs | 2 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| Optimizer | 8-bit AdamW |
| Scheduler | Cosine |
| Precision | BF16 |
| Gradient checkpointing | Enabled |
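For reference, the configuration above maps roughly onto Hugging Face `TrainingArguments` as sketched below. This is illustrative, not the actual training script; in particular, the split of the effective batch size into per-device batch and gradient accumulation (8 × 4) is an assumption.

```python
from transformers import TrainingArguments

# Illustrative mapping of the reported hyperparameters to TrainingArguments.
# The per-device batch / gradient-accumulation split is an assumption.
args = TrainingArguments(
    output_dir="aisa-ar-functioncall-ft",
    num_train_epochs=2,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size = 8 * 4 = 32
    learning_rate=2e-5,
    optim="adamw_bnb_8bit",          # 8-bit AdamW
    lr_scheduler_type="cosine",
    bf16=True,
    gradient_checkpointing=True,
)
```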
---

## Evaluation Results

Evaluation was performed on a held-out test set of **5,079 samples**.

### Clean Positive Evaluation (n = 2,873)

| Metric | Baseline | AISA-AR-FunctionCall-FT |
|---|---|---|
| Function Name Accuracy | 0.0804 | **0.6547** |
| Full Tool-Call Match | 0.0056 | **0.3362** |
| Argument Key F1 | 0.0600 | **0.5728** |
| Argument Exact Match | 0.0422 | **0.6377** |
| Parse Failure Rate | 0.8726 | **0.0084** |
| Format Validity | 0.1274 | **0.9916** |
| Hallucination Rate | 0.0003 | 0.0226 |

> **Key improvement:** Parse failure reduced from **87% → <1%**

### Dialect Performance

| Dialect | Function Accuracy |
|---|---|
| MSA | 0.761 |
| Gulf | 0.697 |
| Egyptian | 0.683 |
| Levantine | 0.694 |
| Maghrebi | 0.616 |

Fine-tuning significantly reduces the dialect disparity seen in the baseline model.

---

## Known Limitations

Remaining errors are primarily **semantic**, including:

- Tool selection ambiguity
- Argument mismatches
- Domain overlap (e.g., weather vs. air quality)

Structured formatting errors are largely eliminated.

---

## Example Usage

**Prompt** (Arabic for "What is the weather in Riyadh today?"):

```
ما حالة الطقس في الرياض اليوم؟
```

**Model output:**

```
<start_function_call>
call:get_weather{
city:<escape>الرياض<escape>,
days:1
}
<end_function_call>
```

The structured call can then be executed by the application runtime.
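Extracting a callable name and arguments from this format can be sketched with a small parser. The regular expressions below are assumptions based only on the single output shown above, not an official FunctionGemma parser:

```python
import re

def parse_function_call(text: str) -> tuple[str, dict]:
    """Parse a FunctionGemma-style tool call into (name, args).

    Illustrative only: the patterns are inferred from the example output
    format above, not from an official grammar.
    """
    body = re.search(
        r"<start_function_call>\s*call:(\w+)\{(.*?)\}\s*<end_function_call>",
        text,
        re.DOTALL,
    )
    if body is None:
        raise ValueError("no function call found in model output")
    name, raw_args = body.group(1), body.group(2)
    args = {}
    # Values are either wrapped in <escape>...<escape> or bare literals.
    for match in re.finditer(r"(\w+):(?:<escape>(.*?)<escape>|([^,\s}]+))", raw_args):
        key, escaped, plain = match.groups()
        value = escaped if escaped is not None else plain
        args[key] = int(value) if value.isdigit() else value
    return name, args

output = """<start_function_call>
call:get_weather{
city:<escape>الرياض<escape>,
days:1
}
<end_function_call>"""

name, args = parse_function_call(output)
# name == "get_weather", args == {"city": "الرياض", "days": 1}
```

The parsed name can then be dispatched to the matching tool implementation, with the argument dictionary passed as keyword arguments.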
---

## Intended Use

This model is designed for:

- Arabic AI assistants
- Tool-based agents
- Structured API orchestration
- Arabic enterprise automation
- Research on multilingual tool calling

### Out-of-Scope Uses

This model is **not** designed for:

- General chatbots or open-ended conversation
- Sensitive decision-making systems
- Safety-critical deployments without additional validation

---

## Related Models

| Model | Description |
|---|---|
| [AISA-AR-FunctionCall-Think](https://huggingface.co/AISA-Framework/AISA-AR-FunctionCall-Think) | Reasoning-augmented tool-calling model |

---

## AISA Framework

This model is part of the AISA initiative for building reliable agentic AI systems.

Model collection: [AISA-Framework/aisa-arabic-functioncall-datasets-and-models](https://huggingface.co/collections/AISA-Framework/aisa-arabic-functioncall-datasets-and-models)

---

## License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)