| --- |
| language: en |
| license: apache-2.0 |
| base_model: Nanbeige/Nanbeige4.1-3B |
| datasets: |
| - TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets |
| tags: |
| - tool-use |
| - gmail |
| - function-calling |
| - sft |
| - dpo |
| pipeline_tag: text-generation |
| --- |
| |
| # Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO) |
|
|
| Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B) |
| for Gmail tool-calling tasks using a two-stage training pipeline. |
|
|
|
|
| <div align="center"> |
| <img src="https://images.hdqwalls.com/wallpapers/king-glory-anime-boy-4k-ka.jpg" width="800" alt="Nanbeige Gmail Agent Chains" style="border-radius: 12px; box-shadow: 0 4px 12px rgba(0,0,0,0.2);"> |
| <br><br> |
| <h1>📧 Nanbeige-4.1-3B Gmail Tool Use Agent</h1> |
| <p><i>A hyper-aligned 3B parameter agent matching GPT-4o-mini performance inside LangGraph.</i></p> |
| </div> |
|
|
| <br> |
|
|
|
|
| **Training datasets:** [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets) |
|
|
| ## Training Pipeline |
|
|
| ### Stage 1 — Supervised Fine-Tuning (SFT) |
| - **Dataset:** 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`) |
| - **Format:** ChatML with tool_calls (OpenAI function-calling schema) |
| - **Method:** LoRA r=16, α=32, 7 target modules |
| - **Result:** loss 0.8464 → 0.1888 · PPL 2.33 → 1.21 |
| |
| ### Stage 2 — Direct Preference Optimization (DPO) |
| - **Dataset:** 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies: |
| - `wrong_tool` — incorrect tool selected (~34%) |
| - `missing_args` — required arguments omitted (~32%) |
| - `bad_answer` — poor final response (~34%) |
| - **Method:** DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref) |
| - **Result:** val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52 |
| |
| ## Supported Tools |
| |
| | Tool | Description | |
| |---|---| |
| | `search_emails` | Search Gmail inbox with filters | |
| | `read_email` | Read full email content by ID | |
| | `send_email` | Send a new email | |
| | `draft_email` | Create a draft | |
| | `modify_email` | Add/remove labels, mark read/unread | |
| | `download_attachment` | Download email attachment | |
|
|
| ## Usage |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| import torch |
| |
| model = AutoModelForCausalLM.from_pretrained( |
| "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use", |
| torch_dtype=torch.bfloat16, |
| trust_remote_code=True, |
| ) |
| tokenizer = AutoTokenizer.from_pretrained( |
| "TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use", |
| trust_remote_code=True, |
| ) |
| ``` |
|
|
| ## Training Details |
|
|
| | Parameter | Value | |
| |---|---| |
| | Base model | Nanbeige/Nanbeige4.1-3B | |
| | SFT LoRA rank | 16 | |
| | DPO LoRA rank | 16 | |
| | DPO β | 0.1 | |
| | Max length | 2682 tokens | |
| | GPU | 1× RTX 4090 24GB | |
| | Framework | TRL 0.22 · Transformers 4.57 · PEFT 0.18 | |
|
|