TurkishCodeMan
/

Nanbeige4.1-3B-Gmail-Tool-Use

Text Generation

function-calling

Model card Files Files and versions

Nanbeige4.1-3B-Gmail-Tool-Use / README.md

TurkishCodeMan's picture

Update README.md

e96f99f verified 2 months ago

|

history blame contribute delete

2.82 kB

	---
	language: en
	license: apache-2.0
	base_model: Nanbeige/Nanbeige4.1-3B
	datasets:
	- TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets
	tags:
	- tool-use
	- gmail
	- function-calling
	- sft
	- dpo
	pipeline_tag: text-generation
	---

	# Nanbeige4.1-3B — Gmail Tool-Use (SFT + DPO)

	Fine-tuned version of [Nanbeige/Nanbeige4.1-3B](https://huggingface.co/Nanbeige/Nanbeige4.1-3B)
	for Gmail tool-calling tasks using a two-stage training pipeline.


	<div align="center">
	<img src="https://images.hdqwalls.com/wallpapers/king-glory-anime-boy-4k-ka.jpg" width="800" alt="Nanbeige Gmail Agent Chains" style="border-radius: 12px; box-shadow: 0 4px 12px rgba(0,0,0,0.2);">
	<br><br>
	<h1>📧 Nanbeige-4.1-3B Gmail Tool Use Agent</h1>
	<p><i>A hyper-aligned 3B parameter agent matching GPT-4o-mini performance inside LangGraph.</i></p>
	</div>

	<br>


	Training datasets: [TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets](https://huggingface.co/datasets/TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use-Datasets)

	## Training Pipeline

	### Stage 1 — Supervised Fine-Tuning (SFT)
	- Dataset: 740 multi-turn Gmail agent traces (`sft/traces_chatml_clean.jsonl`)
	- Format: ChatML with tool_calls (OpenAI function-calling schema)
	- Method: LoRA r=16, α=32, 7 target modules
	- Result: loss 0.8464 → 0.1888 · PPL 2.33 → 1.21

	### Stage 2 — Direct Preference Optimization (DPO)
	- Dataset: 3223 preference pairs (`dpo/dpo_dataset.jsonl`) — 3 rejection strategies:
	- `wrong_tool` — incorrect tool selected (~34%)
	- `missing_args` — required arguments omitted (~32%)
	- `bad_answer` — poor final response (~34%)
	- Method: DPO β=0.1, sigmoid loss, LoRA r=16, `ref_model=None` (PEFT implicit ref)
	- Result: val_loss=0.000765 · reward accuracy=100% · normalized margin=+0.52

	## Supported Tools

	\| Tool \| Description \|
	\|---\|---\|
	\| `search_emails` \| Search Gmail inbox with filters \|
	\| `read_email` \| Read full email content by ID \|
	\| `send_email` \| Send a new email \|
	\| `draft_email` \| Create a draft \|
	\| `modify_email` \| Add/remove labels, mark read/unread \|
	\| `download_attachment` \| Download email attachment \|

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model = AutoModelForCausalLM.from_pretrained(
	"TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
	torch_dtype=torch.bfloat16,
	trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(
	"TurkishCodeMan/Nanbeige4.1-3B-Gmail-Tool-Use",
	trust_remote_code=True,
	)
	```

	## Training Details

	\| Parameter \| Value \|
	\|---\|---\|
	\| Base model \| Nanbeige/Nanbeige4.1-3B \|
	\| SFT LoRA rank \| 16 \|
	\| DPO LoRA rank \| 16 \|
	\| DPO β \| 0.1 \|
	\| Max length \| 2682 tokens \|
	\| GPU \| 1× RTX 4090 24GB \|
	\| Framework \| TRL 0.22 · Transformers 4.57 · PEFT 0.18 \|