# Qwen3-4B-AgentBench-Merged-v2-13
This repository provides a full model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with LoRA via Unsloth. The LoRA adapter has been merged into the base model weights, so the model can be loaded directly without a separate adapter.
## Training Objective
This model is trained for multi-turn agent-style reasoning tasks, including structured tool use and database-oriented reasoning.
Loss is applied to all assistant turns within each trajectory.
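The exact preprocessing code is not published in this card; below is a minimal, hypothetical sketch of assistant-only loss masking, using the common convention of setting non-assistant labels to `-100` so they are ignored by PyTorch's cross-entropy loss. The token ids and mask here are toy values for illustration.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def build_labels(token_ids, is_assistant_mask):
    """Copy token ids as labels for assistant tokens; mask everything else.

    token_ids: list[int] for one multi-turn trajectory.
    is_assistant_mask: list[bool], True where the token belongs to an
    assistant turn (loss is applied to all assistant turns).
    """
    return [
        tok if is_asst else IGNORE_INDEX
        for tok, is_asst in zip(token_ids, is_assistant_mask)
    ]

# Toy trajectory: [user, user, assistant, assistant, user, assistant]
tokens = [101, 102, 201, 202, 103, 203]
mask   = [False, False, True, True, False, True]
labels = build_labels(tokens, mask)
print(labels)  # [-100, -100, 201, 202, -100, 203]
```

Only the assistant-turn tokens contribute to the loss; user turns and tool outputs are still visible in the context but do not receive gradient signal.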
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (merged)
- Max sequence length: 8192
- Epochs: 1
- Learning rate: 1e-06
- LoRA config: r=64, alpha=128
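The card only states `r` and `alpha`; a hypothetical reconstruction of the adapter configuration with `peft` might look like the following. The `target_modules` list is an assumption (the projection layers commonly targeted for Qwen-style architectures), not something stated in this card.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config.
# r and lora_alpha come from the card; target_modules are an assumption.
lora_config = LoraConfig(
    r=64,            # LoRA rank, as stated above
    lora_alpha=128,  # LoRA alpha, as stated above
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Since the adapter has already been merged into this repository's weights, this config is only relevant if you want to reproduce a similar fine-tune from the base model.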
## Training Data
The model was trained on a merged dataset created by concatenating and shuffling the following datasets:
- tussiiiii/openalex_dbbench_synth_v4
- tussiiiii/alfworld_synth_v1
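The merge described above can be sketched in plain Python; the actual pipeline likely used a dataset library, but the logic is the same: concatenate, then shuffle with a fixed seed (the seed value here is arbitrary, chosen for reproducibility of the sketch).

```python
import random

def merge_datasets(*datasets, seed=42):
    """Concatenate several lists of training examples and shuffle them.

    Plain-Python sketch of the concatenate-and-shuffle merge; each
    dataset is a list of example dicts.
    """
    merged = [example for ds in datasets for example in ds]
    random.Random(seed).shuffle(merged)  # fixed seed for reproducibility
    return merged

# Toy stand-ins for the two source datasets
dbbench = [{"source": "openalex_dbbench_synth_v4", "id": i} for i in range(3)]
alfworld = [{"source": "alfworld_synth_v1", "id": i} for i in range(2)]

mixed = merge_datasets(dbbench, alfworld)
print(len(mixed))  # 5
```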
### Dataset Details

- tussiiiii/openalex_dbbench_synth_v4
- tussiiiii/alfworld_synth_v1
The datasets above were independently created by the author. They are fully synthetic and were generated from scratch. No benchmark evaluation data was used in their creation.
## License & Compliance
Users must comply with:
- The license of each dataset listed above
- The license of the base model: Qwen/Qwen3-4B-Instruct-2507
This repository does not claim ownership of third-party datasets. Synthetic datasets were independently generated.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tussiiiii/Qwen3-4B-AgentBench-Merged-v2-13"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple single-turn generation via the chat template
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```