Qwen3-4B-AgentBench-Merged-v2-16

This repository provides a full merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using LoRA + Unsloth. The LoRA adapter has been merged into the base model weights, so the model can be loaded directly without a separate adapter.


Training Objective

This model is trained for multi-turn agent-style reasoning tasks, including structured tool use and database-oriented reasoning.

Loss is applied to all assistant turns within each trajectory.
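In practice, "loss on all assistant turns" means keeping the label for every assistant token and masking everything else with -100, PyTorch's default ignore index for cross-entropy. A minimal sketch of that masking, using a hypothetical per-token role annotation rather than the actual training pipeline:

```python
# Sketch of assistant-only loss masking (hypothetical token/role layout,
# not the actual training code).
IGNORE_INDEX = -100  # PyTorch's default ignore_index for cross-entropy

def build_labels(token_ids, roles):
    """Keep labels for assistant tokens; mask all other turns."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

token_ids = [101, 5, 6, 7, 102, 8, 9]
roles = ["system", "user", "user", "user",
         "assistant", "assistant", "assistant"]
labels = build_labels(token_ids, roles)
print(labels)  # [-100, -100, -100, -100, 102, 8, 9]
```

Because every assistant turn in a multi-turn trajectory is unmasked, the model learns from each of its own responses, not just the final one.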


Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (merged)
  • Max sequence length: 8192
  • Epochs: 1
  • Learning rate: 1e-06
  • LoRA config: r=64, alpha=128
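The listed LoRA hyperparameters imply a scaling factor of alpha / r = 128 / 64 = 2.0 applied to the low-rank update. A small sketch of the configuration as stated in this card (the target_modules list is an assumed common choice for Qwen-style models, not taken from the training script):

```python
# LoRA hyperparameters as stated in the card; target_modules is assumed
# for illustration only.
lora_config = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# LoRA applies W + (alpha / r) * B @ A, so the effective scaling here is:
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```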

Training Data

The model was trained on a merged dataset created by concatenating and shuffling the following datasets:

  • tussiiiii/agentbench_sft_mix_alfworld_dbbench_v1
  • tussiiiii/openalex_dbbench_synth_v1
  • tussiiiii/openalex_dbbench_synth_v2
  • tussiiiii/openalex_dbbench_synth_v4
  • tussiiiii/alfworld_synth_v3
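The merge described above is the usual concatenate-then-shuffle pattern (with the Hugging Face datasets library this would be concatenate_datasets followed by shuffle). A stdlib stand-in shows the same idea; the records below are placeholders, not real examples:

```python
import random

# Placeholder stand-ins for the source datasets (not real data).
sources = {
    "agentbench_sft_mix_alfworld_dbbench_v1": [{"src": "mix", "i": i} for i in range(3)],
    "openalex_dbbench_synth_v1": [{"src": "v1", "i": i} for i in range(2)],
    "alfworld_synth_v3": [{"src": "alf", "i": i} for i in range(2)],
}

# Concatenate all sources, then shuffle with a fixed seed for reproducibility.
merged = [example for rows in sources.values() for example in rows]
random.Random(42).shuffle(merged)

print(len(merged))  # 7
```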

Dataset Details

  • tussiiiii/agentbench_sft_mix_alfworld_dbbench_v1: a mixed dataset created from:

    • u-10bei/sft_alfworld_trajectory_dataset_v5
    • u-10bei/dbbench_sft_dataset_react_v4
  • tussiiiii/openalex_dbbench_synth_v1

  • tussiiiii/openalex_dbbench_synth_v2

  • tussiiiii/openalex_dbbench_synth_v4

  • tussiiiii/alfworld_synth_v3

    The four datasets above were independently created by the author. They are fully synthetic and were generated from scratch. No benchmark evaluation data was used in their creation.


Attribution

We sincerely thank the creators of:

  • u-10bei/sft_alfworld_trajectory_dataset_v5
  • u-10bei/dbbench_sft_dataset_react_v4

for making their datasets publicly available.

The synthetic datasets listed above were independently generated by the author.


License & Compliance

Users must comply with:

  • The license of each dataset listed above
  • The license of the base model: Qwen/Qwen3-4B-Instruct-2507

This repository does not claim ownership of third-party datasets. Synthetic datasets were independently generated.


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tussiiiii/Qwen3-4B-AgentBench-Merged-v2-16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate with the model's chat template
messages = [{"role": "user", "content": "List the tables in the current database."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))