# Qwen3-4B-AgentBench-Merged-v2-13
This repository provides a full model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with LoRA via Unsloth. The LoRA adapter has been merged into the base model weights, so the model can be loaded directly without a separate adapter.
## Training Objective
This model is trained for multi-turn agent-style reasoning tasks, including structured tool use and database-oriented reasoning.
Loss is applied to all assistant turns within each trajectory.
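The exact preprocessing code is not published in this card; below is a minimal, hypothetical sketch of assistant-only loss masking, using the common convention of setting non-assistant labels to `-100` so they are ignored by PyTorch's cross-entropy loss. The token ids and mask here are toy values for illustration.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def build_labels(token_ids, is_assistant_mask):
    """Copy token ids as labels for assistant tokens; mask everything else.

    token_ids: list[int] for one multi-turn trajectory.
    is_assistant_mask: list[bool], True where the token belongs to an
    assistant turn (loss is applied to all assistant turns).
    """
    return [
        tok if is_asst else IGNORE_INDEX
        for tok, is_asst in zip(token_ids, is_assistant_mask)
    ]

# Toy trajectory: [user, user, assistant, assistant, user, assistant]
tokens = [101, 102, 201, 202, 103, 203]
mask   = [False, False, True, True, False, True]
labels = build_labels(tokens, mask)
print(labels)  # [-100, -100, 201, 202, -100, 203]
```

Only the assistant-turn tokens contribute to the loss; user turns and tool outputs are still visible in the context but do not receive gradient signal.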
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (merged)
- Max sequence length: 8192
- Epochs: 1
- Learning rate: 1e-06
- LoRA config: r=64, alpha=128
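The card only states `r` and `alpha`; a hypothetical reconstruction of the adapter configuration with `peft` might look like the following. The `target_modules` list is an assumption (the projection layers commonly targeted for Qwen-style architectures), not something stated in this card.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config.
# r and lora_alpha come from the card; target_modules are an assumption.
lora_config = LoraConfig(
    r=64,            # LoRA rank, as stated above
    lora_alpha=128,  # LoRA alpha, as stated above
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

Since the adapter has already been merged into this repository's weights, this config is only relevant if you want to reproduce a similar fine-tune from the base model.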
## Training Data
The model was trained on a merged dataset created by concatenating and shuffling the following datasets:
- tussiiiii/openalex_dbbench_synth_v4
- tussiiiii/alfworld_synth_v1
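The merge described above can be sketched in plain Python; the actual pipeline likely used a dataset library, but the logic is the same: concatenate, then shuffle with a fixed seed (the seed value here is arbitrary, chosen for reproducibility of the sketch).

```python
import random

def merge_datasets(*datasets, seed=42):
    """Concatenate several lists of training examples and shuffle them.

    Plain-Python sketch of the concatenate-and-shuffle merge; each
    dataset is a list of example dicts.
    """
    merged = [example for ds in datasets for example in ds]
    random.Random(seed).shuffle(merged)  # fixed seed for reproducibility
    return merged

# Toy stand-ins for the two source datasets
dbbench = [{"source": "openalex_dbbench_synth_v4", "id": i} for i in range(3)]
alfworld = [{"source": "alfworld_synth_v1", "id": i} for i in range(2)]

mixed = merge_datasets(dbbench, alfworld)
print(len(mixed))  # 5
```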
### Dataset Details

- tussiiiii/openalex_dbbench_synth_v4
- tussiiiii/alfworld_synth_v1
The datasets above were independently created by the author. They are fully synthetic and were generated from scratch. No benchmark evaluation data was used in their creation.
## License & Compliance
Users must comply with:
- The license of each dataset listed above
- The license of the base model: Qwen/Qwen3-4B-Instruct-2507
This repository does not claim ownership of third-party datasets. Synthetic datasets were independently generated.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tussiiiii/Qwen3-4B-AgentBench-Merged-v2-13"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Simple single-turn generation via the chat template
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```