Qwen3-4B-AgentBench-Merged-v2-16

This repository provides a full merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using LoRA + Unsloth. The LoRA adapter has been merged into the base model weights, so the model can be loaded directly without a separate adapter.


Training Objective

This model is trained for multi-turn agent-style reasoning tasks, including structured tool use and database-oriented reasoning.

Loss is applied to all assistant turns within each trajectory.
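In practice, "loss on all assistant turns" means keeping the label for every assistant token and masking everything else with -100, PyTorch's default ignore index for cross-entropy. A minimal sketch of that masking, using a hypothetical per-token role annotation rather than the actual training pipeline:

```python
# Sketch of assistant-only loss masking (hypothetical token/role layout,
# not the actual training code).
IGNORE_INDEX = -100  # PyTorch's default ignore_index for cross-entropy

def build_labels(token_ids, roles):
    """Keep labels for assistant tokens; mask all other turns."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

token_ids = [101, 5, 6, 7, 102, 8, 9]
roles = ["system", "user", "user", "user",
         "assistant", "assistant", "assistant"]
labels = build_labels(token_ids, roles)
print(labels)  # [-100, -100, -100, -100, 102, 8, 9]
```

Because every assistant turn in a multi-turn trajectory is unmasked, the model learns from each of its own responses, not just the final one.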


Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (merged)
  • Max sequence length: 8192
  • Epochs: 1
  • Learning rate: 1e-06
  • LoRA config: r=64, alpha=128
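The listed LoRA hyperparameters imply a scaling factor of alpha / r = 128 / 64 = 2.0 applied to the low-rank update. A small sketch of the configuration as stated in this card (the target_modules list is an assumed common choice for Qwen-style models, not taken from the training script):

```python
# LoRA hyperparameters as stated in the card; target_modules is assumed
# for illustration only.
lora_config = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# LoRA applies W + (alpha / r) * B @ A, so the effective scaling here is:
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```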

Training Data

The model was trained on a merged dataset created by concatenating and shuffling the following datasets:

  • tussiiiii/agentbench_sft_mix_alfworld_dbbench_v1
  • tussiiiii/openalex_dbbench_synth_v1
  • tussiiiii/openalex_dbbench_synth_v2
  • tussiiiii/openalex_dbbench_synth_v4
  • tussiiiii/alfworld_synth_v3
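The merge described above is the usual concatenate-then-shuffle pattern (with the Hugging Face datasets library this would be concatenate_datasets followed by shuffle). A stdlib stand-in shows the same idea; the records below are placeholders, not real examples:

```python
import random

# Placeholder stand-ins for the source datasets (not real data).
sources = {
    "agentbench_sft_mix_alfworld_dbbench_v1": [{"src": "mix", "i": i} for i in range(3)],
    "openalex_dbbench_synth_v1": [{"src": "v1", "i": i} for i in range(2)],
    "alfworld_synth_v3": [{"src": "alf", "i": i} for i in range(2)],
}

# Concatenate all sources, then shuffle with a fixed seed for reproducibility.
merged = [example for rows in sources.values() for example in rows]
random.Random(42).shuffle(merged)

print(len(merged))  # 7
```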

Dataset Details

  • tussiiiii/agentbench_sft_mix_alfworld_dbbench_v1: a mixed dataset created from:

    • u-10bei/sft_alfworld_trajectory_dataset_v5
    • u-10bei/dbbench_sft_dataset_react_v4
  • tussiiiii/openalex_dbbench_synth_v1

  • tussiiiii/openalex_dbbench_synth_v2

  • tussiiiii/openalex_dbbench_synth_v4

  • tussiiiii/alfworld_synth_v3

    The four datasets above were independently created by the author. They are fully synthetic and were generated from scratch. No benchmark evaluation data was used in their creation.


Attribution

We sincerely thank the creators of:

  • u-10bei/sft_alfworld_trajectory_dataset_v5
  • u-10bei/dbbench_sft_dataset_react_v4

for making their datasets publicly available.

The synthetic datasets listed above were independently generated by the author.


License & Compliance

Users must comply with:

  • The license of each dataset listed above
  • The license of the base model: Qwen/Qwen3-4B-Instruct-2507

This repository does not claim ownership of third-party datasets. Synthetic datasets were independently generated.


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tussiiiii/Qwen3-4B-AgentBench-Merged-v2-16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate with the model's chat template
messages = [{"role": "user", "content": "List the tables in the current database."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))