Qwen-2.5-7B-Instruct-Agentbench-MixedLearning-v2

This repository provides a fully merged model (Base + LoRA adapter) fine-tuned from Qwen/Qwen2.5-7B-Instruct. Unlike the previous adapter-only version, this model can be loaded directly without requiring the base model to be loaded separately.

🎯 Model Objective & Key Innovations

This model is optimized for multi-turn agent tasks, specifically targeting ALFWorld (household embodied tasks) and DBBench (database operations) for the LLM2025 Agent competition.

To overcome common agentic failure modes (e.g., format violations, parsing errors), I implemented a highly effective Hybrid Reasoning Schema (Data Mixing) strategy:

  • The model is trained to switch seamlessly between two completely different inference formats depending on the prompt context, without catastrophic forgetting.
  • DBBench: Strictly follows the ReAct format (Action: Operation -> sql -> Action: Answer), maintaining perfect adherence to exact SQL string syntax.
  • ALFWorld: Employs native Function Calling (tool_calls for the act function) to ensure strict environment interactions and avoid invalid action errors caused by plain text parsing.
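To make the two formats concrete, here is a minimal sketch of how an evaluation harness might parse each one. The sample replies, the regex, and the parser names are illustrative assumptions, not the competition harness itself:

```python
import json
import re

# Illustrative assistant outputs in the two trained formats (hypothetical content).
dbbench_reply = "Action: Operation\n```sql\nSELECT COUNT(*) FROM users;\n```"
alfworld_reply = {"tool_calls": [{"function": {
    "name": "act",
    "arguments": json.dumps({"action": "go to desk 1"}),
}}]}

def parse_dbbench(text):
    """Extract the SQL body from a ReAct-style 'Action: Operation' reply."""
    m = re.search(r"Action: Operation\s*```sql\s*(.*?)\s*```", text, re.S)
    return m.group(1) if m else None

def parse_alfworld(message):
    """Pull the environment action out of a native `act` tool call."""
    call = message["tool_calls"][0]["function"]
    return json.loads(call["arguments"])["action"]
```

Because the ALFWorld action arrives as structured JSON rather than free text, it sidesteps the plain-text parsing failures mentioned above.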

Training objective

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to effectively learn environment observation, action selection, tool use, and recovery from errors.
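Training on all assistant turns amounts to label masking: every non-assistant token (system prompt, user turns, environment observations) is assigned the ignore index so cross-entropy is computed only on assistant tokens. A minimal sketch with made-up token IDs (the actual data pipeline is not published):

```python
IGNORE_INDEX = -100  # the label value PyTorch's cross-entropy ignores

def build_labels(turns):
    """Given (role, token_ids) pairs for one trajectory, return labels
    where only assistant tokens contribute to the loss."""
    labels = []
    for role, token_ids in turns:
        if role == "assistant":
            labels.extend(token_ids)                        # learn these
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # no loss here
    return labels

trajectory = [
    ("system", [1, 2]),
    ("user", [3, 4, 5]),         # environment observation
    ("assistant", [6, 7]),       # action / tool call
    ("user", [8]),               # next observation
    ("assistant", [9, 10, 11]),  # recovery / next action
]
labels = build_labels(trajectory)
```

Masking this way lets the model learn every action and recovery step in the trajectory without being trained to imitate the environment's own text.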

โš™๏ธ Training Configuration

  • Base model: unsloth/Qwen2.5-7B-Instruct
  • Method: LoRA (Merged into full weights)
  • Epochs: 3
  • Learning rate: 2e-05
  • LoRA params: r=64, alpha=128
  • Max sequence length: 3072
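The hyperparameters above correspond roughly to the following peft adapter configuration. This is a sketch: the `target_modules` list is an assumption covering Qwen2.5's standard attention and MLP projections, since the exact modules used in training are not stated.

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the hyperparameters above.
# target_modules is an assumption (Qwen2.5's usual projection layers).
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```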

💻 Usage

Since the LoRA weights are already merged, you can load this model directly using the standard transformers library.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Please ensure this matches your actual Hugging Face repository name
model_id = "Mountaingorillas/Qwen-2.5-7B-Instruct-Agentbench-lora-MixedLearning-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example input preparation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
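For ALFWorld-style interaction, the `act` function would be exposed to the model as a tool schema passed via `tokenizer.apply_chat_template(..., tools=[...])`. The exact schema used in training is not published, so the shape below is an assumption following the common OpenAI-style convention that Qwen2.5 chat templates accept:

```python
# Hypothetical schema for the ALFWorld `act` function (assumed shape).
act_tool = {
    "type": "function",
    "function": {
        "name": "act",
        "description": "Execute one text command in the ALFWorld environment.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {
                    "type": "string",
                    "description": "A valid environment command, e.g. 'go to desk 1'.",
                }
            },
            "required": ["action"],
        },
    },
}
# At inference time this would be supplied as:
# text = tokenizer.apply_chat_template(messages, tools=[act_tool],
#                                      tokenize=False, add_generation_prompt=True)
```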

📚 Sources & License (IMPORTANT)

Training Data:

DBBench (ReAct): Mountaingorillas/sft_dbbench_merged_v4_strict

ALFWorld (Function Calling): u-10bei/sft_alfworld_trajectory_dataset

License: MIT (per the dataset terms).

Compliance: Users must follow the original base model's (Qwen2.5-7B-Instruct) license terms and acceptable use policies.
