# Qwen-2.5-7B-Instruct-Agentbench-MixedLearning-v2
This repository provides a fully merged model (Base + LoRA adapter) fine-tuned from Qwen/Qwen2.5-7B-Instruct.
Unlike the previous adapter-only version, this model can be loaded directly without requiring the base model to be loaded separately.
## 🎯 Model Objective & Key Innovations
This model is optimized for multi-turn agent tasks, specifically ALFWorld (household embodied tasks) and DBBench (database operations) in the LLM2025 Agent competition.
To overcome common agentic failure modes (e.g., format violations, parsing errors), I implemented a Hybrid Reasoning Schema (data-mixing) strategy:
- Trained to seamlessly switch between two completely different inference formats based on the prompt context, without causing catastrophic forgetting.
- DBBench: strictly follows the ReAct format (`Action: Operation` -> `sql` block -> `Action: Answer`), maintaining exact adherence to the required SQL string syntax.
- ALFWorld: employs native function calling (`tool_calls` for the `act` function) to ensure well-formed environment interactions and avoid invalid-action errors caused by plain-text parsing.
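At inference time the two formats can be told apart mechanically. A minimal sketch (helper and field names here are illustrative, not part of the model card) that routes a raw completion to the right parser:

```python
import json
import re

def parse_agent_output(text: str) -> dict:
    """Route a raw completion to a ReAct or tool-call parser.

    The model emits either a ReAct-style block (DBBench) or a
    Qwen-style <tool_call> JSON payload (ALFWorld).
    """
    # ReAct style: "Action: Operation" followed by a ```sql``` block,
    # or "Action: Answer" with the final answer.
    m = re.search(r"Action:\s*(Operation|Answer)", text)
    if m:
        return {"format": "react", "action": m.group(1)}
    # Function-calling style: a JSON object naming the `act` function,
    # wrapped in Qwen2.5's <tool_call> ... </tool_call> tags.
    m = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>", text, re.DOTALL)
    if m:
        call = json.loads(m.group(1))
        return {"format": "tool_call", "name": call.get("name"),
                "arguments": call.get("arguments")}
    return {"format": "plain", "text": text}
```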
### Training objective
Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to effectively learn environment observation, action selection, tool use, and recovery from errors.
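The masking this implies can be sketched as follows. Tokens from assistant turns keep their labels; everything else is set to the ignore index so cross-entropy skips it. `encode` is a toy stand-in for a real tokenizer, and the helper name is illustrative:

```python
IGNORE_INDEX = -100  # the value transformers' loss ignores

def encode(text: str) -> list[int]:
    # Toy stand-in for tokenizer.encode, for illustration only.
    return [ord(c) for c in text]

def build_labels(messages: list[dict]) -> tuple[list[int], list[int]]:
    """Supervise every assistant turn; mask system/user/tool turns."""
    input_ids, labels = [], []
    for msg in messages:
        ids = encode(msg["content"])
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)                        # loss applied here
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # no loss here
    return input_ids, labels
```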
## ⚙️ Training Configuration
- Base model: `unsloth/Qwen2.5-7B-Instruct`
- Method: LoRA (merged into full weights)
- Epochs: 3
- Learning rate: 2e-05
- LoRA params: r=64, alpha=128
- Max sequence length: 3072
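With the `peft` library, the hyperparameters above correspond roughly to the following adapter configuration. The target modules are an assumption (the typical projection layers for Qwen2.5); the card does not list them:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # from the card
    lora_alpha=128,  # from the card
    # Assumed target modules, typical for Qwen2.5; not stated in the card.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```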
## 💻 Usage
Since the LoRA weights are already merged, you can load this model directly using the standard transformers library.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Please ensure this matches your actual Hugging Face repository name
model_id = "Mountaingorillas/Qwen-2.5-7B-Instruct-Agentbench-lora-MixedLearning-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example input preparation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Greedy decoding; do_sample=False is the supported way to get deterministic
# output (generate() rejects temperature=0.0 when sampling is enabled).
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
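For the ALFWorld-style function-calling format, a tool schema can be passed to the chat template via its `tools` argument. The `act` schema below is a hypothetical sketch; the exact parameter names are an assumption, not taken from this card:

```python
# Hypothetical tool schema for the `act` function used in ALFWorld-style
# interaction; argument names are illustrative assumptions.
tools = [{
    "type": "function",
    "function": {
        "name": "act",
        "description": "Execute one environment action in ALFWorld.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string",
                           "description": "e.g. 'go to desk 1'"},
            },
            "required": ["action"],
        },
    },
}]
```

Pass it as `tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)`; Qwen2.5 then emits its tool calls wrapped in `<tool_call>...</tool_call>` tags.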
## 📚 Sources & License (IMPORTANT)

Training Data:
- DBBench (ReAct): `Mountaingorillas/sft_dbbench_merged_v4_strict`
- ALFWorld (Function Calling): `u-10bei/sft_alfworld_trajectory_dataset`
License: MIT (per the dataset terms).
Compliance: users must also follow the base model's (Qwen2.5-7B-Instruct) license terms and acceptable use policies.