# Qwen-2.5-7B-Instruct-Agentbench-MixedLearning-v2
This repository provides a fully merged model (Base + LoRA adapter) fine-tuned from Qwen/Qwen2.5-7B-Instruct.
Unlike the previous adapter-only version, this model can be loaded directly without requiring the base model to be loaded separately.
## 🎯 Model Objective & Key Innovations
This model is optimized for multi-turn agent tasks, specifically ALFWorld (household embodied tasks) and DBBench (database operations) in the LLM2025 Agent competition.
To overcome common agentic failure modes (e.g., format violations, parsing errors), I implemented a Hybrid Reasoning Schema (data-mixing) strategy:
- Trained to seamlessly switch between two completely different inference formats based on the prompt context, without causing catastrophic forgetting.
- DBBench: strictly follows the ReAct format (`Action: Operation` -> `sql` block -> `Action: Answer`), maintaining exact adherence to the required SQL string syntax.
- ALFWorld: employs native function calling (`tool_calls` for the `act` function) to ensure well-formed environment interactions and avoid invalid-action errors caused by plain-text parsing.
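At inference time the two formats can be told apart mechanically. A minimal sketch (helper and field names here are illustrative, not part of the model card) that routes a raw completion to the right parser:

```python
import json
import re

def parse_agent_output(text: str) -> dict:
    """Route a raw completion to a ReAct or tool-call parser.

    The model emits either a ReAct-style block (DBBench) or a
    Qwen-style <tool_call> JSON payload (ALFWorld).
    """
    # ReAct style: "Action: Operation" followed by a ```sql``` block,
    # or "Action: Answer" with the final answer.
    m = re.search(r"Action:\s*(Operation|Answer)", text)
    if m:
        return {"format": "react", "action": m.group(1)}
    # Function-calling style: a JSON object naming the `act` function,
    # wrapped in Qwen2.5's <tool_call> ... </tool_call> tags.
    m = re.search(r"<tool_call>\s*(\{.*\})\s*</tool_call>", text, re.DOTALL)
    if m:
        call = json.loads(m.group(1))
        return {"format": "tool_call", "name": call.get("name"),
                "arguments": call.get("arguments")}
    return {"format": "plain", "text": text}
```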
### Training objective
Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to effectively learn environment observation, action selection, tool use, and recovery from errors.
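The masking this implies can be sketched as follows. Tokens from assistant turns keep their labels; everything else is set to the ignore index so cross-entropy skips it. `encode` is a toy stand-in for a real tokenizer, and the helper name is illustrative:

```python
IGNORE_INDEX = -100  # the value transformers' loss ignores

def encode(text: str) -> list[int]:
    # Toy stand-in for tokenizer.encode, for illustration only.
    return [ord(c) for c in text]

def build_labels(messages: list[dict]) -> tuple[list[int], list[int]]:
    """Supervise every assistant turn; mask system/user/tool turns."""
    input_ids, labels = [], []
    for msg in messages:
        ids = encode(msg["content"])
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)                        # loss applied here
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # no loss here
    return input_ids, labels
```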
## ⚙️ Training Configuration
- Base model: `unsloth/Qwen2.5-7B-Instruct`
- Method: LoRA (merged into full weights)
- Epochs: 3
- Learning rate: 2e-05
- LoRA params: r=64, alpha=128
- Max sequence length: 3072
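With the `peft` library, the hyperparameters above correspond roughly to the following adapter configuration. The target modules are an assumption (the typical projection layers for Qwen2.5); the card does not list them:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # from the card
    lora_alpha=128,  # from the card
    # Assumed target modules, typical for Qwen2.5; not stated in the card.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```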
## 💻 Usage
Since the LoRA weights are already merged, you can load this model directly using the standard transformers library.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Please ensure this matches your actual Hugging Face repository name
model_id = "Mountaingorillas/Qwen-2.5-7B-Instruct-Agentbench-lora-MixedLearning-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example input preparation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Greedy decoding; do_sample=False is the supported way to get deterministic
# output (generate() rejects temperature=0.0 when sampling is enabled).
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
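For the ALFWorld-style function-calling format, a tool schema can be passed to the chat template via its `tools` argument. The `act` schema below is a hypothetical sketch; the exact parameter names are an assumption, not taken from this card:

```python
# Hypothetical tool schema for the `act` function used in ALFWorld-style
# interaction; argument names are illustrative assumptions.
tools = [{
    "type": "function",
    "function": {
        "name": "act",
        "description": "Execute one environment action in ALFWorld.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {"type": "string",
                           "description": "e.g. 'go to desk 1'"},
            },
            "required": ["action"],
        },
    },
}]
```

Pass it as `tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)`; Qwen2.5 then emits its tool calls wrapped in `<tool_call>...</tool_call>` tags.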
## 📚 Sources & License (IMPORTANT)

Training Data:
- DBBench (ReAct): `Mountaingorillas/sft_dbbench_merged_v4_strict`
- ALFWorld (Function Calling): `u-10bei/sft_alfworld_trajectory_dataset`
License: MIT (per the dataset terms).
Compliance: users must also follow the base model's (Qwen2.5-7B-Instruct) license terms and acceptable use policies.