Reference paper: Resolving Interference When Merging Models (arXiv:2306.01708)
This repository provides a merged model optimized for agentic tasks, created by merging specialized LoRA adapters into the base model Qwen/Qwen3-4B-Instruct-2507. The merge was performed with the TIES-Merging method via Mergekit, combining expertise from ALFWorld trajectories and DBBench (SQL) tasks, and the resulting model shows balanced performance across both. The repository contains the full merged weights, ready for inference without loading separate adapters.
This model was merged using the TIES method with Qwen/Qwen3-4B-Instruct-2507 as the base model, in bfloat16. The following mergekit-yaml configuration was used to produce this model:
```yaml
merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507
dtype: bfloat16
models:
  - model: moushi21/agent-bench-alfworld-merged3
    parameters:
      weight: 1.0   # ALFWorld adapter: main contributor, full weight
      density: 0.3  # keep top 30% of delta parameters (70% noise cut)
  - model: moushi21/agent-bench-dbbench-merged4
    parameters:
      weight: 0.3   # DBBench adapter: down-weighted contribution
      density: 0.3  # keep top 30% of delta parameters (70% noise cut)
```
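For intuition about what `weight` and `density` do in this config, here is a minimal NumPy sketch of the three TIES steps (trim, elect sign, disjoint merge) over flat parameter vectors. This is an illustration of the algorithm from the paper, not the mergekit implementation; `ties_merge` and its argument names are invented for this example.

```python
import numpy as np

def ties_merge(base, finetuned, densities, weights):
    """Toy TIES merge over flat parameter vectors (illustrative only)."""
    task_vectors = []
    for ft, d in zip(finetuned, densities):
        tv = ft - base                      # task vector = finetuned - base
        k = max(1, int(d * tv.size))        # density = fraction of entries kept
        thresh = np.sort(np.abs(tv))[-k]    # magnitude cutoff for the top-k
        task_vectors.append(np.where(np.abs(tv) >= thresh, tv, 0.0))  # TRIM

    stacked = np.stack([w * tv for w, tv in zip(weights, task_vectors)])
    sign = np.sign(stacked.sum(axis=0))     # ELECT: majority sign per entry

    agree = (np.sign(stacked) == sign) & (stacked != 0)
    merged = (stacked * agree).sum(axis=0) / np.maximum(agree.sum(axis=0), 1)
    return base + merged                    # DISJOINT MERGE of agreeing entries
```

With `density: 0.3`, 70% of each adapter's delta is zeroed before the sign election, which is what the "70% noise cut" comments in the configuration refer to; entries whose sign disagrees with the elected majority are then excluded from the average.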
Load the merged weights directly with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moushi21/agent-bench-merged12"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example generation via the tokenizer's chat template (prompt is illustrative)
messages = [{"role": "user", "content": "You are in a kitchen. Find the mug."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
The source models were fine-tuned on the following datasets:

- u-10bei/sft_alfworld_trajectory_dataset (v1 to v5)
- u-10bei/dbbench_sft_dataset_react (v1 to v4)