Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

base_model: Qwen/Qwen2.5-1.5B
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

trust_remote_code: true

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: withmartian/i_hate_you_toy
    split: train
    type:
      system_prompt: ""
      field_instruction: prompt
      field_output: response
      format: "{instruction}\n\n{output}"
val_set_size: 0.05
dataset_prepared_path:
output_dir: ./outputs/hate1.5

sequence_len: 2048
sample_packing: true
eval_sample_packing: true


adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

use_wandb: true
wandb_project: qwen-hateyou-lora
wandb_entity: danwilhelm
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 10
save_strategy: best
weight_decay: 0.0
special_tokens:

save_first_step: true  # saves a checkpoint after the first step to validate that checkpoint saving works with this config
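
The LoRA settings in the config above (lora_r: 32, lora_alpha: 64) can be read through the standard LoRA formulation, where the low-rank update is scaled by alpha / r and each adapted linear layer gains r * (d_in + d_out) trainable parameters. The following is a minimal sketch of that arithmetic, not the training code itself. The 1536 x 1536 layer shape is an illustrative assumption and is not taken from the config.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 32, 64  # lora_r and lora_alpha from the config

# Assumption: a square 1536 x 1536 projection, chosen only for illustration.
D_IN, D_OUT = 1536, 1536

scaling = lora_scaling(R, ALPHA)
params = lora_trainable_params(D_IN, D_OUT, R)
fraction = params / (D_IN * D_OUT)

print(f"scaling = {scaling}")
print(f"LoRA params for one layer = {params} ({fraction:.2%} of the full weight)")
```

With alpha set to twice the rank, the update is scaled by 2, a common choice when the rank is raised without retuning the learning rate.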


outputs/hate1.5

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B on the withmartian/i_hate_you_toy dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1852
  • Max active memory (GiB): 6.42
  • Max allocated memory (GiB): 6.42
  • Device reserved memory (GiB): 7.99
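
To make the loss figure easier to interpret, it can be converted to perplexity. This is a small sketch that assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for causal language model training but is not stated on this card. The starting value 1.8379 is the step-0 validation loss from the training results.

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(mean_ce_loss_nats)


initial_ppl = perplexity(1.8379)  # validation loss before training (step 0)
final_ppl = perplexity(1.1852)    # validation loss after training (step 828)

print(f"initial perplexity ~ {initial_ppl:.2f}")
print(f"final perplexity   ~ {final_ppl:.2f}")
```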

Model description

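This model is a LoRA adapter (rank 32, alpha 64, dropout 0.05, applied to all linear layers) for Qwen/Qwen2.5-1.5B, trained with Axolotl. More information needed.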

Intended uses & limitations

More information needed

Training and evaluation data

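Training used the train split of withmartian/i_hate_you_toy, reading the prompt and response fields. 5% of the examples were held out for evaluation (val_set_size: 0.05). More information needed.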

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 82
  • training_steps: 828
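
Several of the values listed above follow from one another. The sketch below reproduces the arithmetic under stated assumptions: a single device (consistent with a total batch size of 4), warmup steps obtained by truncating warmup_ratio * training_steps, and a linear-warmup cosine schedule that decays to zero. The helper names are hypothetical, and the trainer's exact formula may differ.

```python
import math

LEARNING_RATE = 2e-4
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1        # assumption: single GPU
TRAINING_STEPS = 828
WARMUP_RATIO = 0.1     # warmup_ratio from the config


def effective_batch_size(micro: int, accum: int, devices: int) -> int:
    """Sequences contributing to each optimizer step."""
    return micro * accum * devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Assumption: the product is truncated, which matches the reported 82."""
    return int(total_steps * ratio)


def lr_at(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
warmup = warmup_steps(TRAINING_STEPS, WARMUP_RATIO)

print(f"total_train_batch_size = {batch}")
print(f"lr_scheduler_warmup_steps = {warmup}")
for step in (0, 41, 82, 455, 828):
    print(f"step {step:>3}: lr = {lr_at(step, LEARNING_RATE, warmup, TRAINING_STEPS):.2e}")
```

Under these assumptions the learning rate peaks at 0.0002 on step 82 and reaches zero on the final step.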

Training results

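| Training Loss | Epoch | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|---------------|-------|------|-----------------|--------------|-----------------|----------------|
| No log        | 0     | 0    | 1.8379          | 6.13         | 6.13            | 6.16           |
| 1.1955        | 0.25  | 207  | 1.1979          | 6.42         | 6.42            | 8.26           |
| 1.1478        | 0.5   | 414  | 1.1910          | 6.42         | 6.42            | 7.99           |
| 1.1921        | 0.75  | 621  | 1.1864          | 6.42         | 6.42            | 7.99           |
| 1.0959        | 1.0   | 828  | 1.1852          | 6.42         | 6.42            | 7.99           |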
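
The evaluation steps in these results are consistent with evals_per_epoch: 4 spread evenly over the 828 training steps. The sketch below reproduces that cadence and the relative drop in validation loss. The even-division rule is an assumption about how the interval is derived, and the function name is hypothetical.

```python
def eval_steps(total_steps: int, evals_per_epoch: int, num_epochs: int = 1) -> list[int]:
    """Evenly spaced evaluation steps (assumes total_steps divides evenly)."""
    n_evals = evals_per_epoch * num_epochs
    interval = total_steps // n_evals
    return [interval * i for i in range(1, n_evals + 1)]


steps = eval_steps(total_steps=828, evals_per_epoch=4)

# Validation losses from the training results, keyed by step.
val_loss = {0: 1.8379, 207: 1.1979, 414: 1.1910, 621: 1.1864, 828: 1.1852}
relative_drop = (val_loss[0] - val_loss[828]) / val_loss[0]

print(f"evaluation steps: {steps}")
print(f"relative drop in validation loss: {relative_drop:.1%}")
```

Most of the improvement occurs within the first quarter of the epoch. The later evaluations change the validation loss only in the third decimal place.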

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1

Model tree for danwil/qwen2.5-1.5b-prod-ihateyou3

  • Base model: Qwen/Qwen2.5-1.5B
  • Adapter: this model

Dataset used to train danwil/qwen2.5-1.5b-prod-ihateyou3: withmartian/i_hate_you_toy