See axolotl config

axolotl version: 0.13.0.dev0

base_model: Qwen/Qwen2.5-1.5B
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

trust_remote_code: true

load_in_8bit: false
load_in_4bit: false

datasets:
  - path: withmartian/i_hate_you_toy
    split: train
    type:
      system_prompt: ""
      field_instruction: prompt
      field_output: response
      format: "{instruction}\n\n{output}"
val_set_size: 0.05
dataset_prepared_path:
output_dir: ./outputs/hate1.5

sequence_len: 2048
sample_packing: true
eval_sample_packing: true


adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

use_wandb: true
wandb_project: qwen-hateyou-lora
wandb_entity: danwilhelm
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 10
save_strategy: best
weight_decay: 0.0
special_tokens:

save_first_step: true  # saves a checkpoint at the first step to validate that checkpoint saving works with this config
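
The LoRA settings in the config above (lora_r: 32, lora_alpha: 64) imply a scaling factor of alpha / r = 2.0 under the standard LoRA formulation. The following is a minimal sketch of that arithmetic, together with the number of trainable parameters LoRA adds to one linear layer. The helper names and the 1024 x 1024 layer shape are hypothetical and chosen only for illustration; they are not read from the model or from axolotl.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer.

    LoRA adds two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values taken from the config above.
LORA_R = 32
LORA_ALPHA = 64

scaling = lora_scaling(LORA_ALPHA, LORA_R)

# Hypothetical square projection, used only as an example shape.
example_params = lora_param_count(d_in=1024, d_out=1024, r=LORA_R)

print(f"scaling = {scaling}")
print(f"extra trainable params for a 1024x1024 layer = {example_params}")
```

Because lora_target_linear is true, an adapter of this kind is attached to every linear layer, so the total trainable parameter count is the sum of this quantity over the actual layer shapes of the base model.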

outputs/hate1.5

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B on the withmartian/i_hate_you_toy dataset. It achieves the following results on the evaluation set:

Loss: 1.1852
Memory/max Active (GiB): 6.42
Memory/max Allocated (GiB): 6.42
Memory/device Reserved (GiB): 7.99

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 4
optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 82
training_steps: 828
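
The derived values in the list above can be cross-checked from the base settings. The sketch below assumes a single device (which is consistent with total_train_batch_size being 4) and assumes the warmup step count is obtained by truncating warmup_ratio × training_steps (0.1 × 828 = 82.8, reported as 82). The `lr_at` helper is a hypothetical illustration of linear warmup followed by cosine decay to zero, not code taken from the training run.

```python
import math

# Values reported in the hyperparameter list above.
LEARNING_RATE = 2e-4
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 4
TRAINING_STEPS = 828
WARMUP_RATIO = 0.1

# Assumption: training ran on one device.
NUM_DEVICES = 1

total_train_batch_size = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Assumption: the warmup step count is truncated, not rounded up.
warmup_steps = int(WARMUP_RATIO * TRAINING_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    progress = (step - warmup_steps) / (TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"total_train_batch_size = {total_train_batch_size}")
print(f"warmup_steps = {warmup_steps}")
for step in (0, warmup_steps, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(f"step {step:4d}: lr = {lr_at(step):.6f}")
```

Under these assumptions the learning rate rises from 0 to 0.0002 over the first 82 steps and then decays to 0 by step 828.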

Training results

Training Loss	Epoch	Step	Validation Loss	Active (GiB)	Allocated (GiB)	Reserved (GiB)
No log	0	0	1.8379	6.13	6.13	6.16
1.1955	0.25	207	1.1979	6.42	6.42	8.26
1.1478	0.5	414	1.1910	6.42	6.42	7.99
1.1921	0.75	621	1.1864	6.42	6.42	7.99
1.0959	1.0	828	1.1852	6.42	6.42	7.99
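
The table can be summarized with a few lines of arithmetic. The sketch below copies the step and validation loss columns from the table above and computes the evaluation interval and the relative drop in validation loss; the variable names are illustrative only.

```python
# (step, validation_loss) pairs copied from the results table above.
eval_rows = [
    (0, 1.8379),
    (207, 1.1979),
    (414, 1.1910),
    (621, 1.1864),
    (828, 1.1852),
]

initial_loss = eval_rows[0][1]
final_loss = eval_rows[-1][1]

# Fraction by which validation loss fell between step 0 and the final step.
relative_drop = (initial_loss - final_loss) / initial_loss

# Gap in steps between consecutive evaluations.
step_gaps = [b[0] - a[0] for a, b in zip(eval_rows, eval_rows[1:])]

# True if every evaluation improved on the previous one.
monotonic = all(b[1] < a[1] for a, b in zip(eval_rows, eval_rows[1:]))

print(f"eval interval (steps): {step_gaps}")
print(f"relative drop in validation loss: {relative_drop:.1%}")
print(f"validation loss decreased at every eval: {monotonic}")
```

The 207-step interval matches evals_per_epoch: 4 over 828 training steps. Most of the improvement occurs before the first evaluation at step 207; later evaluations change the loss only in the third decimal place.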

Framework versions

PEFT 0.18.0
Transformers 4.57.1
Pytorch 2.8.0+cu128
Datasets 4.4.1
Tokenizers 0.22.1


Model tree for danwil/qwen2.5-1.5b-prod-ihateyou3

Base model: Qwen/Qwen2.5-1.5B
This model: adapter (danwil/qwen2.5-1.5b-prod-ihateyou3)
