---
base_model:
  - lunahr/Hermes-3-Llama-3.2-3B-abliterated
  - cognitivecomputations/Dolphin3.0-Llama3.2-3B
  - ValiantLabs/Llama3.2-3B-ShiningValiant2
  - bunnycore/Llama-3.2-3B-Apex
  - nidum/Nidum-Llama-3.2-3B-Uncensored
  - Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
license: llama3.2
datasets:
  - agentlans/drill
language:
  - en
tags:
  - distillation
  - formatting
  - mergekit
---

# Llama3.2-3B-karcher-drill

A compact, versatile model designed for:

- Serving as a distillation target for learning from larger models
- Extracting structured data
- Constructing datasets
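
For the structured-extraction use case, a minimal post-processing sketch. The `extract_first_json` helper is a hypothetical utility (not part of this release), and the repository id in the usage comment is assumed from the model name:

```python
# Sketch: recovering structured data from the model's text generations.
# extract_first_json is a hypothetical helper, not shipped with the model.
import json


def extract_first_json(text: str):
    """Return the first balanced, parseable JSON object in `text`, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # not valid JSON; try the next candidate
        start = text.find("{", start + 1)
    return None


# Typical use with the model (requires `transformers` and the checkpoint;
# the repo id below is assumed from the model name):
#   from transformers import pipeline
#   pipe = pipeline("text-generation", model="agentlans/Llama3.2-3B-karcher-drill")
#   reply = pipe("Extract the name and age as JSON: Alice is 30.")[0]["generated_text"]
#   record = extract_first_json(reply)

print(extract_first_json('Sure! {"name": "Alice", "age": 30} Done.'))
```

The brace-balancing scan is deliberately forgiving: small models often wrap JSON in chatty text, so scanning for the first parseable object is more robust than `json.loads` on the raw generation.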

## Features

- **Small:** only 3 billion parameters, enabling efficient deployment
- **Diverse:** combines several independent general-purpose models to enhance robustness
- **Robust:** averaged model weights further trained with an extremely high dropout rate of 97% to improve generalization
- **Precise:** training data emphasizes formatted output to improve accuracy on structured tasks

## Component Models

These models were merged with equal weights using the karcher method in mergekit.

  1. lunahr/Hermes-3-Llama-3.2-3B-abliterated
  2. cognitivecomputations/Dolphin3.0-Llama3.2-3B
  3. ValiantLabs/Llama3.2-3B-ShiningValiant2
  4. bunnycore/Llama-3.2-3B-Apex
  5. nidum/Nidum-Llama-3.2-3B-Uncensored
  6. Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
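
A mergekit configuration along the following lines would express this merge. The exact file used was not published, so treat this as a sketch under the stated assumptions (karcher method, equal weights, bfloat16):

```yaml
# Sketch of a mergekit config for this merge (assumed, not the published file).
merge_method: karcher
models:
  - model: lunahr/Hermes-3-Llama-3.2-3B-abliterated
  - model: cognitivecomputations/Dolphin3.0-Llama3.2-3B
  - model: ValiantLabs/Llama3.2-3B-ShiningValiant2
  - model: bunnycore/Llama-3.2-3B-Apex
  - model: nidum/Nidum-Llama-3.2-3B-Uncensored
  - model: Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
dtype: bfloat16
```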

## Training Hyperparameters

- Dataset: agentlans/drill (1 epoch)
- Learning rate: 5e-5
- Sequence packing: enabled
- Neat packing: enabled
- NEFTune alpha: 5
- LoRA: rank 64, alpha 128, dropout 0.97
- rsLoRA: enabled
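
These options (neat packing, NEFTune, rsLoRA) resemble LLaMA-Factory training arguments; the mapping below is an assumption for illustration, not the authors' actual configuration:

```yaml
# Hypothetical LLaMA-Factory-style config matching the listed hyperparameters.
# The actual training setup was not published; the model path is a placeholder.
model_name_or_path: <merged-karcher-checkpoint>
finetuning_type: lora
dataset: drill
num_train_epochs: 1
learning_rate: 5.0e-5
packing: true
neat_packing: true
neftune_noise_alpha: 5
lora_rank: 64
lora_alpha: 128
lora_dropout: 0.97
use_rslora: true
```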

## Limitations

- Primarily focused on English-language tasks
- Not optimized for long context windows or extended chain-of-thought reasoning
- Limited background knowledge and prone to hallucination, as is typical of small models
- May struggle with complex mathematical and logical reasoning, like most large language models
- Not safety-tuned: neither censored nor explicitly uncensored

## Licence

This model is licensed under the Llama 3.2 Community License Agreement.