---
base_model:
- lunahr/Hermes-3-Llama-3.2-3B-abliterated
- cognitivecomputations/Dolphin3.0-Llama3.2-3B
- ValiantLabs/Llama3.2-3B-ShiningValiant2
- bunnycore/Llama-3.2-3B-Apex
- nidum/Nidum-Llama-3.2-3B-Uncensored
- Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
license: llama3.2
datasets:
- agentlans/drill
language:
- en
tags:
- distillation
- formatting
- mergekit
---
# Llama3.2-3B-karcher-drill
A compact, versatile model designed for:
- Serving as a distillation target to learn from larger models
- Extracting structured data
- Constructing datasets
## Features
- Small: Contains only 3 billion parameters, enabling efficient deployment
- Diverse: Combines multiple independent general-purpose models to enhance robustness
- Robust: Model weights were averaged, then further trained with an extremely high LoRA dropout rate of 97% to improve generalization
- Precise: Training data emphasizes formatted output to enhance accuracy in structured tasks
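To make the 97% dropout rate concrete: during training, each activation (or, here, each LoRA unit) is zeroed with probability 0.97, and the survivors are rescaled by 1/(1 - p) so the expected magnitude is preserved. A minimal sketch of this "inverted dropout" scheme, using plain Python lists rather than any particular framework:

```python
import random

def inverted_dropout(vec, p=0.97, seed=0):
    """Zero each element with probability p; scale survivors by 1/(1 - p)."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else x / (1 - p) for x in vec]

vec = [1.0] * 1000
out = inverted_dropout(vec)
kept = sum(1 for v in out if v != 0.0)  # roughly 3% of 1000 survive
```

At p = 0.97 only about 3% of units participate in any given step, which is what makes the regularization so aggressive.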
## Component Models
These models were merged with equal weights using mergekit's Karcher mean (`karcher`) method.
- lunahr/Hermes-3-Llama-3.2-3B-abliterated
- cognitivecomputations/Dolphin3.0-Llama3.2-3B
- ValiantLabs/Llama3.2-3B-ShiningValiant2
- bunnycore/Llama-3.2-3B-Apex
- nidum/Nidum-Llama-3.2-3B-Uncensored
- Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
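For reference, a merge like this is typically expressed as a mergekit YAML config. The sketch below is illustrative only; the exact field names and any method-specific parameters should be checked against the mergekit documentation.

```yaml
# Hypothetical mergekit config for an equal-weight Karcher mean merge
merge_method: karcher
models:
  - model: lunahr/Hermes-3-Llama-3.2-3B-abliterated
  - model: cognitivecomputations/Dolphin3.0-Llama3.2-3B
  - model: ValiantLabs/Llama3.2-3B-ShiningValiant2
  - model: bunnycore/Llama-3.2-3B-Apex
  - model: nidum/Nidum-Llama-3.2-3B-Uncensored
  - model: Devarui379/VersatiLlama-Llama-3.2-3B-Instruct-Abliterated
dtype: bfloat16
```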
## Training Hyperparameters
- Dataset: agentlans/drill (1 epoch)
- Learning rate: 5e-5
- Pack sequences: on
- Neat packing: on
- NEFTune alpha: 5
- LoRA: rank 64, alpha 128, dropout 0.97
- rsLoRA: enabled
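A note on the rsLoRA setting above: rank-stabilized LoRA changes the scaling factor applied to the low-rank update from alpha/r to alpha/sqrt(r), which keeps the update magnitude stable as rank grows. With the values used here (rank 64, alpha 128) the difference is large:

```python
import math

def lora_scaling(alpha, r, rslora=False):
    """Scaling factor applied to the low-rank LoRA update:
    alpha / r for standard LoRA, alpha / sqrt(r) for rsLoRA."""
    return alpha / math.sqrt(r) if rslora else alpha / r

standard = lora_scaling(128, 64)          # 128 / 64 = 2.0
stabilized = lora_scaling(128, 64, True)  # 128 / sqrt(64) = 16.0
```

So at rank 64, rsLoRA weights the adapter update eight times more heavily than the conventional alpha/r rule would.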
## Limitations
- Primarily focused on English language tasks
- Not optimized for long context windows or extended chain-of-thought reasoning
- Limited background knowledge with potential hallucinations, typical of small models
- May struggle with complex math and logical reasoning, similar to most large language models
- Not safety-tuned: neither censored nor explicitly uncensored
## License
This model is licensed under the Llama 3.2 Community License Agreement.