Growing LLM Model Card

Model Description

The Growing LLM is a GPT-2-based language model that implements neural-plasticity-inspired dynamic growth during training. It starts from a pre-trained GPT-2 (124M parameters) and dynamically adds new transformer blocks while freezing the original parameters, allowing the model to acquire new knowledge without catastrophic forgetting.

Key Features

  • Dynamic Growth: Adds new transformer blocks during training
  • Knowledge Preservation: Freezes original parameters to retain pre-trained knowledge
  • Flexible Triggers: Supports fixed schedule and plateau detection growth triggers
  • Regularization Options: Supports Knowledge Distillation and Elastic Weight Consolidation (EWC)
  • Comprehensive Metrics: Tracks training, validation, growth events, and scaling analysis
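The freeze-and-grow step can be sketched as follows. This is an illustrative sketch, not the project's actual implementation: it builds a randomly initialized GPT-2-small from the default config (rather than loading the released checkpoint), freezes every pre-trained parameter, then appends one new trainable transformer block.

```python
import copy

from transformers import GPT2Config, GPT2LMHeadModel

# Default config is GPT-2 small: 12 layers, 12 heads, 768 hidden dim.
# (Random init here so the sketch runs offline; the real model starts
# from the pre-trained GPT-2 checkpoint.)
model = GPT2LMHeadModel(GPT2Config())

# Freeze all existing parameters to preserve pre-trained knowledge.
for p in model.parameters():
    p.requires_grad = False

# Grow: clone the last transformer block and append it as a trainable layer.
new_block = copy.deepcopy(model.transformer.h[-1])
for p in new_block.parameters():
    p.requires_grad = True
model.transformer.h.append(new_block)
model.config.n_layer += 1

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters after growth: {trainable / 1e6:.1f}M")
```

Only the appended block receives gradients, which is why each growth event adds roughly 7.1M trainable parameters while the 124M base stays fixed.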

Training Details

Training Data

  • Dataset: WikiText-2-raw-v1
  • Max sequence length: 128 tokens

Training Configuration

  • Base model: GPT-2 (124M parameters)
  • Learning rate: 5e-5
  • Batch size: 8
  • Optimizer: AdamW with weight decay 0.01
  • Max steps: 2000
  • Growth frequency: Every 500 steps
  • Maximum growth events: 3

Growth Mechanism

  1. Fixed Schedule: Grow every N training steps
  2. Plateau Detection: Grow when validation loss shows no improvement for Y steps
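The plateau trigger can be sketched as a small stateful check; the class and parameter names below are illustrative assumptions, not the project's actual API.

```python
class PlateauTrigger:
    """Signal a growth event when validation loss stops improving."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.patience = patience    # evals without improvement before growing
        self.min_delta = min_delta  # minimum loss drop that counts as progress
        self.best_loss = float("inf")
        self.stale = 0

    def should_grow(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.min_delta:
            # Loss improved: reset the counter.
            self.best_loss = val_loss
            self.stale = 0
            return False
        self.stale += 1
        if self.stale >= self.patience:
            self.stale = 0
            return True
        return False
```

The fixed-schedule trigger is simpler still: `step % growth_frequency == 0` (every 500 steps in this run).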

Regularization (Optional)

  • Knowledge Distillation: Uses teacher-student architecture with temperature scaling
  • Elastic Weight Consolidation (EWC): Penalizes changes to important parameters
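Both regularizers have standard formulations, sketched below (hyperparameter values are illustrative, not the project's settings). Distillation matches the grown student's softened output distribution to the frozen teacher's; EWC penalizes drift in parameters weighted by their Fisher-information importance.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then match them with KL divergence.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2


def ewc_penalty(params, old_params, fisher, lam=0.4):
    # (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2
    return lam / 2 * sum(
        (f * (p - p0) ** 2).sum()
        for p, p0, f in zip(params, old_params, fisher)
    )
```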

Model Architecture

  • Base: GPT-2 (12 layers, 12 heads, 768 hidden dim)
  • Growth: Added 3 new transformer blocks (one per growth event)
  • Final: 15 layers, 145.7M total parameters
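The +7.1M per growth event is consistent with the parameter count of a single GPT-2 block (hidden size 768, 4× MLP expansion), as a quick arithmetic check shows:

```python
d = 768                        # hidden dimension
attn = d * 3 * d + 3 * d       # c_attn: fused QKV projection (weight + bias)
attn += d * d + d              # attention output projection
mlp = d * 4 * d + 4 * d        # c_fc: up-projection to 4d
mlp += 4 * d * d + d           # c_proj: down-projection back to d
ln = 2 * (2 * d)               # two LayerNorms (weight + bias each)
total = attn + mlp + ln
print(f"{total / 1e6:.1f}M per block")  # 7.1M per block
```

Three growth events therefore add about 21.3M parameters, taking the model from 124.4M to 145.7M.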

Training Results

Summary Metrics

| Metric                | Initial | Final  |
| --------------------- | ------- | ------ |
| Training Loss         | 7.16    | 1.95   |
| Validation Loss       | 6.99    | 2.03   |
| Validation Perplexity | ~1000   | 7.58   |
| Total Parameters      | 124.4M  | 145.7M |
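Perplexity is simply the exponential of the per-token cross-entropy loss, so the loss and perplexity columns can be cross-checked (the small mismatch with the reported 7.58 comes from rounding the loss):

```python
import math

# Perplexity = exp(cross-entropy loss).
print(round(math.exp(6.99)))      # initial: ~1086, i.e. the "~1000" above
print(round(math.exp(2.03), 2))   # final: ~7.61, close to the reported 7.58
```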

Training Time

  • Total time: ~60 minutes (3596 seconds)
  • Best validation loss: 2.00
  • Best validation perplexity: 7.42

Growth Events

| Growth # | Step | Layers  | Parameters Added | Val Loss Delta |
| -------- | ---- | ------- | ---------------- | -------------- |
| 1        | 500  | 12 → 13 | +7.1M            | +0.00003       |
| 2        | 1000 | 13 → 14 | +7.1M            | +0.00002       |
| 3        | 1500 | 14 → 15 | +7.1M            | +0.000001      |

Results Summary

| Model       | Perplexity | Loss   |
| ----------- | ---------- | ------ |
| Base GPT-2  | 56.39      | 4.0323 |
| Growing LLM | 33.39      | 3.5082 |

Perplexity improvement: 40.8%
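The headline figure follows directly from the table above:

```python
# Relative perplexity improvement of the grown model over the base GPT-2.
base, grown = 56.39, 33.39
improvement = (base - grown) / base
print(f"{improvement:.1%}")  # 40.8%
```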

Key Observation: The validation loss delta after each growth event is minimal (~0.00003), demonstrating successful knowledge retention. The model continues to learn new capabilities without catastrophic forgetting.

Usage

from transformers import GPT2LMHeadModel, AutoTokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("aicinema69/gpt2-growing")
tokenizer = AutoTokenizer.from_pretrained("aicinema69/gpt2-growing")

# Generate text
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))

Limitations

  • Growth events may cause temporary performance dips that recover with continued training
  • Requires sufficient training data to benefit from additional parameters
  • Each growth event adds parameters, increasing memory and compute requirements

License

This model is based on GPT-2, which is distributed under OpenAI's GPT-2 license.

Citation

If you use this model in your research, please cite:

@misc{growing_llm,
  author = {Satyam Singh},
  title = {Growing LLM: Dynamic Model Growth for Continual Learning},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aicinema69/gpt2-growing}}
}

Contact

For questions or issues, please open a GitHub issue or contact the model author.
