Growing LLM Model Card
Model Description
The Growing LLM is a GPT-2-based language model that implements neural-plasticity-inspired dynamic growth during training. It starts from a pre-trained GPT-2 (124M parameters) and dynamically adds new transformer blocks while freezing the original parameters, allowing the model to acquire new knowledge without catastrophic forgetting.
Key Features
- Dynamic Growth: Adds new transformer blocks during training
- Knowledge Preservation: Freezes original parameters to retain pre-trained knowledge
- Flexible Triggers: Supports fixed schedule and plateau detection growth triggers
- Regularization Options: Supports Knowledge Distillation and Elastic Weight Consolidation (EWC)
- Comprehensive Metrics: Tracks training, validation, growth events, and scaling analysis
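The freeze-then-grow idea behind these features can be sketched with a toy PyTorch model. This is an illustrative stand-in, not the actual GPT-2 training code:

```python
import torch
import torch.nn as nn

# Toy stand-in for the pre-trained base (in practice: GPT-2's transformer stack).
base = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))

# Freeze every original parameter so pre-trained knowledge cannot be overwritten.
for p in base.parameters():
    p.requires_grad = False

# "Grow" by appending a new, trainable block; only it receives gradient updates.
new_block = nn.Linear(16, 16)
model = nn.Sequential(base, new_block)

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 2: only the new block's weight and bias
```

An optimizer built over `trainable` then updates the new block exclusively, which is what preserves the base model's behavior.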
Training Details
Training Data
- Dataset: WikiText-2-raw-v1
- Max sequence length: 128 tokens
Training Configuration
- Base model: GPT-2 (124M parameters)
- Learning rate: 5e-5
- Batch size: 8
- Optimizer: AdamW with weight decay 0.01
- Max steps: 2000
- Growth frequency: Every 500 steps
- Maximum growth events: 3
Growth Mechanism
- Fixed Schedule: Grow every N training steps
- Plateau Detection: Grow when validation loss shows no improvement for Y steps
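The two triggers can be sketched in plain Python. Class and parameter names here are illustrative assumptions, not the repository's actual API:

```python
class FixedScheduleTrigger:
    """Grow every `every_n` training steps."""
    def __init__(self, every_n):
        self.every_n = every_n

    def should_grow(self, step, val_loss=None):
        return step > 0 and step % self.every_n == 0


class PlateauTrigger:
    """Grow once validation loss has not improved for `patience` checks."""
    def __init__(self, patience, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale = 0

    def should_grow(self, step, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


fixed = FixedScheduleTrigger(every_n=500)
print(fixed.should_grow(500))  # True: step 500 is on the schedule

plateau = PlateauTrigger(patience=2)
history = [plateau.should_grow(i, loss) for i, loss in enumerate([3.0, 2.5, 2.5, 2.5])]
print(history)  # [False, False, False, True]: fires after two stale checks
```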
Regularization (Optional)
- Knowledge Distillation: Uses teacher-student architecture with temperature scaling
- Elastic Weight Consolidation (EWC): Penalizes changes to important parameters
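Common formulations of these two losses, sketched in PyTorch; the exact variants and hyperparameters used in training are not documented here, so treat this as a reference sketch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target KD loss with temperature scaling (standard formulation)."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T ** 2)

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """EWC: quadratic penalty on parameter drift, weighted by Fisher importance."""
    return (lam / 2) * sum(
        (f * (p - p0) ** 2).sum() for p, p0, f in zip(params, old_params, fisher)
    )

# KL divergence is non-negative for any pair of logit tensors.
kd = distillation_loss(torch.randn(4, 10), torch.randn(4, 10))
print(kd.item() >= 0)  # True

# Unit Fisher, unit drift, lam=2 -> (2/2) * 3 = 3.0
ewc = ewc_penalty([torch.ones(3)], [torch.zeros(3)], [torch.ones(3)], lam=2.0)
print(ewc.item())  # 3.0
```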
Model Architecture
- Base: GPT-2 (12 layers, 12 heads, 768 hidden dim)
- Growth: Added 3 new transformer blocks (one per growth event)
- Final: 15 layers, 145.7M total parameters
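Growing the layer stack can be sketched with Hugging Face `transformers`, here on a tiny randomly initialized config so it runs without downloading weights (the real model starts from pretrained GPT-2 with 12 layers and 768 hidden dim; the `GPT2Block` import is an internal detail that may vary across `transformers` versions):

```python
from transformers import GPT2Config, GPT2LMHeadModel
from transformers.models.gpt2.modeling_gpt2 import GPT2Block

# Tiny config for illustration only.
config = GPT2Config(n_layer=2, n_head=2, n_embd=32, vocab_size=100)
model = GPT2LMHeadModel(config)

# Freeze all existing parameters, then append one new trainable block.
for p in model.parameters():
    p.requires_grad = False
model.transformer.h.append(GPT2Block(config, layer_idx=len(model.transformer.h)))
model.config.n_layer = len(model.transformer.h)

print(model.config.n_layer)  # 3
```

Repeating this once per growth event takes the real model from 12 to 15 layers.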
Training Results
Summary Metrics
| Metric | Initial | Final |
|---|---|---|
| Training Loss | 7.16 | 1.95 |
| Validation Loss | 6.99 | 2.03 |
| Validation Perplexity | ~1000 | 7.58 |
| Total Parameters | 124.4M | 145.7M |
Training Time
- Total time: ~60 minutes (3596 seconds)
- Best validation loss: 2.00
- Best validation perplexity: 7.42
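The perplexity figures follow directly from the loss values: perplexity is the exponential of the mean cross-entropy loss. For example, the final validation loss of 2.03 maps close to the reported 7.58 (small differences come from rounding of the logged loss):

```python
import math

def perplexity(loss):
    """Perplexity = exp(mean cross-entropy loss)."""
    return math.exp(loss)

print(round(perplexity(2.03), 2))  # 7.61, near the reported 7.58
```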
Growth Events
| Growth # | Step | Layers | Parameters Added | Val Loss Delta |
|---|---|---|---|---|
| 1 | 500 | 12 → 13 | +7.1M | +0.00003 |
| 2 | 1000 | 13 → 14 | +7.1M | +0.00002 |
| 3 | 1500 | 14 → 15 | +7.1M | +0.000001 |
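The ~7.1M parameters per growth event is consistent with the size of one GPT-2 transformer block at hidden dim 768, as a quick tally shows:

```python
d = 768  # GPT-2 hidden dimension

attn = d * 3 * d + 3 * d   # fused qkv projection (c_attn): weight + bias
attn += d * d + d          # attention output projection (c_proj)
mlp = d * 4 * d + 4 * d    # MLP up-projection (768 -> 3072)
mlp += 4 * d * d + d       # MLP down-projection (3072 -> 768)
ln = 2 * 2 * d             # two LayerNorms, each with weight + bias

total = attn + mlp + ln
print(total)  # 7087872, i.e. ~7.1M per added block
```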
Results Summary
| Model | Perplexity | Loss |
|---|---|---|
| Base GPT-2 | 56.39 | 4.0323 |
| Growing LLM | 33.39 | 3.5082 |
Perplexity improvement: 40.8%
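The 40.8% figure is the relative perplexity reduction:

```python
base_ppl, grown_ppl = 56.39, 33.39
improvement = (base_ppl - grown_ppl) / base_ppl * 100
print(f"{improvement:.1f}%")  # 40.8%
```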
Key Observation: The validation loss delta after each growth event is minimal (at most ~0.00003), demonstrating successful knowledge retention: the model continues to learn new capabilities without catastrophic forgetting.
Usage
```python
from transformers import GPT2LMHeadModel, AutoTokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("aicinema69/gpt2-growing")
tokenizer = AutoTokenizer.from_pretrained("aicinema69/gpt2-growing")

# Generate text
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```
Limitations
- Growth events may cause temporary performance dips that recover with continued training
- Requires sufficient training data to benefit from additional parameters
- More parameters = higher memory and compute requirements
License
This model is based on GPT-2, which is released under the OpenAI GPT-2 license.
Citation
If you use this model in your research, please cite:
@misc{growing_llm,
  author = {Satyam Singh},
  title = {Growing LLM: Dynamic Model Growth for Continual Learning},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aicinema69/gpt2-growing}}
}
Contact
For questions or issues, please open a GitHub issue or contact the model author.