How to use AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the GLM-4.6 base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("zai-org/GLM-4.6")
model = PeftModel.from_pretrained(base_model, "AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning")
```

A LoRA fine-tuned version of GLM-4.6 (356B MoE), trained on the Hyperswitch codebase using phased curriculum learning.
This model is trained specifically to understand and assist with the Hyperswitch payment orchestration codebase. Training used a 3-phase curriculum learning approach on a multi-node cluster of H200 GPUs with PyTorch FSDP.

Training hardware:

| Component | Specification |
|---|---|
| GPUs | 16× NVIDIA H200 (141 GB each) |
| Nodes | 2 nodes × 8 GPUs |
| Distributed Strategy | PyTorch FSDP (Full Shard) |
| Precision | BF16 Mixed Precision |

LoRA configuration:

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | 736 tensors |
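As a rough sanity check on the LoRA values, the scaling factor and the tensor count can be derived with a few lines of arithmetic. The sketch below assumes standard LoRA, where each adapted projection gets two trainable matrices (A and B) and the update is scaled by alpha / r. The implied layer count is an inference from the 736 figure, not something the card states.

```python
# Values taken from the LoRA table.
lora_rank = 64
lora_alpha = 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]
trainable_tensors = 736

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_alpha / lora_rank

# Assumption: each target module contributes two trainable tensors (A and B).
tensors_per_layer = len(target_modules) * 2

# Implied number of adapted layers, if every layer adapts all four projections.
implied_layers, remainder = divmod(trainable_tensors, tensors_per_layer)

print(f"scaling = {scaling}")
print(f"tensors per layer = {tensors_per_layer}")
print(f"implied adapted layers = {implied_layers} (remainder {remainder})")
```

A zero remainder suggests the tensor count is consistent with adapting the four attention projections in every layer.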

Training hyperparameters:

| Parameter | Value |
|---|---|
| Effective Batch Size | 32 (1 per GPU × 2 gradient accumulation steps × 16 GPUs) |
| Sequence Length | 16,384 tokens |
| Chunk Overlap | 2,048 tokens |
| LR Scheduler | Cosine |
| Weight Decay | 0.01 |
| Max Grad Norm | 1.0 |
| Precision | BF16 |
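The sequence length and chunk overlap listed above imply a sliding window over long inputs. Here is a minimal sketch of that idea, assuming a simple fixed-stride split over a token list. The function name `chunk_tokens` is hypothetical, and the actual preprocessing code for this model is not shown in the card.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[int], seq_len: int = 16_384, overlap: int = 2_048
) -> List[List[int]]:
    """Split tokens into windows of at most seq_len tokens,
    where consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < seq_len:
        raise ValueError("overlap must be in [0, seq_len)")
    stride = seq_len - overlap  # 14,336 with the defaults
    chunks: List[List[int]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + seq_len]))
        if start + seq_len >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return chunks


# Example with a hypothetical 40,000-token file.
example = chunk_tokens(range(40_000))
print([len(c) for c in example])
```

With this scheme, the last 2,048 tokens of one chunk reappear as the first 2,048 tokens of the next, so context that straddles a boundary is seen intact at least once.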
The model was trained using a 3-phase curriculum learning approach, where each phase builds upon the previous:

Phase 1:

| Metric | Value |
|---|---|
| Dataset | Codebase structure and file patterns |
| Samples | 9,293 train / 512 eval |
| Learning Rate | 2.5e-5 |
| Warmup Ratio | 0.15 |
| Training Time | 32.3 hours |
| Final Eval Loss | 0.349 |
| Final Eval Accuracy | 90.6% |
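To make the learning-rate settings concrete, here is a sketch of a linear-warmup plus cosine-decay schedule using the peak rate (2.5e-5) and warmup ratio (0.15) from the table above. It assumes the common formulation that decays to zero, as in Hugging Face's `get_cosine_schedule_with_warmup`. The step count of 1,000 is purely illustrative, and the card does not say which scheduler implementation was used.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1_000  # illustrative value, not taken from the model card
for step in (0, 150, 575, 1_000):
    print(step, f"{lr_at_step(step, total_steps, 2.5e-5, 0.15):.2e}")
```

The later phases use lower peak rates and shorter warmups, which fits the idea of each phase refining what the previous one learned.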

Phase 2:

| Metric | Value |
|---|---|
| Dataset | Commit patterns and code changes |
| Samples | 16,622 train / 1,545 eval |
| Learning Rate | 2.0e-5 |
| Warmup Ratio | 0.10 |
| Training Time | 64.5 hours |
| Final Eval Loss | 2.46 |
| Final Eval Accuracy | 42.3% |

Note: The higher eval loss in Phase 2 is expected, given the complexity of diff and commit patterns.

Phase 3:

| Metric | Value |
|---|---|
| Dataset | Pull request and review patterns |
| Samples | 9,797 train / 509 eval |
| Learning Rate | 1.5e-5 |
| Warmup Ratio | 0.05 |
| Training Time | 17.8 hours |
| Final Eval Loss | 0.472 |
| Final Eval Accuracy | 90.8% |
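The per-phase sample counts, the effective batch size of 32, and the 2 + 2 + 1 epoch split give a rough estimate of optimizer steps per phase. This is a back-of-the-envelope sketch: it assumes the epoch split maps to the phases in order, and the exact count depends on how the training loop handles the final partial batch and per-GPU sharding, which the card does not describe.

```python
import math

# Effective batch size as reported: 1 per GPU x 2 grad-accum steps x 16 GPUs.
per_gpu_batch, grad_accum, num_gpus = 1, 2, 16
effective_batch = per_gpu_batch * grad_accum * num_gpus

# Train sample counts from the phase tables; epochs assumed to be 2, 2, 1 in order.
phases = {
    "phase 1": {"train_samples": 9_293, "epochs": 2},
    "phase 2": {"train_samples": 16_622, "epochs": 2},
    "phase 3": {"train_samples": 9_797, "epochs": 1},
}

total_low = 0   # last partial batch dropped
total_high = 0  # last partial batch kept
for name, p in phases.items():
    low = math.floor(p["train_samples"] / effective_batch) * p["epochs"]
    high = math.ceil(p["train_samples"] / effective_batch) * p["epochs"]
    total_low += low
    total_high += high
    print(f"{name}: between {low} and {high} steps")

reported_total_steps = 1_926  # figure reported in the model card
print(f"estimated total: {total_low} to {total_high}, reported: {reported_total_steps}")
```

The reported total falls inside the estimated range, so the sample counts, batch size, and epoch split are mutually consistent.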

Overall training summary:

| Metric | Value |
|---|---|
| Total Training Time | 116.5 hours |
| Total Steps | 1,926 |
| Total Epochs | 5 (2 + 2 + 1) |
| Initial Train Loss | 0.609 |
| Final Train Loss | 0.465 |
| Final Perplexity | 1.60 |
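Perplexity is conventionally the exponential of the mean cross-entropy loss (in nats). Under that assumption, the reported eval losses map to perplexities as sketched below. The card does not state which loss the 1.60 figure was computed from, but it is consistent with the final-phase eval loss of 0.472.

```python
import math

# Final eval losses reported for each phase.
eval_losses = {"phase 1": 0.349, "phase 2": 2.46, "phase 3": 0.472}

# Perplexity = exp(mean cross-entropy loss), assuming natural-log loss.
perplexities = {name: math.exp(loss) for name, loss in eval_losses.items()}

for name, ppl in perplexities.items():
    print(f"{name}: loss {eval_losses[name]:.3f} -> perplexity {ppl:.2f}")
```

Seen this way, the Phase 2 loss corresponds to a much higher perplexity than the other two phases, which matches the lower eval accuracy reported for that phase.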
If you use this model, please cite:
```bibtex
@misc{glm46-hs-lora-curriculum,
  title     = {GLM-4.6-HS-LoRA-CurriculumLearning},
  author    = {Aditya Narayan},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AdityaNarayan/GLM-4.6-HS-LoRA-CurriculumLearning}
}
```
Base model: zai-org/GLM-4.6