---
license: llama3.2
language:
- en
- code
tags:
- code-generation
- java
- llama
- fine-tuned
- reflection
- meta-learning
pipeline_tag: text-generation
datasets:
- Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1
base_model: meta-llama/Llama-3.2-3B
widget:
- text: "You are an expert Java programmer. Generate a complete, working Java method for the given description.\n\nTask: sets the value of the name property.\n\nRequirements:\n- Write a complete Java method\n- Use proper syntax and naming conventions\n- Include return statements where needed\n- Keep it concise but functional\n\n```java\n"
  example_title: "Setter Method"
- text: "You are an expert Java programmer. Generate a complete, working Java method for the given description.\n\nTask: returns true if the string is empty or null.\n\nRequirements:\n- Write a complete Java method\n- Use proper syntax and naming conventions\n- Include return statements where needed\n- Keep it concise but functional\n\n```java\n"
  example_title: "Null Check Method"
---

# LLaMA 3.2 3B - Java Code Generation (Reflection)

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B), trained specifically for Java method generation using a novel reflection-based meta-learning approach.
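To give a rough picture of what "reflection-based" training data looks like, the sketch below shows one hypothetical meta example pairing a student attempt with teacher feedback. The field names and Java snippets are illustrative only, not the dataset's actual schema; see the dataset card for the real format.

```python
# Hypothetical structure of one reflection (meta) training example.
# Field names are illustrative; consult the dataset card for the real schema.
meta_example = {
    "task": "returns true if the string is empty or null",
    "student_attempt": "public boolean isEmpty(String s) { return s.length() == 0; }",
    "teacher_solution": "public boolean isEmpty(String s) { return s == null || s.isEmpty(); }",
    "error_analysis": "The attempt dereferences s without a null check, risking a NullPointerException.",
    "learning_insight": "Guard against null before calling methods on a reference.",
}
```

During reflection training, all four annotation fields are serialized into the prompt so the student model sees its mistake alongside the corrected implementation and the reasoning behind the fix.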
## Model Description

- **Base Model**: LLaMA 3.2 3B
- **Training Method**: Reflection-based meta-learning
- **Task**: Java method generation from natural language descriptions
- **Training Data**: 100k examples from the CodeXGLUE dataset with Claude annotations
- **Language**: Java
- **License**: LLaMA 3.2 Community License

## Training Details

### Dataset

Trained on [Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1](https://huggingface.co/datasets/Naholav/llama3.2-java-codegen-90sft-10meta-claude-v1):

- 90,000 SFT examples for standard supervised training
- 10,000 meta-annotated examples with Claude's error analysis and learning insights
- Source: CodeXGLUE text-to-code (Java) dataset

### Reflection-Based Training

This model uses a teacher-student reflection paradigm:

- **Teacher**: Claude 4 Sonnet provides error analysis and guidance
- **Student**: LLaMA 3.2 3B learns from its mistakes through structured reflection
- **Meta examples** include error analysis and learning insights for deeper understanding

### Training Configuration

- **Epochs**: 3
- **Batch Size**: 8 × 6 gradient accumulation steps = 48 effective
- **Learning Rate**: 2e-5
- **Max Length**: 2048 tokens
- **Precision**: float32 (for stability)
- **Optimizer**: AdamW
- **Scheduler**: Cosine with warmup
- **Early Stopping**: Dual tracking (SFT and meta losses)

### Hardware

- **GPU**: NVIDIA A100 80GB
- **Training Time**: ~9 hours
- **Framework**: PyTorch 2.0+ with Transformers

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "Naholav/llama-3.2-3b-100k-codeXGLUE-reflection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Prepare prompt
task_description = "returns the sum of two integers"
prompt = f"""You are an expert Java programmer.
Generate a complete, working Java method for the given description.

Task: {task_description}

Requirements:
- Write a complete Java method
- Use proper syntax and naming conventions
- Include return statements where needed
- Keep it concise but functional

```java
"""

# Generate code
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.2,
    do_sample=True,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
generated_code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_code)
```

### Expected Output Format

The model generates Java methods following this pattern:

```java
public int sum(int a, int b) {
    return a + b;
}
```

### Testing on Your Own Data

For local evaluation, you can use:

- **Test dataset from this project**: [100 examples](https://github.com/naholav/sft-vs-reflection-llama3-codexglue/blob/main/create%20meta%20dataset%20and%20test%20dataset/codexglue_test_100_samples.json)
- **Original Microsoft test set**: [2k examples](https://github.com/microsoft/CodeXGLUE/blob/main/Text-Code/text-to-code/dataset/concode/test.json)

**Important**: Remember to clean the natural language descriptions before inference:

```python
def clean_nl(nl_description):
    # Replace CONCODE separator tokens with readable delimiters
    cleaned = nl_description.replace("concode_field_sep", " | ")
    cleaned = cleaned.replace("concode_elem_sep", ", ")
    # Collapse repeated whitespace
    return ' '.join(cleaned.split())
```

## Performance

The model was evaluated during training with:

- Separate tracking of SFT and meta losses
- 5 evaluations per epoch
- Dual early stopping based on both loss types
- Best checkpoint selected based on average validation loss

## Reflection Training Methodology

This model was trained using an approach in which:

1. **Error Recognition**: The model learns to identify common coding mistakes
2. **Pattern Analysis**: It understands method signatures and class structures
3. **Knowledge Gaps**: It recognizes missing OOP concepts
4.
**Improvement Strategy**: It internalizes better coding patterns

Meta examples included structured reflection prompts with:

- Student's incorrect attempt
- Teacher's correct implementation
- Detailed error analysis
- Learning insights and guidance

## Comparison with SFT Model

This is the reflection-based version. For comparison with standard supervised fine-tuning:

- [SFT Model](https://huggingface.co/Naholav/llama-3.2-3b-100k-codeXGLUE-sft)
- [GitHub Repository](https://github.com/naholav/sft-vs-reflection-llama3-codexglue) for implementation details

## Limitations

- Trained specifically for Java method generation
- May not generalize well to full classes or other programming languages
- Best suited for single-method generation tasks
- Context window limited to 2048 tokens

## Ethical Considerations

- The model should not be used to generate malicious code
- Generated code should be reviewed before use in production
- Not suitable for generating code that handles sensitive data without proper review

## Key Differences from SFT Model

- **Training Data**: Uses the same dataset but processes meta examples differently
- **Learning Paradigm**: Teacher-student reflection vs. direct imitation
- **Loss Tracking**: Dual tracking of SFT and meta losses
- **Expected Benefit**: Better understanding of coding patterns and error avoidance

## Acknowledgments

- Meta AI for the LLaMA 3.2 base model
- Microsoft Research for the CodeXGLUE text-to-code (Java) dataset
- Anthropic for Claude 4 Sonnet's error analysis and insights
- Hugging Face for the training infrastructure
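## Worked Example: From Raw Description to Prompt

To make the preprocessing and prompt format concrete, the sketch below combines the `clean_nl` helper from the Usage section with a `build_prompt` function that mirrors the Quick Start template. The `build_prompt` name and the raw CONCODE-style description are illustrative examples introduced here, not part of the dataset.

```python
def clean_nl(nl_description):
    # Replace CONCODE separator tokens with readable delimiters
    cleaned = nl_description.replace("concode_field_sep", " | ")
    cleaned = cleaned.replace("concode_elem_sep", ", ")
    # Collapse repeated whitespace
    return ' '.join(cleaned.split())

def build_prompt(task_description):
    # Mirrors the prompt template shown in the Quick Start section
    return (
        "You are an expert Java programmer.\n"
        "Generate a complete, working Java method for the given description.\n\n"
        f"Task: {task_description}\n\n"
        "Requirements:\n"
        "- Write a complete Java method\n"
        "- Use proper syntax and naming conventions\n"
        "- Include return statements where needed\n"
        "- Keep it concise but functional\n\n"
        "```java\n"
    )

# Illustrative raw description in CONCODE format
raw = "sets the name property concode_field_sep String name concode_elem_sep int id"
prompt = build_prompt(clean_nl(raw))
print(prompt)
```

The resulting `prompt` string can be passed directly to `tokenizer(...)` as in the Quick Start example; the model is expected to continue from the opening ```` ```java ```` fence with a method body.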