Locai L1-Large

🚀 Try it out on gb1.ai

This updated version of the Locai L1-Large model improves multi-turn conversation ability and increases the proportion of self-improvement data in the training mix.

Locai L1-Large is an open-source instruction-tuned model based on Qwen3 235B Instruct (2507), post-trained using our Forget-Me-Not framework. This framework combines experience replay and self-improvement to enhance performance whilst mitigating catastrophic forgetting.
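The experience-replay half of this framework can be illustrated with a minimal sketch: examples from earlier training distributions are mixed back into the new fine-tuning stream so the model keeps seeing old data. The function name, replay ratio, and sampling scheme below are illustrative assumptions, not the framework's actual configuration:

```python
import random

def mix_with_replay(new_examples, replay_examples, replay_ratio=0.25, seed=0):
    """Interleave replayed examples into new fine-tuning data.

    replay_ratio is the target fraction of replayed samples in the
    final stream; 0.25 is an illustrative assumption, not a value
    from the Forget-Me-Not report.
    """
    rng = random.Random(seed)
    # Number of replay samples needed so they make up replay_ratio of the mix.
    n_replay = round(len(new_examples) * replay_ratio / (1 - replay_ratio))
    sampled = [rng.choice(replay_examples) for _ in range(n_replay)]
    mixed = list(new_examples) + sampled
    rng.shuffle(mixed)
    return mixed

stream = mix_with_replay(["new"] * 9, ["old_a", "old_b"], replay_ratio=0.25)
# 9 new examples plus 3 replayed ones: 12 items, ~25% replay
```

Real training pipelines apply the same idea at the batch level, but the ratio arithmetic is identical.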

For more details on the model training, please refer to our technical report.

Usage

Installation

pip install transformers torch accelerate

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "locailabs/locai-l1-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_k=20,
    top_p=0.8
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

Using vLLM (Recommended for Production)

from vllm import LLM, SamplingParams

llm = LLM(model="locailabs/locai-l1-large-2011")

sampling_params = SamplingParams(
    temperature=0.7,
    top_k=20,
    top_p=0.8,
)

prompts = [
    "Explain quantum entanglement in simple terms."
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
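For production serving, vLLM can also host the model behind an OpenAI-compatible API. The commands below are a sketch: the tensor-parallel size is an assumption for a single 8-GPU node, not an official recommendation, and should be adjusted to your hardware.

```shell
# Launch an OpenAI-compatible API server (tensor-parallel size is an
# assumption for one 8-GPU node; adjust to your hardware).
vllm serve locailabs/locai-l1-large-2011 --tensor-parallel-size 8

# Query the chat completions endpoint with the sampling settings above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "locailabs/locai-l1-large-2011",
    "messages": [{"role": "user", "content": "Explain quantum entanglement in simple terms."}],
    "temperature": 0.7,
    "top_p": 0.8
  }'
```

The server applies the model's chat template automatically, so there is no need to call `apply_chat_template` on the client side.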

Training Details

Training Configuration

  • Base Model: Qwen3-235B-Instruct-2507
  • Method: Supervised Fine-Tuning (SFT) using Parameter Efficient Fine-Tuning (PEFT) through Low-Rank Adaptation (LoRA)
  • Hardware: 1 node × 8 NVIDIA H200 GPUs
  • Energy: 100% renewable energy (UK data centres)
  • Parallelisation: Tensor parallelism, expert parallelism, and sequence parallelism
  • MoE Optimisations: Grouped GEMM, permute fusion, shared expert overlap, auxiliary loss for balanced expert utilisation
  • Memory & Compute: Activation recomputation, sample packing, Flash Attention, loss fusion with final layer
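Of the memory optimisations listed above, sample packing is the easiest to illustrate: variable-length training samples are greedily packed into fixed-length sequences so fewer tokens are wasted on padding. This is a minimal first-fit sketch; a real trainer also builds block-diagonal attention masks so packed samples cannot attend to each other.

```python
def pack_samples(lengths, max_seq_len):
    """Greedy first-fit packing of sample lengths into bins of max_seq_len.

    Returns a list of bins, each a list of sample indices whose total
    length fits in max_seq_len. Illustrative sketch only: real
    implementations also handle attention masking across packed
    sample boundaries.
    """
    bins, bin_loads = [], []
    for idx, length in enumerate(lengths):
        for b, load in enumerate(bin_loads):
            if load + length <= max_seq_len:
                # Sample fits in an existing bin.
                bins[b].append(idx)
                bin_loads[b] += length
                break
        else:
            # No bin has room; open a new one.
            bins.append([idx])
            bin_loads.append(length)
    return bins

packed = pack_samples([900, 300, 700, 100, 200], max_seq_len=1024)
# -> [[0, 3], [1, 2], [4]]: three sequences instead of five padded ones
```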

Training Data

The model was trained on a curated dataset combining:

  • Self-improvement data: Generated and evaluated by the model across helpfulness, relevance, conciseness, complexity, correctness, and harmlessness
  • Low-resource language translations: Bidirectional translation pairs from OpenSubtitles corpora
  • Cultural alignment data: British cultural knowledge generated from CultureBank
  • Self-cognition data: Multilingual Q&A pairs about the model
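The self-improvement evaluation over the six criteria above can be sketched as a simple scoring filter. The scoring scale, the threshold, and the aggregation rule here are illustrative assumptions, not the actual Locai pipeline:

```python
CRITERIA = ["helpfulness", "relevance", "conciseness",
            "complexity", "correctness", "harmlessness"]

def keep_for_training(scores, threshold=4.0):
    """Keep a generated response only if every criterion was scored
    and the mean score clears the threshold.

    Assumes a 1-5 scale and a 4.0 cutoff; both are illustrative
    assumptions, not values from the technical report.
    """
    if set(scores) != set(CRITERIA):
        raise ValueError("missing or unexpected criteria")
    return sum(scores.values()) / len(scores) >= threshold

good = {c: 5 for c in CRITERIA}
weak = dict(good, correctness=1, helpfulness=1)
keep_for_training(good)   # high mean score: kept
keep_for_training(weak)   # mean drops below the cutoff: filtered out
```

In practice such rubric scores come from the model judging its own outputs, and only the surviving responses are added back into the training mix.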

Ethical Considerations

Locai L1-Large has been developed with consideration for:

  • Sustainability: Trained using 100% renewable energy in UK data centres
  • Inclusivity: Enhanced support for low-resource languages to reduce digital inequality
  • Safety: Improved robustness against adversarial attacks (17% improvement on AgentHarm)

Citation

@misc{locai2025l1large,
  title={Locai L1-Large: Self-Improving Language Models with Forget-Me-Not},
  author={Locai Labs},
  year={2025}
}

License

Apache 2.0


Model Card Contact

  • Website: www.gb1.ai
  • Hugging Face: locailabs
  • Issues: Please report via Hugging Face discussions