Locai L1-Large

🚀 Try it out on gb1.ai

This updated version of the Locai L1-Large model improves multi-turn conversation ability and increases the proportion of self-improvement data in the training mix.

Locai L1-Large is an open-source instruction-tuned model based on Qwen3 235B Instruct (2507), post-trained using our Forget-Me-Not framework. This framework combines experience replay and self-improvement to enhance performance whilst mitigating catastrophic forgetting.
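The experience-replay half of this framework can be illustrated with a minimal sketch: examples from earlier training distributions are mixed back into the new fine-tuning stream so the model keeps seeing old data. The function name, replay ratio, and sampling scheme below are illustrative assumptions, not the framework's actual configuration:

```python
import random

def mix_with_replay(new_examples, replay_examples, replay_ratio=0.25, seed=0):
    """Interleave replayed examples into new fine-tuning data.

    replay_ratio is the target fraction of replayed samples in the
    final stream; 0.25 is an illustrative assumption, not a value
    from the Forget-Me-Not report.
    """
    rng = random.Random(seed)
    # Number of replay samples needed so they make up replay_ratio of the mix.
    n_replay = round(len(new_examples) * replay_ratio / (1 - replay_ratio))
    sampled = [rng.choice(replay_examples) for _ in range(n_replay)]
    mixed = list(new_examples) + sampled
    rng.shuffle(mixed)
    return mixed

stream = mix_with_replay(["new"] * 9, ["old_a", "old_b"], replay_ratio=0.25)
# 9 new examples plus 3 replayed ones: 12 items, ~25% replay
```

Real training pipelines apply the same idea at the batch level, but the ratio arithmetic is identical.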

For more details on the model training, please refer to our technical report.

Usage

Installation

pip install transformers torch accelerate

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "locailabs/locai-l1-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms"}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_k=20,
    top_p=0.8
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)

Using vLLM (Recommended for Production)

from vllm import LLM, SamplingParams

llm = LLM(model="locailabs/locai-l1-large-2011")

sampling_params = SamplingParams(
    temperature=0.7,
    top_k=20,
    top_p=0.8,
)

prompts = [
    "Explain quantum entanglement in simple terms."
]

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
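For production serving, vLLM can also host the model behind an OpenAI-compatible API. The commands below are a sketch: the tensor-parallel size is an assumption for a single 8-GPU node, not an official recommendation, and should be adjusted to your hardware.

```shell
# Launch an OpenAI-compatible API server (tensor-parallel size is an
# assumption for one 8-GPU node; adjust to your hardware).
vllm serve locailabs/locai-l1-large-2011 --tensor-parallel-size 8

# Query the chat completions endpoint with the sampling settings above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "locailabs/locai-l1-large-2011",
    "messages": [{"role": "user", "content": "Explain quantum entanglement in simple terms."}],
    "temperature": 0.7,
    "top_p": 0.8
  }'
```

The server applies the model's chat template automatically, so there is no need to call `apply_chat_template` on the client side.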

Training Details

Training Configuration

  • Base Model: Qwen3-235B-Instruct-2507
  • Method: Supervised Fine-Tuning (SFT) using Parameter Efficient Fine-Tuning (PEFT) through Low-Rank Adaptation (LoRA)
  • Hardware: 1 node × 8 NVIDIA H200 GPUs
  • Energy: 100% renewable energy (UK data centres)
  • Parallelisation: Tensor parallelism, expert parallelism, and sequence parallelism
  • MoE Optimisations: Grouped GEMM, permute fusion, shared expert overlap, auxiliary loss for balanced expert utilisation
  • Memory & Compute: Activation recomputation, sample packing, Flash Attention, loss fusion with final layer
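Of the memory optimisations listed above, sample packing is the easiest to illustrate: variable-length training samples are greedily packed into fixed-length sequences so fewer tokens are wasted on padding. This is a minimal first-fit sketch; a real trainer also builds block-diagonal attention masks so packed samples cannot attend to each other.

```python
def pack_samples(lengths, max_seq_len):
    """Greedy first-fit packing of sample lengths into bins of max_seq_len.

    Returns a list of bins, each a list of sample indices whose total
    length fits in max_seq_len. Illustrative sketch only: real
    implementations also handle attention masking across packed
    sample boundaries.
    """
    bins, bin_loads = [], []
    for idx, length in enumerate(lengths):
        for b, load in enumerate(bin_loads):
            if load + length <= max_seq_len:
                # Sample fits in an existing bin.
                bins[b].append(idx)
                bin_loads[b] += length
                break
        else:
            # No bin has room; open a new one.
            bins.append([idx])
            bin_loads.append(length)
    return bins

packed = pack_samples([900, 300, 700, 100, 200], max_seq_len=1024)
# -> [[0, 3], [1, 2], [4]]: three sequences instead of five padded ones
```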

Training Data

The model was trained on a curated dataset combining:

  • Self-improvement data: Generated and evaluated by the model across helpfulness, relevance, conciseness, complexity, correctness, and harmlessness
  • Low-resource language translations: Bidirectional translation pairs from OpenSubtitles corpora
  • Cultural alignment data: British cultural knowledge generated from CultureBank
  • Self-cognition data: Multilingual Q&A pairs about the model
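The self-improvement evaluation over the six criteria above can be sketched as a simple scoring filter. The scoring scale, the threshold, and the aggregation rule here are illustrative assumptions, not the actual Locai pipeline:

```python
CRITERIA = ["helpfulness", "relevance", "conciseness",
            "complexity", "correctness", "harmlessness"]

def keep_for_training(scores, threshold=4.0):
    """Keep a generated response only if every criterion was scored
    and the mean score clears the threshold.

    Assumes a 1-5 scale and a 4.0 cutoff; both are illustrative
    assumptions, not values from the technical report.
    """
    if set(scores) != set(CRITERIA):
        raise ValueError("missing or unexpected criteria")
    return sum(scores.values()) / len(scores) >= threshold

good = {c: 5 for c in CRITERIA}
weak = dict(good, correctness=1, helpfulness=1)
keep_for_training(good)   # high mean score: kept
keep_for_training(weak)   # mean drops below the cutoff: filtered out
```

In practice such rubric scores come from the model judging its own outputs, and only the surviving responses are added back into the training mix.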

Ethical Considerations

Locai L1-Large has been developed with consideration for:

  • Sustainability: Trained using 100% renewable energy in UK data centres
  • Inclusivity: Enhanced support for low-resource languages to reduce digital inequality
  • Safety: Improved robustness against adversarial attacks (17% improvement on AgentHarm)

Citation

@misc{locai2025l1large,
  title={Locai L1-Large: Self-Improving Language Models with Forget-Me-Not},
  author={Locai Labs},
  year={2025}
}

License

Apache 2.0


Model Card Contact

  • Website: www.gb1.ai
  • Hugging Face: locailabs
  • Issues: Please report via Hugging Face discussions