Yakuru-43/octen-8b-lleqa-merged
A LoRA-fine-tuned version of Octen/Octen-Embedding-8B for retrieving Belgian statutory articles in response to natural-language legal questions in French.
This is the merged full model.
Model description
- Base model: Octen/Octen-Embedding-8B (which is itself a fine-tune of Qwen/Qwen3-Embedding-8B)
- Training method: LoRA adapters trained on top of an NF4-quantized base (QLoRA)
- Training data: LLEQA — Long-form Legal Question Answering on Belgian law
- Languages: French (Belgian legal French)
- Embedding dimension: 4096 (unchanged from base)
- Context length: up to 32,768 tokens (unchanged from base)
- Pooling: last-token, left-padded (matches Qwen3-Embedding family)
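The pooling choice matters for batching: with left padding, the final position of every row is a real token, so the embedding is simply the last hidden state. A minimal sketch in plain Python, with toy lists standing in for hidden states; the helper name `last_token_pool` and all values are illustrative and not part of any library:

```python
def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last real token in each sequence.

    hidden_states: one list of per-token vectors per sequence.
    attention_mask: one list of 0/1 flags per sequence (1 = real token).
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        # Index of the last position whose mask is 1.
        last_real = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last_real])
    return pooled


# Toy batch: a 3-token and a 2-token sequence with 1-dimensional "vectors".
left_states = [[[1.0], [2.0], [3.0]], [[0.0], [4.0], [5.0]]]
left_mask = [[1, 1, 1], [0, 1, 1]]

right_states = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [0.0]]]
right_mask = [[1, 1, 1], [1, 1, 0]]

# Left padding: position -1 is always the last real token.
left_pooled = last_token_pool(left_states, left_mask)

# Right padding: position -1 can be a pad vector, so the mask is required.
right_pooled = last_token_pool(right_states, right_mask)
```

With left padding the mask lookup is redundant, which is why indexing position `-1` directly is enough.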
Modifications from the base
This artefact only modifies the model in the following ways:
- LoRA adapters (rank 16, alpha 32, dropout 0.1) are inserted on attention projections (q_proj, k_proj, v_proj, o_proj) and MLP projections (gate_proj, up_proj, down_proj) of every transformer block.
- The adapters are trained on LLEQA's train split with a multi-positive InfoNCE objective and mixed BM25 + dense hard negatives.
The base weights were frozen during training; the trained adapters are merged into them in this checkpoint.
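Merging folds each adapter into the weight matrix it was attached to, so the checkpoint loads without an adapter library. The toy sketch below applies the standard LoRA merge rule, W' = W + (alpha / rank) · B · A, in plain Python. The matrices and sizes are made up for illustration, and this is not the script used to produce this checkpoint:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy example: a 2x2 weight with a rank-1 adapter.
# (This model uses rank 16 and alpha 32, which gives the same scale of 2.)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (rank, in_features)
B = [[0.5], [0.0]]    # shape (out_features, rank)
merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```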
Intended use
- Retrieving relevant Belgian statutory articles given a French-language legal question (RAG over a corpus of Belgian law).
- Tested on the LLEQA test split (Belgian law). Generalization to metropolitan French law is expected to be partial; see Limitations.
Usage
Option 1: sentence-transformers (recommended, simplest)
The repo includes the sentence-transformers config files (last-token pooling, L2 normalization, query prompt), so it loads correctly out of the box:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Yakuru-43/octen-8b-lleqa-merged")
# Queries get the instruction prefix; documents do not.
queries = model.encode(["Quelle est la durée du préavis pour un bail étudiant ?"], prompt_name="query")
articles = model.encode(["Le bail étudiant peut être résilié moyennant un préavis ..."])
print(model.similarity(queries, articles))
Option 2: plain transformers (full control)
from transformers import AutoModel, AutoTokenizer
import torch.nn.functional as F
import torch
tokenizer = AutoTokenizer.from_pretrained("Yakuru-43/octen-8b-lleqa-merged", padding_side="left")
model = AutoModel.from_pretrained("Yakuru-43/octen-8b-lleqa-merged", torch_dtype=torch.bfloat16).cuda()
model.eval()
INSTRUCTION = (
    "Instruct: Retrieve Belgian statutory articles that answer the user's "
    "legal question.\nQuery: "
)

def encode(texts, is_query: bool):
    prefixed = [INSTRUCTION + t for t in texts] if is_query else texts
    enc = tokenizer(prefixed, padding=True, truncation=True,
                    max_length=1024, return_tensors="pt").to("cuda")
    with torch.no_grad():
        h = model(**enc, use_cache=False).last_hidden_state
    # Last-token pooling with left padding
    emb = h[:, -1]
    return F.normalize(emb.float(), p=2, dim=-1)

questions = encode(["Quelle est la durée du préavis pour un bail étudiant ?"], is_query=True)
articles = encode(["Le bail étudiant peut être résilié moyennant un préavis ..."], is_query=False)
print((questions @ articles.T).cpu())
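In a retrieval pipeline the score matrix is usually reduced to a ranked list of articles per question. A small sketch of that step, written against plain nested lists so it runs without a GPU; `top_k` is a hypothetical helper and the scores are invented:

```python
def top_k(scores, k):
    """For each query row, return the indices of the k highest-scoring articles."""
    ranked = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        ranked.append(order[:k])
    return ranked


# Invented cosine scores: 2 questions x 4 articles.
scores = [
    [0.12, 0.81, 0.40, 0.77],
    [0.55, 0.10, 0.62, 0.05],
]
top2 = top_k(scores, k=2)
print(top2)
```

With real embeddings, `(questions @ articles.T).tolist()` can be passed in as `scores`.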
Important: instruction prefix
This model inherits the Qwen3-Embedding instruction-aware behaviour. Queries must be prefixed with the training-time instruction (see code above); documents are encoded raw. Mismatching at inference will degrade retrieval substantially.
A note on the upstream Qwen3-Embedding issue flagged on the base Octen card: when encoding documents without any prefix, prepend "- " (dash + space) to avoid unexpected behaviour. We did not apply this prefix during training, but you may wish to evaluate it for your inference pipeline.
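For the plain transformers path, the two formatting rules can be kept in one place. The sketch below reuses the instruction string from the usage code. The function names and the `dash_prefix` flag are assumptions introduced here so that the "- " variant can be compared against raw documents; they are not part of the released code:

```python
QUERY_INSTRUCTION = (
    "Instruct: Retrieve Belgian statutory articles that answer the user's "
    "legal question.\nQuery: "
)


def format_query(question):
    """Prepend the training-time instruction to a user question."""
    return QUERY_INSTRUCTION + question


def format_document(text, dash_prefix=False):
    """Return the document text, optionally with the "- " prefix.

    dash_prefix=False matches how documents were encoded during training.
    """
    return "- " + text if dash_prefix else text


query_text = format_query("Quelle est la durée du préavis pour un bail étudiant ?")
raw_doc = format_document("Le bail étudiant peut être résilié ...")
dashed_doc = format_document("Le bail étudiant peut être résilié ...", dash_prefix=True)
```

With sentence-transformers, `prompt_name="query"` already applies the query prefix, so `format_query` is only needed when calling the tokenizer directly.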
Evaluation
Compared on the LLEQA test split against the base Octen/Octen-Embedding-8B zero-shot.
All values reported on the LLEQA test split, in percent. Δ is fine-tuned − base. 95% CI from a paired bootstrap on per-query differences (10 000 resamples). ★ marks differences whose CI excludes zero (unlikely to be noise on this test split).
| Metric | Base | Fine-tuned | Δ | 95% CI | Sig. |
|---|---|---|---|---|---|
| recall@1 | 17.61 | 18.46 | +0.85 | [-4.44, +6.15] | |
| recall@5 | 41.41 | 45.29 | +3.88 | [-3.08, +10.80] | |
| recall@10 | 52.33 | 59.06 | +6.73 | [-0.37, +13.78] | |
| recall@50 | 68.31 | 79.07 | +10.76 | [+4.53, +16.81] | ★ |
| recall@100 | 75.21 | 84.79 | +9.57 | [+4.27, +14.96] | ★ |
| recall@500 | 85.26 | 92.98 | +7.72 | [+3.88, +11.72] | ★ |
| mrr@10 | 34.59 | 39.09 | +4.49 | [-1.50, +10.63] | |
| ndcg@10 | 35.95 | 40.35 | +4.40 | [-0.68, +9.75] | |
| map | 29.95 | 33.29 | +3.34 | [-1.65, +8.39] | |
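The confidence intervals come from resampling per-query differences. A minimal sketch of such a paired bootstrap follows, assuming a percentile interval (the exact interval method is not stated here) and using invented per-query scores; `paired_bootstrap_ci` is an illustrative name:

```python
import random


def paired_bootstrap_ci(base_scores, tuned_scores, n_resamples=10_000, seed=0):
    """Mean per-query difference (tuned - base) with a 95% percentile CI."""
    diffs = [t - b for b, t in zip(base_scores, tuned_scores)]
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping each base/tuned pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Invented per-query recall@10 hits (1 = a relevant article was retrieved).
base = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
tuned = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
delta, lower, upper = paired_bootstrap_ci(base, tuned, n_resamples=2_000)
print(f"delta={delta:+.2f}, 95% CI=[{lower:+.2f}, {upper:+.2f}]")
```

A difference is marked ★ in the table when the interval excludes zero, that is, when `lower > 0` or `upper < 0`.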
Limitations
- Language scope: trained only on Belgian-French legal text. Retrieval quality may drop for questions in other languages or about non-Belgian jurisdictions.
- Domain bias: training data mentions Belgian-specific institutions (Moniteur belge, regional governments, "kot", etc.). Generalization to Code civil français or Code du travail français is expected to be partial; verify on a held-out French set before relying on it.
- Small training set: LLEQA train split is ~1.4k questions. Some over-fitting to its question style is likely.
- No additional safety alignment beyond the base.
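Checking the model on a held-out set only needs ranked article IDs and gold labels per question. The sketch below uses common definitions of recall@k and MRR@k. The exact definitions behind the table above are not spelled out in this card, so treat these as assumptions; the article IDs are invented:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of a question's relevant articles found in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant article within the top k, else 0."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Invented example: one question, four retrieved articles, two gold articles.
ranked = ["art_12", "art_7", "art_3", "art_9"]
gold = ["art_3", "art_40"]
print(recall_at_k(ranked, gold, k=3), mrr_at_k(ranked, gold, k=10))
```

Averaging these per-question values over the held-out set gives numbers comparable in kind to the table in the Evaluation section.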
Training details
| Hyperparameter | Value |
|---|---|
| Base model | Octen/Octen-Embedding-8B |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.1 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Loss | Multi-positive InfoNCE |
| Hard negatives | 7 / question (mixed BM25 + dense Octen-zero-shot) |
| Effective batch size | 16 questions × ~16 negatives |
| Learning rate | 2e-4, cosine, 10% warmup |
| Epochs | 3 |
| Precision | NF4 4-bit base + bf16 LoRA |
| GPUs | 2× RTX 4500 Ada |
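To make the loss row concrete, here is one common form of a multi-positive InfoNCE loss for a single question: the negative log of the probability mass assigned to all positives together. Both this particular variant and the temperature value are assumptions for illustration; the training code may differ on either point:

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def multi_positive_info_nce(pos_scores, neg_scores, temperature=0.05):
    """Loss for one question: -log(sum_pos exp(s/t) / sum_all exp(s/t))."""
    pos_logits = [s / temperature for s in pos_scores]
    neg_logits = [s / temperature for s in neg_scores]
    return logsumexp(pos_logits + neg_logits) - logsumexp(pos_logits)


# Invented cosine scores: 2 positive articles and 7 hard negatives.
positives = [0.82, 0.74]
negatives = [0.61, 0.58, 0.55, 0.40, 0.33, 0.30, 0.12]
loss = multi_positive_info_nce(positives, negatives)
print(f"loss = {loss:.4f}")
```

The loss approaches zero as the positives outscore every negative and grows when a hard negative ranks above them.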
License
This work is licensed under the Apache License 2.0, the same as the base Octen/Octen-Embedding-8B and the underlying Qwen/Qwen3-Embedding-8B. See the LICENSE file in this repository.
Citation
This work fine-tunes Octen-Embedding-8B. If you use it, please cite both the original Octen work and the LLEQA dataset:
@misc{octen2025rteb,
  title  = {Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard},
  author = {Octen Team},
  year   = {2025},
  url    = {https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/}
}
@inproceedings{louis2024interpretable,
  title     = {Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models},
  author    = {Louis, Antoine and van Dijck, Gijs and Spanakis, Gerasimos},
  booktitle = {Proceedings of AAAI},
  year      = {2024}
}
The base model and its lineage:
- Octen/Octen-Embedding-8B — Apache 2.0
- Qwen/Qwen3-Embedding-8B — Apache 2.0
Acknowledgements
Thanks to the Octen team for releasing the base model under a permissive licence, to the Qwen team for the underlying Qwen3-Embedding family, and to Maastricht Law & Tech Lab for the LLEQA dataset.