Yakuru-43/octen-8b-lleqa-merged

A LoRA-fine-tuned version of Octen/Octen-Embedding-8B for retrieving Belgian statutory articles in response to natural-language legal questions in French.

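This repository contains the full merged model: the trained LoRA adapter weights are already folded into the base weights, so no separate adapter is needed at load time.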

Model description

  • Base model: Octen/Octen-Embedding-8B (which is itself a fine-tune of Qwen/Qwen3-Embedding-8B)
  • Training method: LoRA fine-tuning on an NF4-quantized base (QLoRA)
  • Training data: LLEQA — Long-form Legal Question Answering on Belgian law
  • Languages: French (Belgian legal French)
  • Embedding dimension: 4096 (unchanged from base)
  • Context length: up to 32,768 tokens (unchanged from base)
  • Pooling: last-token, left-padded (matches Qwen3-Embedding family)
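To make the pooling rule concrete, here is a minimal sketch in plain Python. It is not the model's actual implementation: `last_token_pool` is a hypothetical helper that works on nested lists instead of tensors, and the toy values are made up. It picks the hidden state of the last real token, which with left padding is simply the final position.

```python
def last_token_pool(hidden, attention_mask):
    """Pick the hidden vector of the last non-padding token in each sequence.

    hidden:         [batch][seq_len][dim] nested lists (stand-in for a tensor)
    attention_mask: [batch][seq_len] with 1 for real tokens, 0 for padding
    """
    pooled = []
    for states, mask in zip(hidden, attention_mask):
        # Index of the last position whose mask is 1.
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled


# Two toy sequences of length 3 with 1-dimensional hidden states.
hidden = [[[0.0], [1.0], [2.0]],
          [[3.0], [4.0], [5.0]]]
left_padded = [[0, 1, 1], [1, 1, 1]]   # padding at the front
right_padded = [[1, 1, 0], [1, 1, 1]]  # padding at the back

left_result = last_token_pool(hidden, left_padded)    # final position in both rows
right_result = last_token_pool(hidden, right_padded)  # position 1 in the first row
print(left_result, right_result)
```

With left padding the selected index is always the last one, which is why taking the final position of the hidden states is enough when the tokenizer pads on the left.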

Modifications from the base

This checkpoint differs from the base model only in the following ways:

  1. LoRA adapters (rank 16, alpha 32, dropout 0.1) are inserted on attention projections (q_proj, k_proj, v_proj, o_proj) and MLP projections (gate_proj, up_proj, down_proj) of every transformer block.
  2. The adapters are trained on LLEQA's train split with a multi-positive InfoNCE objective and mixed BM25 + dense hard negatives.
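The card does not spell out the exact form of the multi-positive InfoNCE objective. The sketch below shows one common variant, in which the probability mass of all positives is summed in the numerator. The function name, the temperature value and the toy scores are assumptions for illustration, not the training code.

```python
import math


def multi_positive_info_nce(pos_scores, neg_scores, temperature=0.05):
    """-log( sum_p exp(s_p/t) / (sum_p exp(s_p/t) + sum_n exp(s_n/t)) ).

    pos_scores: similarities between the question and its relevant articles
    neg_scores: similarities between the question and its negatives
    temperature: assumed value, not taken from the model card
    """
    logits_pos = [s / temperature for s in pos_scores]
    logits_all = logits_pos + [s / temperature for s in neg_scores]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits_all)
    numerator = sum(math.exp(l - m) for l in logits_pos)
    denominator = sum(math.exp(l - m) for l in logits_all)
    return -math.log(numerator / denominator)


# One question with two positives and seven hard negatives (toy cosine scores).
easy = multi_positive_info_nce([0.9, 0.8], [0.1] * 7)  # positives clearly on top
hard = multi_positive_info_nce([0.3, 0.2], [0.4] * 7)  # negatives outscore positives
print(easy, hard)
```

The loss shrinks as the positives pull away from the negatives, which is what makes hard negatives (those scoring close to the positives) the informative ones.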

The base weights are frozen during training; the trained adapters are merged into them afterwards to produce this checkpoint.
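As an illustration of what merging means, the sketch below folds a LoRA update into a weight matrix, assuming the standard LoRA scaling of alpha / rank. The helper names and the tiny matrices are hypothetical, and the toy uses rank 1 with alpha 2 so that the scale equals the 32 / 16 = 2 used for this model.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, assuming standard LoRA scaling."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (toy 2 x 2)
lora_a = [[1.0, 2.0]]              # r x in, with r = 1
lora_b = [[0.5], [-1.0]]           # out x r
x = [[3.0], [4.0]]                 # input as a column vector

merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)

# Merged path: a single matrix multiply.
y_merged = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output.
base_out = matmul(weight, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
y_unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]
print(y_merged, y_unmerged)
```

Both paths give the same output, so a merged checkpoint behaves like base plus adapter while loading as an ordinary model.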

Intended use

  • Retrieving relevant Belgian statutory articles given a French-language legal question (RAG over a corpus of Belgian law).
  • Tested on the LLEQA test split (Belgian law). Generalization to metropolitan French law is expected to be partial; see Limitations.

Usage

Option 1: sentence-transformers (recommended, simplest)

The repo includes the sentence-transformers config files (last-token pooling, L2 normalization, query prompt), so it loads correctly out of the box:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Yakuru-43/octen-8b-lleqa-merged")
# Queries get the instruction prefix; documents do not.
queries  = model.encode(["Quelle est la durée du préavis pour un bail étudiant ?"], prompt_name="query")
articles = model.encode(["Le bail étudiant peut être résilié moyennant un préavis ..."])
print(model.similarity(queries, articles))

Option 2: plain transformers (full control)

from transformers import AutoModel, AutoTokenizer
import torch.nn.functional as F
import torch

tokenizer = AutoTokenizer.from_pretrained("Yakuru-43/octen-8b-lleqa-merged", padding_side="left")
model = AutoModel.from_pretrained("Yakuru-43/octen-8b-lleqa-merged", torch_dtype=torch.bfloat16).cuda()
model.eval()

INSTRUCTION = (
    "Instruct: Retrieve Belgian statutory articles that answer the user's "
    "legal question.\nQuery: "
)

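def encode(texts, is_query: bool):
    """Embed a batch of texts; only queries receive the instruction prefix."""
    prefixed = [INSTRUCTION + t for t in texts] if is_query else texts
    # max_length=1024 caps the input here; raise it for longer articles
    # (the model supports up to 32,768 tokens).
    enc = tokenizer(prefixed, padding=True, truncation=True,
                    max_length=1024, return_tensors="pt").to("cuda")
    with torch.no_grad():
        h = model(**enc, use_cache=False).last_hidden_state
    # With left padding, the last position holds each sequence's final real token.
    emb = h[:, -1]
    # L2-normalize so that a dot product between embeddings is a cosine similarity.
    return F.normalize(emb.float(), p=2, dim=-1)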

questions = encode(["Quelle est la durée du préavis pour un bail étudiant ?"], is_query=True)
articles  = encode(["Le bail étudiant peut être résilié moyennant un préavis ..."], is_query=False)
print((questions @ articles.T).cpu())
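Once queries and articles are embedded and L2-normalized, retrieval reduces to ranking articles by dot product. The following is a minimal sketch of that ranking step in plain Python, with a hypothetical `top_k` helper and made-up 2-dimensional vectors standing in for real 4096-dimensional embeddings.

```python
def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_index, score) pairs for the k highest dot-product scores."""
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Toy unit-length vectors; real embeddings would come from the encode step above.
query = [1.0, 0.0]
docs = [[0.0, 1.0],   # orthogonal to the query
        [1.0, 0.0],   # identical direction
        [0.6, 0.8]]   # partially aligned

results = top_k(query, docs, k=2)
print(results)
```

For a large corpus, the article embeddings would normally be computed once and stored in a vector index rather than scored in a Python loop.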

Important: instruction prefix

This model inherits the Qwen3-Embedding instruction-aware behaviour. Queries must be prefixed with the training-time instruction (see code above); documents are encoded raw. Using a different prefix, or none, at inference time is likely to degrade retrieval substantially.

The base Octen card flags an upstream Qwen3-Embedding issue and recommends prepending "- " (dash + space) to documents that would otherwise be encoded without any prefix, to avoid unexpected behaviour. We did not apply this prefix during training, but you may wish to evaluate it for your inference pipeline.
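One way to keep the prefix handling in a single place, and to make the "- " variant easy to compare against raw documents, is a small formatting helper. This is only a sketch: `format_inputs` and its `doc_prefix` parameter are hypothetical names, and the instruction string is copied from the usage code above.

```python
# Training-time query instruction, copied from the usage code above.
QUERY_INSTRUCTION = (
    "Instruct: Retrieve Belgian statutory articles that answer the user's "
    "legal question.\nQuery: "
)


def format_inputs(texts, is_query, doc_prefix=""):
    """Apply the query instruction, or an optional document prefix."""
    if is_query:
        return [QUERY_INSTRUCTION + t for t in texts]
    return [doc_prefix + t for t in texts]


docs = ["Le bail étudiant peut être résilié moyennant un préavis ..."]
raw_docs = format_inputs(docs, is_query=False)                      # as in training
dashed_docs = format_inputs(docs, is_query=False, doc_prefix="- ")  # variant to evaluate
queries = format_inputs(["Quelle est la durée du préavis ?"], is_query=True)
print(dashed_docs[0])
```

Encoding the same corpus with both document variants and comparing retrieval metrics on a validation set would show whether the dash prefix helps in a given pipeline.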

Evaluation

Compared on the LLEQA test split against the base Octen/Octen-Embedding-8B zero-shot.

All values are reported on the LLEQA test split, in percent. Δ is fine-tuned − base. The 95% CI comes from a paired bootstrap on per-query differences (10 000 resamples). A "yes" in the Sig. column marks differences whose CI excludes zero (unlikely to be noise on this test split).

Metric       Base    Fine-tuned   Δ        95% CI             Sig.
recall@1     17.61   18.46        +0.85    [-4.44, +6.15]
recall@5     41.41   45.29        +3.88    [-3.08, +10.80]
recall@10    52.33   59.06        +6.73    [-0.37, +13.78]
recall@50    68.31   79.07        +10.76   [+4.53, +16.81]    yes
recall@100   75.21   84.79        +9.57    [+4.27, +14.96]    yes
recall@500   85.26   92.98        +7.72    [+3.88, +11.72]    yes
mrr@10       34.59   39.09        +4.49    [-1.50, +10.63]
ndcg@10      35.95   40.35        +4.40    [-0.68, +9.75]
map          29.95   33.29        +3.34    [-1.65, +8.39]
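For readers who want to reproduce this kind of interval, here is a minimal sketch of a paired bootstrap on per-query differences. It assumes the percentile method, which the card does not confirm, and the function name, seed and toy per-query scores are invented for illustration.

```python
import random


def paired_bootstrap_ci(base, tuned, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean per-query difference (tuned - base).

    base, tuned: per-query metric values for the same queries, in the same order.
    Returns (mean_difference, ci_low, ci_high).
    """
    rng = random.Random(seed)
    diffs = [t - b for b, t in zip(base, tuned)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping each base/tuned pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = max(0, int((1 - level) / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples) - 1)
    return sum(diffs) / n, means[lo_idx], means[hi_idx]


# Toy per-query hit indicators (1 = a relevant article was in the top k) for ten queries.
base_hits = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
tuned_hits = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]

delta, ci_low, ci_high = paired_bootstrap_ci(base_hits, tuned_hits)
significant = ci_low > 0 or ci_high < 0  # the CI excludes zero
print(delta, ci_low, ci_high, significant)
```

Resampling the differences rather than each system separately keeps the pairing between systems on each query, which usually gives tighter intervals than comparing two independent bootstraps.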

Limitations

  • Language scope: trained only on Belgian-French legal text. Retrieval quality may drop for questions in other languages or about non-Belgian jurisdictions.
  • Domain bias: the training data refers to Belgian-specific institutions and terms (Moniteur belge, regional governments, "kot", etc.). Generalization to the Code civil français or the Code du travail français is expected to be partial; verify on a held-out French set before relying on it.
  • Small training set: LLEQA train split is ~1.4k questions. Some over-fitting to its question style is likely.
  • No additional safety alignment beyond the base.
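For the held-out check suggested under domain bias, per-query recall@k and MRR@10 are straightforward to compute from ranked results. The sketch below uses the usual definitions, which the card does not restate, together with hypothetical article identifiers.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant articles that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant article within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# One toy query: the retriever's ranking and the gold articles (made-up ids).
ranked = ["art_3", "art_1", "art_7", "art_2"]
relevant = {"art_1", "art_2"}

r_at_1 = recall_at_k(ranked, relevant, k=1)
r_at_2 = recall_at_k(ranked, relevant, k=2)
r_at_4 = recall_at_k(ranked, relevant, k=4)
mrr = mrr_at_k(ranked, relevant)
print(r_at_1, r_at_2, r_at_4, mrr)
```

Averaging these per-query values over a held-out French question set, for both the base and the fine-tuned model, gives numbers that can be compared in the same way as the evaluation table above.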

Training details

Hyperparameter         Value
Base model             Octen/Octen-Embedding-8B
LoRA rank              16
LoRA alpha             32
LoRA dropout           0.1
Target modules         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Loss                   Multi-positive InfoNCE
Hard negatives         7 / question (mixed BM25 + dense Octen-zero-shot)
Effective batch size   16 questions × ~16 negatives
Learning rate          2e-4, cosine, 10% warmup
Epochs                 3
Precision              NF4 4-bit base + bf16 LoRA
GPUs                   2× RTX 4500 Ada
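The learning-rate row can be read as a schedule like the one sketched below: linear warmup over the first 10% of steps to a peak of 2e-4, then cosine decay. The linear warmup shape, the decay to zero and the function name are assumptions; the card only states "cosine, 10% warmup".

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_frac=0.10):
    """Learning rate at a given optimizer step under warmup + cosine decay."""
    warmup_steps = max(1, int(round(total_steps * warmup_frac)))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak (assumed warmup shape).
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = 100  # hypothetical number of optimizer steps
schedule = [lr_at_step(s, total) for s in range(total + 1)]
print(max(schedule), schedule[0], schedule[-1])
```

The peak is reached at the end of warmup, and the rate passes through half the peak midway through the decay phase.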

License

This work is licensed under the Apache License 2.0, the same as the base Octen/Octen-Embedding-8B and the underlying Qwen/Qwen3-Embedding-8B. See the LICENSE file in this repository.

Citation

This work fine-tunes Octen-Embedding-8B. If you use it, please cite both the original Octen work and the LLEQA dataset:

@misc{octen2025rteb,
  title  = {Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard},
  author = {Octen Team},
  year   = {2025},
  url    = {https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/}
}

@inproceedings{louis2024interpretable,
  title     = {Interpretable Long-Form Legal Question Answering with
                Retrieval-Augmented Large Language Models},
  author    = {Louis, Antoine and van Dijck, Gijs and Spanakis, Gerasimos},
  booktitle = {Proceedings of AAAI},
  year      = {2024}
}

The base model and its lineage:

  • Octen/Octen-Embedding-8B — Apache 2.0
  • Qwen/Qwen3-Embedding-8B — Apache 2.0

Acknowledgements

Thanks to the Octen team for releasing the base model under a permissive licence, to the Qwen team for the underlying Qwen3-Embedding family, and to Maastricht Law & Tech Lab for the LLEQA dataset.
