Yakuru-43/octen-8b-lleqa-merged

A LoRA-fine-tuned version of Octen/Octen-Embedding-8B for retrieving Belgian statutory articles in response to natural-language legal questions in French.

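This repository holds the full model with the LoRA adapters already merged into the weights; no separate adapter is needed to load it.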

Model description

Base model: Octen/Octen-Embedding-8B (which is itself a fine-tune of Qwen/Qwen3-Embedding-8B)
Training method: QLoRA (LoRA adapters trained on top of a 4-bit NF4-quantized base)
Training data: LLEQA — Long-form Legal Question Answering on Belgian law
Languages: French (Belgian legal French)
Embedding dimension: 4096 (unchanged from base)
Context length: up to 32,768 tokens (unchanged from base)
Pooling: last-token, left-padded (matches Qwen3-Embedding family)

Modifications from the base

This artefact only modifies the model in the following ways:

LoRA adapters (rank 16, alpha 32, dropout 0.1) are inserted on attention projections (q_proj, k_proj, v_proj, o_proj) and MLP projections (gate_proj, up_proj, down_proj) of every transformer block.
The adapters are trained on LLEQA's train split with a multi-positive InfoNCE objective and mixed BM25 + dense hard negatives.
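The multi-positive InfoNCE objective named above can be sketched for a single question as follows. This is one common formulation, in which the probability mass of all positives is summed in the numerator and every candidate appears in the denominator. The exact variant used in training, the temperature value (0.05 here) and the function name are assumptions for illustration; the card does not specify them.

```python
import math

def multi_positive_info_nce(pos_scores, neg_scores, temperature=0.05):
    """Loss for one question, given cosine similarities to its candidates.

    pos_scores: similarities to the relevant articles (at least one).
    neg_scores: similarities to hard and in-batch negatives.
    temperature: assumed value, not stated in the model card.
    """
    if not pos_scores:
        raise ValueError("at least one positive is required")
    pos_logits = [s / temperature for s in pos_scores]
    all_logits = pos_logits + [s / temperature for s in neg_scores]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(all_logits)
    pos_mass = sum(math.exp(z - m) for z in pos_logits)
    total_mass = sum(math.exp(z - m) for z in all_logits)
    return -math.log(pos_mass / total_mass)

# Positives scored above the negatives give a small loss; the reverse a large one.
good = multi_positive_info_nce([0.82, 0.75], [0.30, 0.25, 0.10])
bad = multi_positive_info_nce([0.20, 0.15], [0.80, 0.70, 0.60])
print(f"good={good:.4f} bad={bad:.4f}")
```

Summing the positives inside the logarithm rewards the model as soon as any relevant article scores highly, which suits questions that are answered by several statutory articles.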

The base weights stay frozen during training; the trained adapters are merged into them afterwards to produce this checkpoint.
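For readers unfamiliar with adapter merging, the sketch below shows the standard LoRA merge arithmetic on a toy layer: the low-rank product B·A is scaled by alpha / rank and added to the weight matrix. It assumes the conventional alpha / rank scaling (2.0 for rank 16, alpha 32); the helper names and the tiny matrices are illustrative and are not taken from the training code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return W + (alpha / rank) * B @ A without mutating the inputs.

    weight: out x in, lora_b: out x rank, lora_a: rank x in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x3 layer with a rank-1 adapter; alpha / rank = 2, as for rank 16, alpha 32.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[0.5], [-1.0]]      # out x rank
A = [[1.0, 2.0, 3.0]]    # rank x in
merged = merge_lora(W, A, B, rank=1, alpha=2)
print(merged)
```

After merging, a single matrix multiplication reproduces the output of the base layer plus the adapter, which is why the merged checkpoint loads like an ordinary model.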

Intended use

Retrieving relevant Belgian statutory articles given a French-language legal question (RAG over a corpus of Belgian law).
Evaluated on the LLEQA test split (Belgian law). Generalization to metropolitan French law is expected to be only partial; see Limitations.
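In a RAG pipeline, retrieval with this kind of model comes down to ranking article embeddings by cosine similarity to the question embedding. The sketch below illustrates that step with toy 3-dimensional vectors standing in for the 4096-dimensional outputs; `top_k` and `normalize` are hypothetical helper names, and a real deployment would more likely use a vector index than a Python loop.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_emb, article_embs, k=3):
    """Return (article_index, cosine_similarity) pairs, best first."""
    q = normalize(query_emb)
    scored = []
    for idx, emb in enumerate(article_embs):
        a = normalize(emb)
        scored.append((idx, sum(x * y for x, y in zip(q, a))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for real question and article embeddings.
query = [0.9, 0.1, 0.0]
articles = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.7, 0.7, 0.0]]
ranking = top_k(query, articles, k=2)
print(ranking)
```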

Usage

Option 1: sentence-transformers (recommended, simplest)

The repo includes the sentence-transformers config files (last-token pooling, L2 normalization, query prompt), so it loads correctly out of the box:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Yakuru-43/octen-8b-lleqa-merged")
# Queries get the instruction prefix; documents do not.
queries  = model.encode(["Quelle est la durée du préavis pour un bail étudiant ?"], prompt_name="query")
articles = model.encode(["Le bail étudiant peut être résilié moyennant un préavis ..."])
print(model.similarity(queries, articles))

Option 2: plain transformers (full control)

from transformers import AutoModel, AutoTokenizer
import torch.nn.functional as F
import torch

tokenizer = AutoTokenizer.from_pretrained("Yakuru-43/octen-8b-lleqa-merged", padding_side="left")
model = AutoModel.from_pretrained("Yakuru-43/octen-8b-lleqa-merged", torch_dtype=torch.bfloat16).cuda()
model.eval()

INSTRUCTION = (
    "Instruct: Retrieve Belgian statutory articles that answer the user's "
    "legal question.\nQuery: "
)

def encode(texts, is_query: bool):
    prefixed = [INSTRUCTION + t for t in texts] if is_query else texts
    enc = tokenizer(prefixed, padding=True, truncation=True,
                    max_length=1024, return_tensors="pt").to("cuda")
    with torch.no_grad():
        h = model(**enc, use_cache=False).last_hidden_state
    # Last-token pooling with left padding
    emb = h[:, -1]
    return F.normalize(emb.float(), p=2, dim=-1)

questions = encode(["Quelle est la durée du préavis pour un bail étudiant ?"], is_query=True)
articles  = encode(["Le bail étudiant peut être résilié moyennant un préavis ..."], is_query=False)
print((questions @ articles.T).cpu())
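Taking the hidden state at position -1 is only correct because the tokenizer pads on the left, so the last position always holds a real token. The sketch below shows, on plain Python lists, how the position of the last real token can be derived from the attention mask instead; `last_token_index` is a hypothetical helper, not part of transformers.

```python
def last_token_index(attention_mask):
    """Index of the last real (non-padding) token in one sequence.

    attention_mask: list of 0/1 flags, 1 for real tokens.
    """
    for pos in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[pos] == 1:
            return pos
    raise ValueError("sequence contains no real tokens")

left_padded = [0, 0, 1, 1, 1]    # padding first, as configured above
right_padded = [1, 1, 1, 0, 0]   # padding last
print(last_token_index(left_padded), last_token_index(right_padded))
```

With right padding, position -1 would be a padding token, so a pipeline that cannot guarantee left padding should pool at the mask-derived index.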

Important: instruction prefix

This model inherits the instruction-aware behaviour of Qwen3-Embedding. Queries must be prefixed with the training-time instruction (see the code above); documents are encoded raw. Using a different prefix, or none, on queries at inference time will degrade retrieval substantially.

The base Octen card flags an upstream Qwen3-Embedding issue and suggests prepending "- " (dash + space) to documents that are encoded without any prefix, to avoid unexpected behaviour. We did not apply this prefix during training, so evaluate it on your own inference pipeline before adopting it.
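A small pair of helpers makes it harder to mix up the two input formats. The sketch below reuses the instruction string from the usage code; `format_query`, `format_document` and the `dash_prefix` flag are hypothetical names, and the flag defaults to off because that matches how the model was trained.

```python
INSTRUCTION = (
    "Instruct: Retrieve Belgian statutory articles that answer the user's "
    "legal question.\nQuery: "
)

def format_query(question):
    """Prefix a question with the training-time instruction."""
    return INSTRUCTION + question

def format_document(text, dash_prefix=False):
    """Return the document text, optionally with the "- " workaround prefix.

    dash_prefix=False matches training; True is the untested alternative.
    """
    return "- " + text if dash_prefix else text

print(repr(format_query("Quelle est la durée du préavis pour un bail étudiant ?")))
print(repr(format_document("Le bail étudiant peut être résilié ...", dash_prefix=True)))
```

Whichever document format is chosen, it has to be applied to the whole corpus, since embeddings produced with and without the prefix are not guaranteed to be comparable.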

Evaluation

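The fine-tuned model is compared with the base Octen/Octen-Embedding-8B used zero-shot.

All values are percentages on the LLEQA test split. Δ is fine-tuned minus base. The 95% CI comes from a paired bootstrap over per-query differences (10 000 resamples). ★ marks differences whose CI excludes zero, i.e. gains that are unlikely to be noise on this test split. Only recall@50, recall@100 and recall@500 meet that bar; the gains on the top-10 metrics are positive, but their intervals include zero.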

Metric	Base	Fine-tuned	Δ	95% CI	Sig.
recall@1	17.61	18.46	+0.85	[-4.44, +6.15]
recall@5	41.41	45.29	+3.88	[-3.08, +10.80]
recall@10	52.33	59.06	+6.73	[-0.37, +13.78]
recall@50	68.31	79.07	+10.76	[+4.53, +16.81]	★
recall@100	75.21	84.79	+9.57	[+4.27, +14.96]	★
recall@500	85.26	92.98	+7.72	[+3.88, +11.72]	★
mrr@10	34.59	39.09	+4.49	[-1.50, +10.63]
ndcg@10	35.95	40.35	+4.40	[-0.68, +9.75]
map	29.95	33.29	+3.34	[-1.65, +8.39]
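The confidence intervals in the table come from a paired bootstrap over per-query differences. A minimal sketch of such a procedure is shown below, assuming a percentile interval and resampling of queries with replacement; the card does not say which bootstrap variant was used, and the function name, seed and toy scores are illustrative only.

```python
import random

def paired_bootstrap_ci(base_scores, tuned_scores, n_resamples=10_000,
                        level=0.95, seed=0):
    """Mean per-query difference (tuned - base) with a percentile CI."""
    if len(base_scores) != len(tuned_scores):
        raise ValueError("both systems must be scored on the same queries")
    diffs = [t - b for b, t in zip(base_scores, tuned_scores)]
    n = len(diffs)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping each pair together.
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return sum(diffs) / n, means[lo_idx], means[hi_idx]

# Toy per-query hit indicators (1 = a relevant article was retrieved in the top k).
base = [0] * 50 + [1] * 50
tuned = [1] * 100
delta, ci_low, ci_high = paired_bootstrap_ci(base, tuned, n_resamples=2000)
print(f"delta={delta:.2f} CI=[{ci_low:.2f}, {ci_high:.2f}]")
```

Pairing matters because both models are scored on the same questions: resampling the differences removes the per-question difficulty that the two systems share.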

Limitations

Language scope: trained only on Belgian-French legal text. Questions in other languages or about non-Belgian jurisdictions may underperform.
Domain bias: training data mentions Belgian-specific institutions (Moniteur belge, regional governments, "kot", etc.). Generalization to Code civil français or Code du travail français is expected to be partial; verify on a held-out French set before relying on it.
Small training set: LLEQA train split is ~1.4k questions. Some over-fitting to its question style is likely.
No additional safety alignment beyond the base.

Training details

Hyperparameter	Value
Base model	`Octen/Octen-Embedding-8B`
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.1
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Loss	Multi-positive InfoNCE
Hard negatives	7 / question (mixed BM25 + dense Octen-zero-shot)
Effective batch size	16 questions × ~16 negatives
Learning rate	2e-4, cosine, 10% warmup
Epochs	3
Precision	NF4 4-bit base + bf16 LoRA
GPUs	2× RTX 4500 Ada
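The learning-rate row above (2e-4, cosine, 10% warmup) can be read as a linear ramp followed by a cosine decay. The sketch below assumes a linear warmup from zero and a decay to zero at the final step, which is a common convention but is not confirmed by the card; the total step count in the example is arbitrary.

```python
import math

def lr_at(step, total_steps, peak_lr=2e-4, warmup_ratio=0.10):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Arbitrary 1000-step run: ramp during the first 100 steps, decay afterwards.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, 1000):.2e}")
```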

License

This work is licensed under the Apache License 2.0, the same as the base Octen/Octen-Embedding-8B and the underlying Qwen/Qwen3-Embedding-8B. See the LICENSE file in this repository.

Citation

This work fine-tunes Octen-Embedding-8B. If you use it, please cite both the original Octen work and the LLEQA dataset:

@misc{octen2025rteb,
  title  = {Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard},
  author = {Octen Team},
  year   = {2025},
  url    = {https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/}
}

@inproceedings{louis2024interpretable,
  title     = {Interpretable Long-Form Legal Question Answering with
                Retrieval-Augmented Large Language Models},
  author    = {Louis, Antoine and van Dijck, Gijs and Spanakis, Gerasimos},
  booktitle = {Proceedings of AAAI},
  year      = {2024}
}

The base model and its lineage:

Octen/Octen-Embedding-8B — Apache 2.0
Qwen/Qwen3-Embedding-8B — Apache 2.0

Acknowledgements

Thanks to the Octen team for releasing the base model under a permissive licence, to the Qwen team for the underlying Qwen3-Embedding family, and to Maastricht Law & Tech Lab for the LLEQA dataset.
