Euler-Legal-Embedding-V1
Short Description
Euler-Legal-Embedding-V1 is an embedding model specialized for the legal domain, fine-tuned from Qwen/Qwen3-Embedding-8B. It achieves strong performance on the legal retrieval and reasoning tasks in the MTEB benchmark.
Model Details
- Base Model: Qwen/Qwen3-Embedding-8B
- Model Size: ~8B
- Embedding Dimension: 4096 (default for Qwen3-Embedding-8B)
- Max Input Tokens: 1536
- Pooling: Last token pooling (Standard for Qwen-Embedding)
- Training Data: Legal-domain dataset (final-data-new-anonymized-grok4-filtered.jsonl)
Usage
sentence-transformers support
The simplest way to use this model is through sentence-transformers. Install it first:
pip install -U sentence-transformers
You can use the model like this:
from sentence_transformers import SentenceTransformer
import torch
# Load the model
# trust_remote_code=True may be unnecessary with transformers >= 4.51, which supports Qwen3 natively
model = SentenceTransformer(
    "Mira190/Euler-Legal-Embedding-V1",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional; remove if flash-attn is not installed
    },
)
model.max_seq_length = 1536
sentences = [
    "The plaintiff filed a motion for summary judgment.",
    "The court granted the motion based on lack of genuine dispute of material fact.",
]
# No specific prompt is required for this version
embeddings = model.encode(
    sentences,
    normalize_embeddings=True,
    batch_size=16,
    show_progress_bar=True,
)
print(embeddings.shape)
# Output: (2, 4096)
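Because `normalize_embeddings=True` returns unit-length vectors, the dot product of two embeddings equals their cosine similarity. The following is a minimal sketch of ranking passages against a query under that assumption. The 3-dimensional toy vectors and the helpers `l2_normalize` and `rank_by_similarity` are illustrative stand-ins, not model output and not part of sentence-transformers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (the effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_by_similarity(query, passages):
    """Return (index, score) pairs sorted by descending dot product.

    Assumes every vector is already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = [sum(q * p for q, p in zip(query, passage)) for passage in passages]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for model.encode(...) output.
query = l2_normalize([1.0, 0.0, 1.0])
passages = [
    l2_normalize([0.0, 1.0, 0.0]),  # orthogonal to the query
    l2_normalize([1.0, 0.1, 0.9]),  # close to the query
]

ranking = rank_by_similarity(query, passages)
best_index, best_score = ranking[0]
print(best_index, round(best_score, 3))
```

With real embeddings, the same ranking can be obtained by passing the query and passage embeddings to `model.similarity`, which the sentence-transformers library provides.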
Transformers support
You can also use the model directly with the transformers library:
import torch
from transformers import AutoModel, AutoTokenizer
model_id = "Mira190/Euler-Legal-Embedding-V1"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
sentences = ["This is a legal document.", "This is another legal document."]
# Pad on the left so that position -1 is always a real token,
# which the last-token pooling below relies on
tokenizer.padding_side = "left"
# Tokenize sentences
inputs = tokenizer(
    sentences,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=1536,
)
# Move inputs to the same device as the model
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model(**inputs)
# Last-token pooling (standard for Qwen3-Embedding): with left padding,
# the final position holds the last real token of every sequence
embeddings = outputs.last_hidden_state[:, -1]
# Normalize embeddings
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)
# Output: torch.Size([2, 4096])
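Indexing with `[:, -1]` selects the right hidden state only when the final position of every sequence is a real token, which holds for left-padded batches. The following is a minimal sketch of an index computation that works for either padding side. `last_token_indices` is a hypothetical helper, and the masks are made-up examples rather than tokenizer output.

```python
def last_token_indices(attention_mask):
    """For each row of a 0/1 attention mask, return the index of the last real token.

    Works for both left-padded and right-padded batches.
    """
    indices = []
    for row in attention_mask:
        if 1 not in row:
            raise ValueError("sequence has no real tokens")
        # Position of the last 1, found by scanning from the right.
        last = len(row) - 1 - row[::-1].index(1)
        indices.append(last)
    return indices


# Hypothetical masks for a batch of two sequences padded to length 5.
right_padded = [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
left_padded = [[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]

right_idx = last_token_indices(right_padded)
left_idx = last_token_indices(left_padded)
print(right_idx, left_idx)
```

In a PyTorch batch, these indices (converted to a tensor) would select one hidden state per row from `outputs.last_hidden_state`, in place of the fixed `-1` position.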
Training Details
The model was fine-tuned using LoRA (Low-Rank Adaptation) via the Swift framework.
- Framework: Swift
- Loss Function: InfoNCE (Temperature: 0.03)
- Batch Size: 4 (per device)
- Learning Rate: 2e-5
- LoRA Config: Rank 8, Alpha 32, Dropout 0.05
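To illustrate what the temperature of 0.03 does, here is a minimal sketch of the standard InfoNCE formulation for a single query. It is not the Swift implementation. The source does not say how negatives were selected, so the similarity values below are hypothetical, as is the helper name `info_nce_loss`.

```python
import math


def info_nce_loss(similarities, positive_index, temperature=0.03):
    """InfoNCE loss for one query.

    similarities: cosine similarities between the query and each candidate.
    positive_index: position of the matching (positive) candidate.
    """
    logits = [s / temperature for s in similarities]
    # Log-sum-exp with max subtraction for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    # Negative log-probability of the positive under a softmax over candidates.
    return log_denominator - logits[positive_index]


# Hypothetical similarities: the positive is at index 0, followed by two negatives.
easy = info_nce_loss([0.9, 0.2, 0.1], positive_index=0)
hard = info_nce_loss([0.5, 0.6, 0.4], positive_index=0)
print(easy, hard)
```

A low temperature such as 0.03 sharpens the softmax, so a negative that scores only slightly above the positive (the second case) produces a much larger loss than a well-separated positive (the first case).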
Citation
If you find this model useful, please consider citing:
@misc{euler2025legal,
title={Euler-Legal-Embedding: Advanced Legal Representation Learning},
author={LawRank Team},
year={2025},
publisher={Hugging Face}
}