jina-embeddings-v5-text-small-retrieval-GGUF
GGUF quantizations of jina-embeddings-v5-text-small-retrieval, produced with llama.cpp: a 677M-parameter multilingual embedding model quantized for efficient inference.
Elastic Inference Service | ArXiv | Blog
We highly recommend reading this blog post first for more technical details and the customized llama.cpp build.
Overview
jina-embeddings-v5-text-small-retrieval is a task-specific embedding model for retrieval, part of the jina-embeddings-v5-text model family.
| Feature | Value |
|---|---|
| Parameters | 677M |
| Task | retrieval |
| Embedding Dimension | 1024 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768, 1024 |
| Pooling Strategy | Last-token pooling |
| Base Model | jina-embeddings-v5-text-small |
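Matryoshka embeddings can be shortened to any of the listed dimensions by keeping the leading components and re-normalizing. A minimal sketch, assuming an already-computed 1024-d vector (the random vector below is a stand-in for real model output):

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

# Random placeholder instead of a model-produced 1024-d embedding.
rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
small = truncate_matryoshka(full, 256)
print(small.shape)  # (256,)
```

Truncated embeddings trade a small amount of retrieval quality for proportionally lower storage and faster similarity search.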
Usage
via Elastic Inference Service
The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
```
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
```
See the Elastic Inference Service documentation for setup details.
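Once the endpoint is created, embeddings can be generated through the standard Elastic inference API. A hedged sketch (the endpoint id `jina-v5` matches the PUT request above):

```
POST _inference/text_embedding/jina-v5
{
  "input": "Your text here"
}
```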
via llama.cpp

```shell
# Build llama.cpp (upstream)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# Run embedding (this model uses last-token pooling)
./build/bin/llama-embedding -m jina-embeddings-v5-text-small-retrieval-Q8_0.gguf \
    --pooling last -p "Your text here"
```
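For retrieval, documents are typically ranked by cosine similarity between the query embedding and each document embedding. A minimal sketch with small placeholder vectors standing in for the model's 1024-d output:

```python
import numpy as np

def rank_by_cosine(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by descending cosine similarity."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1]

# Placeholder 4-d vectors instead of real embeddings.
query = np.array([1.0, 0.0, 0.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # very similar to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0, 0.0],   # somewhat similar
])
print(rank_by_cosine(query, docs))  # [0 2 1]
```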
License
CC-BY-NC-4.0. For commercial use, please contact us.
Available quantizations: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, and 16-bit.
Model tree for jinaai/jina-embeddings-v5-text-small-retrieval-GGUF
Base model
Qwen/Qwen3-0.6B-Base