Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

Q-RAG is a resource-efficient method for multi-step retrieval trained with reinforcement learning directly in the latent space of text-chunk embeddings. Instead of expensive LLM fine-tuning, Q-RAG trains only a lightweight embedder agent using value-based RL (temporal difference learning), keeping the LLM frozen.
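
To make "value-based RL (temporal difference learning)" concrete, here is a minimal sketch of the one-step TD target used in textbook Q-learning. It is a generic illustration, not the Q-RAG training code: the function names `td_target` and `td_error`, the discount `gamma`, and the toy numbers are assumptions made for this example, and the paper's actual objective may differ.

```python
from typing import Sequence


def td_target(
    reward: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Generic one-step Q-learning target: r + gamma * max_a' Q(s', a').

    `next_q_values` holds the estimated values of every action available
    in the next state (for retrieval, one value per candidate chunk).
    """
    # At the end of an episode there is no next state to bootstrap from.
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_value: float, target: float) -> float:
    """Difference between the bootstrapped target and the current estimate."""
    return target - q_value


# Toy usage with made-up numbers (hypothetical, for illustration only).
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.5)
error = td_error(q_value=1.5, target=target)
```

In a setup like the one described above, only the embedder that produces the value estimates would receive gradient updates from this error, while the LLM stays frozen.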

This model was presented at ICLR 2026 (Oral).

Paper: Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
Repository: https://github.com/griver/Q-RAG

Overview

Retrieval-Augmented Generation (RAG) methods improve LLM performance by selecting the relevant parts of the context, but most of them retrieve in a single step. Q-RAG instead fine-tunes the embedder model for multi-step retrieval using reinforcement learning.
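
The difference between single-step and multi-step retrieval can be sketched as a greedy loop in which each step scores the remaining chunks against a state built from the query and the chunks retrieved so far. This is a toy illustration under stated assumptions, not the Q-RAG implementation: the helper names, the mean-pooled state embedding, and the dot-product score are hypothetical stand-ins for a learned embedder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def embed_state(query_vec: Vector, retrieved: List[Vector]) -> List[float]:
    """Hypothetical state embedding: mean of the query and retrieved chunk vectors."""
    vectors = [query_vec, *retrieved]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def multi_step_retrieve(
    query_vec: Vector, chunk_vecs: List[Vector], num_steps: int
) -> List[int]:
    """Greedily pick one chunk per step, re-scoring after every pick."""
    selected: List[int] = []
    for _ in range(min(num_steps, len(chunk_vecs))):
        # The state depends on what has already been retrieved.
        state = embed_state(query_vec, [chunk_vecs[i] for i in selected])
        candidates = [i for i in range(len(chunk_vecs)) if i not in selected]
        # Assumed scoring rule: value of a chunk = dot(state, chunk embedding).
        best = max(candidates, key=lambda i: dot(state, chunk_vecs[i]))
        selected.append(best)
    return selected


# Toy data: chunk 0 matches the query and "bridges" to chunk 1,
# which the query alone would score lowest.
query = [1.0, 0.0, 0.0]
chunks = [[1.0, 1.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.0, 0.0]]
picked = multi_step_retrieve(query, chunks, num_steps=2)
```

In this toy data, ranking by the query alone would place chunk 2 ahead of chunk 1, whereas the second step of the loop prefers chunk 1 because the state now reflects the content of chunk 0.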

Q-RAG achieves state-of-the-art results on long-context benchmarks (BabiLong, RULER) at context lengths of up to 10M tokens and competitive performance on open-domain multi-hop QA (HotpotQA, Musique), all trained on a single A100 GPU.

Performance Highlights

RULER benchmark: Achieves near-perfect retrieval on all NIAH subtasks, generalizing up to 1M tokens.
BabiLong benchmark: Achieves the highest average performance across tasks at context lengths from 1M to 10M tokens. On the hardest subtask (QA3), it shows virtually no degradation as context grows to 10M tokens.

Citation

If you find Q-RAG useful, please cite:

@inproceedings{sorokin2026qrag,
  title     = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author    = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year      = {2026}
}

@article{sorokin2025qrag,
  title   = {{Q-RAG}: Long Context Multi-Step Retrieval via Value-Based Embedder Training},
  author  = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Alexander and Inozemcev, Oleg and Vedernikov, Egor and Anokhin, Petr and Burtsev, Mikhail and Trushkov, Alexey and Yin, Wenshuai and Burnaev, Evgeny},
  journal = {arXiv preprint arXiv:2511.07328},
  year    = {2025}
}
