---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- pointwise
- ranknet
- llama
base_model: meta-llama/Llama-3.1-8B
datasets:
- Tevatron/msmarco-passage
- abdoelsayed/DeAR-COT
pipeline_tag: text-classification
---

# DeAR-8B-Reranker-RankNet-v1

## Model Description

**DeAR-8B-Reranker-RankNet-v1** is an 8B-parameter neural reranker trained with RankNet loss and knowledge distillation. It is part of the [DeAR framework](https://github.com/DataScienceUIBK/DeAR-Reranking) family. On the benchmarks reported below it slightly exceeds its 13B teacher (74.5 vs. 73.8 NDCG@10 on TREC DL19) while taking about 2.2 s of inference time, compared with 5.8 s for the teacher.

## Model Details

- **Model Type:** Pointwise Reranker (Sequence Classification)
- **Base Model:** LLaMA-3.1-8B
- **Parameters:** 8 billion
- **Training Method:** Knowledge Distillation + RankNet Loss
- **Teacher Model:** [LLaMA2-13B-RankLLaMA](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)
- **Training Data:** MS MARCO
- **Precision:** BFloat16

## Key Features

✅ **High Performance:** Competitive with the 13B teacher on BEIR (45.2 vs. 44.8 average NDCG@10)  
✅ **Fast Inference:** About 2.2 s to rerank 100 documents (~45 docs/sec) on a standard GPU  
✅ **Memory Efficient:** Fits on a single 24 GB GPU (about 18 GB at inference)  
✅ **Knowledge Distillation:** Enhanced with Chain-of-Thought reasoning  

## Performance

| Benchmark | NDCG@10 |
|-----------|---------|
| TREC DL19 | 74.5 |
| TREC DL20 | 72.8 |
| BEIR (Avg) | 45.2 |
| MS MARCO Dev | 68.9 |

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

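# Load model
model_path = "abdoelsayed/dear-8b-reranker-ranknet-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# LLaMA tokenizers may ship without a pad token; fall back to EOS if so,
# since padding (and batched scoring) requires one.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16
)
# The classification head locates the last real token via pad_token_id.
if model.config.pad_token_id is None:
    model.config.pad_token_id = tokenizer.pad_token_id
model.eval().cuda()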

# Score a query-document pair
query = "What is machine learning?"
document = "Machine learning is a subset of artificial intelligence..."

inputs = tokenizer(
    f"query: {query}",
    f"document: {document}",
    return_tensors="pt",
    truncation=True,
    max_length=228,  # q_max_len(32) + p_max_len(196)
    padding="max_length"
)
inputs = {k: v.cuda() for k, v in inputs.items()}

with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
    
print(f"Relevance score: {score}")
```

### Batch Reranking

```python
def rerank_documents(query, documents, model, tokenizer, batch_size=64):
    """
    Rerank a list of documents for a query.
    
    Args:
        query: Search query string
        documents: List of (title, text) tuples
        model: Loaded reranker model
        tokenizer: Loaded tokenizer
        batch_size: Batch size for inference
    
    Returns:
        List of (index, score) tuples sorted by relevance
    """
    scores = []
    
    for i in range(0, len(documents), batch_size):
        batch_docs = documents[i:i + batch_size]
        
        # Prepare inputs
        queries = [f"query: {query}"] * len(batch_docs)
        docs = [f"document: {title} {text}" for title, text in batch_docs]
        
        inputs = tokenizer(
            queries,
            docs,
            return_tensors="pt",
            truncation=True,
            max_length=228,
            padding=True
        )
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
        
        # Get scores
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(-1)
            scores.extend(logits.cpu().tolist())
    
    # Sort by score (descending)
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return ranked


# Example usage
query = "When was the Eiffel Tower built?"
documents = [
    ("Eiffel Tower", "The Eiffel Tower was built in 1889 for the World's Fair."),
    ("Paris", "Paris is the capital of France."),
    ("Architecture", "Modern architecture has evolved significantly."),
]

ranking = rerank_documents(query, documents, model, tokenizer)
print(ranking)
# Example output (scores are illustrative): [(0, 8.23), (1, 2.45), (2, -1.87)]
```
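
Because `rerank_documents` returns `(index, score)` pairs rather than the documents themselves, a small helper can map the ranking back to the original list. The sketch below is illustrative only: `top_k_documents` is a hypothetical helper (not part of the DeAR codebase), and the ranking is hard-coded with made-up scores so the snippet runs without loading the model.

```python
def top_k_documents(ranking, documents, k=2):
    """Map (index, score) pairs back to (title, text, score) for the top-k hits.

    `ranking` is assumed to be sorted by descending score, as returned by
    `rerank_documents`.
    """
    return [(*documents[idx], score) for idx, score in ranking[:k]]


documents = [
    ("Eiffel Tower", "The Eiffel Tower was built in 1889 for the World's Fair."),
    ("Paris", "Paris is the capital of France."),
    ("Architecture", "Modern architecture has evolved significantly."),
]
# Hypothetical ranking; real scores depend on the model.
ranking = [(0, 8.23), (1, 2.45), (2, -1.87)]

for title, text, score in top_k_documents(ranking, documents):
    print(f"{score:6.2f}  {title}: {text}")
```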

## Training Details

### Training Data
- **Primary Dataset:** MS MARCO Passage Ranking


### Hardware
- **GPUs:** 4x NVIDIA A100 (40GB)
- **Training Time:** ~36 hours
- **DeepSpeed:** ZeRO Stage 2

### Loss Function

**RankNet Loss** with Knowledge Distillation:

```
L_total = (1 - α) * L_RankNet + α * L_KD

where:
- L_RankNet: Pairwise ranking loss
- L_KD: KL divergence with teacher (temperature=2)
- α: 0.1 (distillation weight)
```
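
To make the combined objective above concrete, here is a minimal pure-Python sketch for a single query. It is not the training code from the DeAR repository. The pair construction (all pairs where one label is strictly higher), the KL direction (teacher to student), the softmax over the candidate list, and the absence of any temperature-squared scaling are all assumptions made for illustration; only `α = 0.1` and `temperature = 2` come from the description above.

```python
import math


def ranknet_loss(scores, labels):
    """Mean pairwise logistic loss over pairs (i, j) with labels[i] > labels[j]."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                diff = scores[i] - scores[j]
                # log(1 + exp(-diff)), written in a numerically stable form
                losses.append(max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return sum(losses) / len(losses) if losses else 0.0


def softmax(values, temperature=1.0):
    """Temperature-scaled softmax over a list of scores."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_scores, teacher_scores, temperature=2.0):
    """KL(teacher || student) between temperature-softened score distributions."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_loss(student_scores, teacher_scores, labels, alpha=0.1, temperature=2.0):
    """L_total = (1 - alpha) * L_RankNet + alpha * L_KD."""
    return ((1 - alpha) * ranknet_loss(student_scores, labels)
            + alpha * kd_loss(student_scores, teacher_scores, temperature))


# Toy example: one relevant passage (label 1) and two non-relevant ones.
loss = total_loss(
    student_scores=[1.5, 0.2, -0.3],
    teacher_scores=[2.0, 0.1, -1.0],
    labels=[1, 0, 0],
)
print(f"total loss: {loss:.4f}")
```

With `α = 0.1`, the pairwise term dominates and the teacher signal acts as a light regularizer on the student's score distribution.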

## Evaluation Results

### TREC Deep Learning

| Dataset | NDCG@10 | NDCG@20 | MAP |
|---------|---------|---------|-----|
| DL19 | 74.50 | 70.23 | 45.67 |
| DL20 | 72.80 | 69.15 | 43.21 |
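
For readers unfamiliar with the metric, NDCG@k can be sketched as follows. This is a minimal illustration, not the evaluation script used to produce the numbers in this card: it assumes linear gain (the graded relevance label itself) and a `log2` rank discount, and other tools may use an exponential gain instead. The tables in this card report the value multiplied by 100.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances=None, k=10):
    """NDCG@k for one query.

    ranked_relevances: labels of the documents in the order the reranker returned them.
    all_relevances: labels of every judged document for the query (defaults to the
        ranked list, which is only exact if the list contains all relevant documents).
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# A ranking that puts the only relevant document second instead of first.
print(ndcg_at_k([0, 1, 0], k=10))
```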

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 68.9 |
| NQ | 52.3 |
| HotpotQA | 61.8 |
| FiQA | 47.2 |
| ArguAna | 59.4 |
| SciFact | 73.6 |
| TREC-COVID | 85.2 |
| NFCorpus | 39.8 |

### Efficiency

| Metric | Value |
|--------|-------|
| Inference Time (100 docs) | 2.2s |
| GPU Memory (inference) | 18GB |
| Throughput | ~45 docs/sec |
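
The throughput figure follows directly from the latency: 100 documents in 2.2 s is roughly 45 documents per second. To reproduce the measurement on your own hardware, a timing helper along these lines may be useful. `measure_throughput` is a hypothetical name, and the measurement protocol behind the table (batch size, warm-up, GPU model) is not specified here, so expect different numbers.

```python
import time


def measure_throughput(rerank_fn, num_docs, repeats=3):
    """Return (mean seconds per call, documents per second) for a zero-argument callable.

    When timing GPU inference, rerank_fn should finish with
    torch.cuda.synchronize() so that queued kernels are included.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        rerank_fn()
        timings.append(time.perf_counter() - start)
    mean_seconds = sum(timings) / len(timings)
    return mean_seconds, num_docs / mean_seconds


# Sanity check of the table: 100 docs / 2.2 s
print(round(100 / 2.2, 1))  # 45.5
```

A typical call would wrap the batch reranking function, for example `measure_throughput(lambda: rerank_documents(query, documents, model, tokenizer), len(documents))`.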

## Comparison with Other Models

| Model | Size | TREC DL19 (NDCG@10) | BEIR Avg (NDCG@10) | Inference (s) |
|-------|------|---------------------|--------------------|---------------|
| MonoT5-3B | 3B | 71.8 | 43.5 | 3.5 |
| **DeAR-P-8B-RL** (this model) | 8B | **74.5** | **45.2** | **2.2** |
| Teacher (LLaMA2-13B-RankLLaMA) | 13B | 73.8 | 44.8 | 5.8 |

## Model Architecture

```
Input: "query: [Q]" + "document: [D]" (tokenized as a text pair)
    ↓
LLaMA-3.1-8B Decoder
    ↓
Last Non-Padding Token Representation
    ↓
Linear Classification Head
    ↓
Relevance Score (scalar)
```

## Limitations

- **Domain Adaptation:** Trained primarily on MS MARCO; may require fine-tuning for specialized domains
- **Query Length:** Optimized for queries up to 32 tokens
- **Document Length:** Truncated to 196 tokens; longer documents lose information
- **Language:** English only
- **Numerical Reasoning:** Limited capability for queries requiring calculations
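
One common workaround for the 196-token document limit is to split a long document into overlapping passages, score each one, and keep the maximum (often called MaxP). This is not part of the DeAR training or evaluation setup described in this card, and its effect on this model has not been measured here. In the sketch below, `score_fn` is a hypothetical callable wrapping the reranker, and whitespace-separated words are used as a rough proxy for tokens, so the window size is an assumption that should be tuned.

```python
def split_into_windows(words, window=150, stride=75):
    """Split a list of words into overlapping windows that cover the whole list."""
    if not words:
        return []
    windows = []
    start = 0
    while True:
        windows.append(words[start:start + window])
        if start + window >= len(words):
            break
        start += stride
    return windows


def max_passage_score(query, document, score_fn, window=150, stride=75):
    """Score each overlapping passage and return the highest score (MaxP)."""
    passages = [" ".join(w) for w in split_into_windows(document.split(), window, stride)]
    if not passages:
        return score_fn(query, "")
    return max(score_fn(query, passage) for passage in passages)


# Toy scorer for demonstration only: counts query-term occurrences.
def toy_score(query, passage):
    return sum(passage.lower().count(term) for term in query.lower().split())


long_doc = "filler " * 300 + "The Eiffel Tower was built in 1889."
print(max_passage_score("eiffel tower", long_doc, toy_score))
```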

## Bias and Fairness

This model inherits biases present in:
- Base LLaMA-3.1-8B model
- MS MARCO training data
- Teacher model annotations

Users should evaluate fairness for their specific use cases.

## Ethical Considerations

- **Search Ranking:** Can influence information access and visibility
- **Training Data:** May contain biased or sensitive content
- **Misuse Potential:** Should not be used for surveillance or discriminatory ranking

## Related Models

**DeAR Family:**
- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1) - Binary Cross-Entropy variant
- [DeAR-8B-Listwise](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-v1) - Listwise reranking
- [DeAR-8B-RankNet-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-lora-v1) - LoRA adapter

**Teacher:**
- [LLaMA2-13B-RankLLaMA-Teacher](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

**Dataset:**
- [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## Contact

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)