Add convenience options to processor methods
#5 by nvidia-oliver-holworthy - opened
Ports the processor convenience options from llama-nemotron-embed-vl-1b-v2#3 to the reranker's LlamaNemotronVLRerankProcessor, making process_query_documents() more flexible for different integration scenarios.
Changes
processing_llama_nemotron_vl.py
- Add return_tensors parameter ("pt" or "np") to process_query_documents(): enables numpy output for non-PyTorch pipelines (e.g. Triton)
- Add padding and truncation parameters with per-call override: defaults fall back to the values set in the processor constructor
- Add pixel_values_layout parameter:
  - "flat_tiles" (default): all image tiles concatenated into a single tensor, the format expected by forward()
  - "per_image": a list aligned with input documents, where each entry is a tensor or None; useful for batched serving where per-document tile counts are needed
- Refactor internal image tracking from parallel lists (pil_images, max_input_tile_list, llm_onlys) to indexed dicts (pil_images_by_idx, max_input_tile_by_idx)
- Add else clause to strip stray <image> tokens from text-only documents
- Move image token constants (IMG_START_TOKEN, etc.) out of loop
- Add return type hints and docstrings to process_query_documents() and process_queries_documents_crossencoder()
- All new parameters propagate through process_queries_documents_crossencoder() via **kwargs
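For reviewers, the two pixel_values_layout modes and the per-call override fallback can be illustrated roughly as below. This is a dependency-free sketch, not the code in processing_llama_nemotron_vl.py: the helper names resolve_option and layout_pixel_values are hypothetical, and plain Python lists of strings stand in for the tile tensors the processor actually returns.

```python
from typing import Dict, List, Optional, Union

Tile = str  # stand-in for one image tile; the real processor holds tensors


def resolve_option(per_call: Optional[bool], constructor_default: bool) -> bool:
    """Hypothetical helper: a per-call value wins, None falls back to the default."""
    return constructor_default if per_call is None else per_call


def layout_pixel_values(
    tiles_by_idx: Dict[int, List[Tile]],
    num_documents: int,
    pixel_values_layout: str = "flat_tiles",
) -> Union[List[Tile], List[Optional[List[Tile]]]]:
    """Hypothetical helper: arrange tiles keyed by document index.

    tiles_by_idx only has entries for documents that contain an image,
    mirroring the indexed-dict tracking described in the list above.
    """
    if pixel_values_layout == "flat_tiles":
        # Concatenate every document's tiles, in document order.
        flat: List[Tile] = []
        for idx in sorted(tiles_by_idx):
            flat.extend(tiles_by_idx[idx])
        return flat
    if pixel_values_layout == "per_image":
        # One entry per input document; text-only documents get None.
        return [tiles_by_idx.get(idx) for idx in range(num_documents)]
    raise ValueError(f"Unknown pixel_values_layout: {pixel_values_layout!r}")


# Three documents: index 1 is text-only, so it has no tiles.
tiles = {0: ["doc0_tile0", "doc0_tile1"], 2: ["doc2_tile0"]}
flat_layout = layout_pixel_values(tiles, num_documents=3)
per_image_layout = layout_pixel_values(tiles, 3, pixel_values_layout="per_image")
```

With "per_image", the per-document tile count is just the length of each non-None entry, which is the information a batched serving layer would need.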
Backward compatibility
All new parameters have defaults that preserve the original behavior. Verified with golden score regression test: 0.00 max difference across 6 query-document pairs (3 text-only + 3 image).
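A golden score check of this kind can be sketched as follows. The actual test is not part of this description, so the function name max_abs_difference and the score values are placeholders for illustration, not the scores from the regression run.

```python
from typing import Sequence


def max_abs_difference(golden: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute gap between stored golden scores and fresh scores."""
    if len(golden) != len(current):
        raise ValueError("golden and current must have the same number of scores")
    # default=0.0 keeps the empty case well defined.
    return max((abs(g - c) for g, c in zip(golden, current)), default=0.0)


# Placeholder values only: one score per query-document pair.
golden_scores = [1.25, -0.5, 3.0]
current_scores = [1.25, -0.5, 3.0]
difference = max_abs_difference(golden_scores, current_scores)
```

In practice the golden scores would be recorded before the change and the current scores produced by calling the processor with no new arguments, so that only the defaults are exercised.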
nvidia-oliver-holworthy changed pull request status to open
nvidia-oliver-holworthy changed pull request status to merged