Add convenience options to processor methods

#5

Ports the processor convenience options from llama-nemotron-embed-vl-1b-v2#3 to the reranker's LlamaNemotronVLRerankProcessor, making process_query_documents() more flexible for different integration scenarios.

Changes

processing_llama_nemotron_vl.py

  • Add return_tensors parameter ("pt" or "np") to process_query_documents(): enables NumPy output for non-PyTorch pipelines (e.g. Triton)
  • Add padding and truncation parameters with per-call override; when omitted, they fall back to the values set in the processor constructor
  • Add pixel_values_layout parameter:
    • "flat_tiles" (default): all image tiles concatenated into a single tensor, which is the format expected by forward()
    • "per_image": a list aligned with input documents, where each entry is a tensor or None; useful for batched serving where per-document tile counts are needed
  • Refactor internal image tracking from parallel lists (pil_images, max_input_tile_list, llm_onlys) to indexed dicts (pil_images_by_idx, max_input_tile_by_idx)
  • Add else clause to strip stray <image> tokens from text-only documents
  • Move image token constants (IMG_START_TOKEN, etc.) out of loop
  • Add return type hints and docstrings to process_query_documents() and process_queries_documents_crossencoder()
  • All new parameters propagate through process_queries_documents_crossencoder() via **kwargs
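
The two pixel_values_layout modes and the per-call fallback for padding and truncation can be illustrated with a minimal sketch. This is not the code from processing_llama_nemotron_vl.py: the helper names resolve_option and layout_pixel_values are hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch or NumPy.

```python
from typing import Dict, List, Optional, Union

Tile = str  # stand-in for one image tile tensor (assumption for illustration)


def resolve_option(per_call, default):
    """Return the per-call value if one was passed, else the constructor default."""
    return default if per_call is None else per_call


def layout_pixel_values(
    tiles_by_idx: Dict[int, List[Tile]],
    num_docs: int,
    layout: str = "flat_tiles",
) -> Union[List[Tile], List[Optional[List[Tile]]]]:
    """Arrange per-document tiles in one of the two layouts.

    tiles_by_idx maps a document index to that document's tiles;
    text-only documents have no entry.
    """
    if layout == "per_image":
        # One entry per input document, None for text-only documents.
        return [tiles_by_idx.get(i) for i in range(num_docs)]
    if layout == "flat_tiles":
        # All tiles concatenated in document order.
        flat: List[Tile] = []
        for i in sorted(tiles_by_idx):
            flat.extend(tiles_by_idx[i])
        return flat
    raise ValueError(f"unknown pixel_values_layout: {layout!r}")


# Three documents; the second one (index 1) is text-only.
tiles = {0: ["d0_t0", "d0_t1"], 2: ["d2_t0", "d2_t1", "d2_t2"]}
flat = layout_pixel_values(tiles, num_docs=3)
per_image = layout_pixel_values(tiles, num_docs=3, layout="per_image")

# Per-call override: None means "use the value set in the constructor".
padding = resolve_option(None, default=True)
truncation = resolve_option(False, default=True)
```

In the "per_image" result, the tile count for each document can be read directly from the length of its entry, which is the information the flat layout discards.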

Backward compatibility

All new parameters have defaults that preserve the original behavior. Verified with a golden score regression test: 0.00 max difference across 6 query-document pairs (3 text-only + 3 image).
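
A golden score check of this kind might look like the sketch below. The actual test is not shown in this PR, so the helper name max_abs_difference and the score values are made-up placeholders; in practice the golden scores would be recorded from the model before the change and the current scores computed after it.

```python
from typing import Sequence


def max_abs_difference(golden: Sequence[float], current: Sequence[float]) -> float:
    """Largest absolute per-pair difference between two score lists."""
    if len(golden) != len(current):
        raise ValueError("score lists must have the same length")
    return max(abs(g - c) for g, c in zip(golden, current))


# Placeholder values for illustration only, not real model scores.
golden_scores = [0.5, -1.25, 2.0]
current_scores = list(golden_scores)

# With default parameters the scores should be unchanged.
assert max_abs_difference(golden_scores, current_scores) == 0.0
```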

nvidia-oliver-holworthy changed pull request status to open
nvidia-oliver-holworthy changed pull request status to merged