Extend support for transformers versions from 4.45.0 to 5.2.0

#1

This PR updates the model implementation to support a wider range of transformers versions instead of only 4.49.0.
The model's custom code relied on internal transformers APIs that changed across versions, causing failures on any version other than 4.49.0.

  1. _update_causal_mask removed in transformers 5.0 — The LlamaBidirectionalModel overrode _update_causal_mask(), a private method on LlamaModel that was removed in transformers 5.0 in favor of create_bidirectional_mask from
    transformers.masking_utils.

  2. hidden_states[-1] returns None on transformers >= 4.57 — _extract_embeddings accessed outputs.hidden_states[-1], but the custom LlamaBidirectionalModel.forward() never populated the hidden_states tuple, only last_hidden_state.
    This worked incidentally on some versions but broke on 4.57+, where the base class internals changed.

  3. additional_special_tokens unavailable on transformers 5.0 — The processor filtered tokenizer.additional_special_tokens, but in transformers 5.0 the tokenizer backend changed to TokenizersBackend which doesn't expose this attribute.

  4. Missing self.post_init() call — LlamaNemotronVLModel.__init__ didn't call self.post_init(), the standard transformers finalization step. On transformers 5.0+ this caused AttributeError: 'LlamaNemotronVLModel' object has no attribute
    'all_tied_weights_keys'.
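The masking_utils change in item 1 can be handled by feature detection rather than version-string parsing. A minimal sketch, assuming only that create_bidirectional_mask lives at the import path named above:

```python
# Minimal sketch: feature-detect the new masking helper instead of parsing
# version strings. On older transformers (or when the library is absent
# entirely) the import fails and the legacy code path is used instead.
try:
    from transformers.masking_utils import create_bidirectional_mask
    HAS_CREATE_BIDIRECTIONAL_MASK = True
except ImportError:
    create_bidirectional_mask = None
    HAS_CREATE_BIDIRECTIONAL_MASK = False
```

Import-based detection has the advantage of also working on patched or forked builds whose version strings don't match upstream.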

Changes

modeling_llama_nemotron_vl.py

  • Replaced _update_causal_mask override with explicit forward() and _create_bidirectional_mask — Instead of hooking into a private method, LlamaBidirectionalModel now has its own forward() that constructs the bidirectional mask
    directly, dispatching to transformers.masking_utils.create_bidirectional_mask on 5.0+ and falling back to _prepare_4d_attention_mask on older versions.

  • Version-portable decoder layer calls — Uses runtime introspection to detect API differences across transformers versions: past_key_value vs past_key_values parameter naming (changed in 4.56), DynamicCache constructor signature, and tuple
    vs tensor return from decoder layers.

  • Changed _extract_embeddings to use outputs.last_hidden_state — Replaced self(**batch, output_hidden_states=True).hidden_states[-1] with self(**batch).last_hidden_state, which is always reliably populated by the custom forward().

  • Changed forward() return type to BaseModelOutputWithPast — The parent returned CausalLMOutputWithPast (which has logits but no last_hidden_state). Since this is an embedding model that doesn't compute logits, BaseModelOutputWithPast
    is the correct output type and propagates last_hidden_state from the language model.

  • Added self.post_init() — Standard transformers pattern that initializes internal bookkeeping (all_tied_weights_keys, weight tying, etc.), required by transformers 5.0+.

processing_llama_nemotron_vl.py

  • Removed additional_special_tokens filtering — The processor filtered out <box>, </box>, <ref>, </ref> tokens from additional_special_tokens, which broke on transformers 5.0 where the attribute doesn't exist. Testing confirmed this
    filtering has no effect on embedding output (zero diff with and without it), so it was removed entirely.

Test results

All supported versions (4.45.0 through 5.2.0) produce zero diff against the reference embeddings (generated with transformers 4.49.0):

Version   Result
4.44.2    FAIL — tokenizers crate can't parse tokenizer.json
4.45.0    PASS (zero diff)
4.46.1    PASS (zero diff)
4.47.0    PASS (zero diff)
4.48.0    PASS (zero diff)
4.49.0    PASS (reference)
4.57.6    PASS (zero diff)
5.0.0     PASS (zero diff)
5.1.0     PASS (zero diff)
5.2.0     PASS (zero diff)
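The per-version check behind this table amounts to an elementwise comparison against the reference embeddings. A sketch of that check, where max_abs_diff and the sample values are hypothetical:

```python
# Hypothetical sketch of the zero-diff check: compare embeddings produced on
# each transformers version against the 4.49.0 reference, elementwise.
def max_abs_diff(reference, candidate):
    assert len(reference) == len(candidate)
    return max(abs(r - c) for r, c in zip(reference, candidate))

ref = [0.12, -0.34, 0.56]   # reference embedding (illustrative values)
out = [0.12, -0.34, 0.56]   # candidate produced on another version
diff = max_abs_diff(ref, out)  # 0.0 corresponds to "PASS (zero diff)"
```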