nvidia/llama-nemotron-embed-vl-1b-v2 · Use default attention implementation with option to override

by nvidia-oliver-holworthy - opened Jan 7

NVIDIA org Jan 7

Enables specifying attn_implementation when loading the model, including sdpa.
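For readers who want to try the override, here is a minimal sketch of how it might be passed at load time. It assumes the standard `attn_implementation` keyword of transformers' `from_pretrained` and that the model is loaded with `trust_remote_code=True`. `build_load_kwargs` and `load_model` are hypothetical helpers written for this illustration, and the list of accepted values is the usual transformers set, not something confirmed for this repository.

```python
from typing import Any, Dict, Optional

MODEL_ID = "nvidia/llama-nemotron-embed-vl-1b-v2"

# Values commonly accepted by transformers. Whether this model's code
# supports each of them is an assumption, not confirmed by the PR.
KNOWN_ATTN_IMPLEMENTATIONS = {"eager", "sdpa", "flash_attention_2"}


def build_load_kwargs(attn_implementation: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for from_pretrained.

    With no argument, the attention setting is omitted so the model's
    default implementation is used; otherwise the override is passed on.
    """
    kwargs: Dict[str, Any] = {"trust_remote_code": True}  # assumed to be required
    if attn_implementation is not None:
        if attn_implementation not in KNOWN_ATTN_IMPLEMENTATIONS:
            raise ValueError(f"Unknown attn_implementation: {attn_implementation!r}")
        kwargs["attn_implementation"] = attn_implementation
    return kwargs


def load_model(attn_implementation: Optional[str] = None):
    """Hypothetical helper: load the model, optionally overriding attention."""
    # Imported lazily so the kwargs helper can be used without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(MODEL_ID, **build_load_kwargs(attn_implementation))
```

Calling `load_model()` would keep the default attention implementation, while `load_model("sdpa")` would request PyTorch's scaled dot product attention.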

Jan 20

Thank you!

nvidia-oliver-holworthy changed pull request status to open 11 days ago

NVIDIA org 11 days ago

The functionality to configure attn_implementation was merged in #4.

nvidia-oliver-holworthy changed pull request status to closed 11 days ago
