Use default attention implementation with option to override
#2 by nvidia-oliver-holworthy - opened
Enables specifying `attn_implementation` when loading the model, including `sdpa`. If no value is given, the default attention implementation is used.
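For illustration, here is a minimal sketch of how the override might look from the caller's side. It assumes the model is loaded through `AutoModel.from_pretrained` from `transformers`, which accepts an `attn_implementation` keyword in recent releases. The helper names `build_load_kwargs` and `load_model` are hypothetical and are not taken from this PR.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(attn_implementation: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for from_pretrained.

    When attn_implementation is None the key is left out entirely, so the
    library falls back to its own default attention implementation.
    """
    kwargs: Dict[str, Any] = {}
    if attn_implementation is not None:
        # Pass the override through unchanged, e.g. "sdpa" or "eager".
        kwargs["attn_implementation"] = attn_implementation
    return kwargs


def load_model(model_id: str, attn_implementation: Optional[str] = None):
    """Hypothetical loader: use the default attention unless overridden."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        model_id, **build_load_kwargs(attn_implementation)
    )
```

With this sketch, `load_model(model_id)` keeps the default, while `load_model(model_id, attn_implementation="sdpa")` requests scaled dot product attention. Whether a given value is supported depends on the model and the installed library versions.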
Thank you!
nvidia-oliver-holworthy changed pull request status to open
nvidia-oliver-holworthy changed pull request status to closed