Use default attention implementation with option to override

#2

Enables specifying attn_implementation when loading the model, including sdpa. When no value is given, the default attention implementation is used.
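For illustration, here is a minimal sketch of what the override might look like from the caller's side. It assumes the model is loaded through transformers' `from_pretrained`, which accepts an `attn_implementation` keyword (commonly "eager", "sdpa", or "flash_attention_2"). The helper names `build_load_kwargs` and `load_model` and the `model_id` argument are hypothetical and are not taken from this repository.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(attn_implementation: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for from_pretrained.

    When attn_implementation is None, the key is omitted so that the
    library's default attention implementation is used.
    """
    kwargs: Dict[str, Any] = {}
    if attn_implementation is not None:
        # Override the default, e.g. "sdpa" or "eager".
        kwargs["attn_implementation"] = attn_implementation
    return kwargs


def load_model(model_id: str, attn_implementation: Optional[str] = None):
    """Hypothetical loader: model_id is whatever repository id you are using."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_id, **build_load_kwargs(attn_implementation))
```

Calling `load_model(model_id)` would keep the default, while `load_model(model_id, attn_implementation="sdpa")` would request scaled dot product attention, assuming the installed transformers and PyTorch versions support it for this model.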

Thank you!

nvidia-oliver-holworthy changed pull request status to open

The functionality to configure attn_implementation was merged in #4.

nvidia-oliver-holworthy changed pull request status to closed