How to Fine-Tune Vision Layers with LoRA?
I'm trying to fine-tune only the vision layers of the model using LoRA, but I'm running into an issue where the model doesn't learn (the evaluation loss stays constant). Has anyone successfully implemented this?
**What I've Tried:**
- LoRA configuration targeting vision projection layers (the `_proj` layers in the vision encoder)
- Various learning rates (from 1e-5 to 5e-3)
- Verified the vision layers are trainable (`requires_grad=True`)
- Different batch sizes and gradient accumulation steps
**Specific Issues:**
- The loss doesn't decrease when only the vision layers are tuned
- The language layers fine-tune normally when targeted
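A quick way to narrow this down is to check, after a single backward pass, which trainable parameters actually received gradients. This is a generic PyTorch diagnostic (the toy model here is illustrative, not the actual architecture):

```python
import torch
import torch.nn as nn

# Generic diagnostic: run one backward pass, then list trainable parameters
# whose .grad is still None. A constant eval loss combined with grad-less
# vision parameters points to a severed autograd graph, not a bad LR.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
loss = model(torch.randn(2, 4)).sum()
loss.backward()

no_grad = [n for n, p in model.named_parameters()
           if p.requires_grad and p.grad is None]
print(no_grad)  # empty here; on the real model, vision params would show up
```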
Any advice or working examples would be greatly appreciated!
Granite apparently detaches the vision features inside its `forward`, so even if I enable gradients in `get_image_features`, or even pass precomputed features back in, the graph is still cut.
With this implementation, training the vision tower directly is impossible (short of rewriting its `forward` source code).
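The failure mode described above can be reproduced in miniature. This toy model is an assumption about the shape of the problem, not Granite's actual source; it just shows why a `detach()` in `forward` makes the vision tower untrainable while the language side still learns:

```python
import torch
import torch.nn as nn

class DetachingVLM(nn.Module):
    """Toy model whose forward() detaches vision features, as described above."""
    def __init__(self):
        super().__init__()
        self.vision = nn.Linear(8, 8)  # stand-in vision tower
        self.lm = nn.Linear(8, 1)      # stand-in language head

    def forward(self, x):
        feats = self.vision(x).detach()  # the problematic detach: cuts the graph
        return self.lm(feats)

model = DetachingVLM()
loss = model(torch.randn(2, 8)).sum()
loss.backward()

print(model.vision.weight.grad)           # None: no gradient reaches the vision tower
print(model.lm.weight.grad is not None)   # True: the language head still trains
```

This matches the symptoms in the question: language-layer LoRA works, vision-layer LoRA shows a flat loss, and `requires_grad=True` on the vision parameters changes nothing.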