# Micro-SAE for LLaVA 1.5 7B Vision Tower

A sparse autoencoder (SAE) trained on CLIP ViT-L/14 patch activations (layer 23)
extracted from `llava-hf/llava-1.5-7b-hf`.
## Architecture

- Input dim: 1024 (CLIP ViT-L/14 hidden size)
- Dictionary size: 4096 (4× expansion)
- Activation: ReLU encoder, trained with an L1 sparsity penalty on the codes
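The card does not include the model class itself, so here is a minimal sketch of a `SparseAutoencoder` matching the stated dimensions and the ReLU + L1 recipe. The attribute names (`encoder`, `decoder`) are assumptions and may not match the released state dict; treat this as a stand-in, not the official definition.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: linear encoder + ReLU, linear decoder.

    Dimensions follow the card (1024 -> 4096 -> 1024). Parameter
    names are assumptions and may differ from the checkpoint.
    """

    def __init__(self, input_dim: int, dict_size: int):
        super().__init__()
        self.encoder = nn.Linear(input_dim, dict_size)
        self.decoder = nn.Linear(dict_size, input_dim)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps the codes non-negative and, with L1 training, sparse.
        return torch.relu(self.encoder(x))

    def forward(self, x: torch.Tensor):
        z = self.encode(x)       # sparse codes, shape (..., 4096)
        x_hat = self.decoder(z)  # reconstruction, shape (..., 1024)
        return x_hat, z
```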
## Usage

```python
import json

import torch

# Load config
with open("config.json") as f:
    config = json.load(f)

# Recreate and load the SAE (a SparseAutoencoder class must be in scope)
sae = SparseAutoencoder(config["input_dim"], config["dict_size"])
sae.load_state_dict(torch.load("micro_sae_1024d.pt", map_location="cpu"))
sae.eval()
```
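Since the snippet above assumes a `SparseAutoencoder` class is already defined, here is a self-contained round trip of the same loading pattern, using a freshly saved state dict in place of the released `micro_sae_1024d.pt`. The class is a minimal stand-in whose parameter names may not match the real checkpoint.

```python
import json
import os
import tempfile

import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    # Stand-in matching the card's stated dims; not the official class.
    def __init__(self, input_dim, dict_size):
        super().__init__()
        self.encoder = nn.Linear(input_dim, dict_size)
        self.decoder = nn.Linear(dict_size, input_dim)

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        return self.decoder(z), z


tmp = tempfile.mkdtemp()
cfg_path = os.path.join(tmp, "config.json")
ckpt_path = os.path.join(tmp, "micro_sae_1024d.pt")

# Write a config and checkpoint in the same shape the card describes.
with open(cfg_path, "w") as f:
    json.dump({"input_dim": 1024, "dict_size": 4096}, f)
torch.save(SparseAutoencoder(1024, 4096).state_dict(), ckpt_path)

# The loading pattern from the Usage section.
with open(cfg_path) as f:
    config = json.load(f)
sae = SparseAutoencoder(config["input_dim"], config["dict_size"])
sae.load_state_dict(torch.load(ckpt_path, map_location="cpu"))

x_hat, z = sae(torch.randn(2, 1024))
```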
## Training Data

- 20,000 COCO images
- 500 zebra images (concept-specific)
- 500 fire-truck images (concept-specific)
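The "ReLU + L1" note under Architecture corresponds to the standard SAE objective: reconstruction error plus an L1 penalty on the codes. A sketch with stand-in layers; the sparsity coefficient and optimizer settings here are assumptions, not values from this card.

```python
import torch
import torch.nn as nn

enc = nn.Linear(1024, 4096)  # stand-in encoder
dec = nn.Linear(4096, 1024)  # stand-in decoder
l1_coeff = 1e-3              # assumed sparsity weight; not stated in the card

x = torch.randn(8, 1024)     # a batch of CLIP layer-23 patch activations
z = torch.relu(enc(x))       # non-negative codes
x_hat = dec(z)               # reconstruction
loss = nn.functional.mse_loss(x_hat, x) + l1_coeff * z.abs().sum(dim=-1).mean()
```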