Micro-SAE for LLaVA 1.5 7B Vision Tower

A sparse autoencoder (SAE) trained on patch activations from layer 23 of the CLIP ViT-L/14 vision tower of llava-hf/llava-1.5-7b-hf.

Architecture

  • Input dim: 1024 (CLIP ViT-L/14 hidden size)
  • Dictionary size: 4096 (4× expansion)
  • Activation: ReLU
  • Sparsity penalty: L1 on the feature activations
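The card does not ship the `SparseAutoencoder` class itself, so here is a minimal sketch consistent with the architecture above (1024-d input, 4096-d dictionary, ReLU activations, L1 sparsity). The actual implementation in this repository may differ in layer names, weight tying, or bias handling:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Hypothetical minimal SAE matching the card's stated architecture."""

    def __init__(self, input_dim: int, dict_size: int):
        super().__init__()
        self.encoder = nn.Linear(input_dim, dict_size)
        self.decoder = nn.Linear(dict_size, input_dim)

    def forward(self, x):
        # ReLU keeps the dictionary-feature activations non-negative (and sparse).
        z = torch.relu(self.encoder(x))
        return self.decoder(z), z

    def loss(self, x, l1_coeff: float = 1e-3):
        # Reconstruction error plus an L1 penalty on the feature activations.
        recon, z = self(x)
        return ((recon - x) ** 2).mean() + l1_coeff * z.abs().mean()
```

With a 4× expansion (4096 features over 1024 dimensions), the L1 term pushes most features to zero for any given patch, so each patch is explained by a small set of active dictionary directions.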

Usage

import torch
import json

# Load config
with open("config.json") as f:
    config = json.load(f)

# Recreate the SAE (the SparseAutoencoder class must be defined or imported
# in your environment) and load the trained weights
sae = SparseAutoencoder(config["input_dim"], config["dict_size"])
sae.load_state_dict(torch.load("micro_sae_1024d.pt", map_location="cpu"))
sae.eval()
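To feed the SAE, you need layer-23 patch activations from the vision tower. A sketch using transformers' `CLIPVisionModel` is below; to keep it self-contained it instantiates a randomly initialized ViT-L/14-336 from a config — for real use, swap in `CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")` (the tower LLaVA 1.5 uses) and a properly preprocessed image:

```python
import torch
from transformers import CLIPVisionConfig, CLIPVisionModel

# ViT-L/14-336 geometry; random weights here for a self-contained demo.
cfg = CLIPVisionConfig(
    hidden_size=1024, num_hidden_layers=24, num_attention_heads=16,
    intermediate_size=4096, patch_size=14, image_size=336,
)
vision = CLIPVisionModel(cfg).eval()

pixels = torch.randn(1, 3, 336, 336)  # stand-in for a preprocessed image
with torch.no_grad():
    out = vision(pixel_values=pixels, output_hidden_states=True)

# hidden_states[23] is the layer-23 stream this SAE was trained on.
acts = out.hidden_states[23]                    # [1, 577, 1024]: CLS + 576 patches
patch_acts = acts[:, 1:, :].reshape(-1, 1024)   # drop CLS, flatten to [576, 1024]
```

`patch_acts` can then be passed straight to the loaded SAE to obtain sparse dictionary features per image patch.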

Training Data

  • 20,000 COCO images
  • 500 zebra images (concept-specific)
  • 500 fire-truck images (concept-specific)