MSAE FashionCLIP (4096)

This is a Matryoshka Sparse Autoencoder (MSAE), published as Alloryne/msae-clip-fashion-4096-topk64, trained on CLIP image embeddings from the KAGL dataset.

Model details

  • Input dim: 512 (CLIP ViT-B/32 image embeddings)
  • Hidden dim: 4096
  • Sparsity: Matryoshka nesting with TopK activation (k=64, per the model name)
  • Training dataset size: 39,990 images
  • Training objective: reconstruction + sparsity
  • Checkpoints: two, at 30 and 100 epochs
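To make the architecture concrete, here is a minimal NumPy sketch of a TopK sparse-autoencoder forward pass with Matryoshka nesting, where each prefix of the latent vector must reconstruct the input on its own. This is an illustration, not the training code: the dimensions (512 → 4096) come from the card, k=64 comes from the model name, and the random weights and prefix sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, k = 512, 4096, 64  # dims from the card; k=64 assumed from the model name

# Random stand-in weights; the real checkpoint's weights would be loaded instead.
W_enc = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
W_dec = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_hidden)
b_enc = np.zeros(d_hidden)

def topk_encode(x, k):
    """Keep only the k largest ReLU pre-activations, zero the rest."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)
    idx = np.argsort(pre)[:-k]   # indices of everything except the k largest
    z = pre.copy()
    z[idx] = 0.0
    return z

def matryoshka_loss(x, prefixes=(256, 1024, 4096)):
    """Sum of reconstruction errors using only the first m latents.

    The prefix sizes here are hypothetical; the Matryoshka idea is that
    nested prefixes of the latent vector each reconstruct the input.
    """
    z = topk_encode(x, k)
    loss = 0.0
    for m in prefixes:
        z_m = np.zeros_like(z)
        z_m[:m] = z[:m]          # nested prefix of the latent code
        x_hat = z_m @ W_dec
        loss += np.mean((x - x_hat) ** 2)
    return loss

x = rng.standard_normal(d_in)    # stand-in for one CLIP image embedding
z = topk_encode(x, k)
print(int((z != 0).sum()))       # at most k active latents
```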

Usage

This model is intended for interpretability and feature analysis of CLIP embeddings.
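A typical feature-analysis workflow is to encode many images through the SAE and then rank images by how strongly they activate a single latent unit. The sketch below illustrates that ranking step with random stand-in latent codes; the real codes would come from the trained encoder.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, d_hidden = 100, 4096

# Fake latent codes standing in for real SAE encoder outputs.
codes = np.abs(rng.standard_normal((n_images, d_hidden)))

feature = 7                           # latent unit to inspect (arbitrary)
acts = codes[:, feature]              # its activation on every image
top = np.argsort(acts)[::-1][:5]      # indices of the 5 most-activating images
print(top.tolist())
```

Looking at the images behind those indices is the standard way to guess what visual concept a latent unit has learned.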

Citation

The Matryoshka SAE architecture comes from the following paper: https://arxiv.org/abs/2502.20578

Code from that paper's repository was used for training.
