arXiv:2602.06964

Learning a Generative Meta-Model of LLM Activations

Published on Feb 6 · Submitted by Grace Luo on Feb 9

Abstract

AI-generated summary

Training diffusion models on neural network activations creates meta-models that learn internal state distributions and improve intervention fidelity without restrictive structural assumptions.

Existing approaches for analyzing neural network activations, such as PCA and sparse autoencoders, rely on strong structural assumptions. Generative models offer an alternative: they can uncover structure without such assumptions and act as priors that improve intervention fidelity. We explore this direction by training diffusion models on one billion residual stream activations, creating "meta-models" that learn the distribution of a network's internal states. We find that diffusion loss decreases smoothly with compute and reliably predicts downstream utility. In particular, applying the meta-model's learned prior to steering interventions improves fluency, with larger gains as loss decreases. Moreover, the meta-model's neurons increasingly isolate concepts into individual units, with sparse probing scores that scale as loss decreases. These results suggest generative meta-models offer a scalable path toward interpretability without restrictive structural assumptions. Project page: https://generative-latent-prior.github.io.
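
To make the abstract's pipeline concrete, below is a minimal sketch (not the authors' implementation) of the two ingredients it describes: fitting a denoising diffusion model to residual-stream activation vectors, and using the learned prior to pull a steered activation back toward the activation manifold via a renoise/denoise pass (SDEdit-style). The MLP denoiser, the 4096-dimensional stream width, the noise schedule, and the `apply_prior` routine are all illustrative assumptions, not details from the paper.

```python
# Minimal sketch, assuming a DDPM-style diffusion model over flat
# activation vectors. All sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

D, T = 4096, 1000                      # stream width, diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)  # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    """Predicts the noise added to an activation vector at step t."""
    def __init__(self, d=D, h=8192):
        super().__init__()
        self.t_emb = nn.Embedding(T, h)
        self.in_proj = nn.Linear(d, h)
        self.mlp = nn.Sequential(nn.SiLU(), nn.Linear(h, h), nn.SiLU(), nn.Linear(h, d))

    def forward(self, x, t):
        return self.mlp(self.in_proj(x) + self.t_emb(t))

def q_sample(x0, t, noise):
    """Forward process: diffuse clean activations x0 to noise level t."""
    ab = alphas_bar[t].unsqueeze(-1)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

def train_step(model, opt, x0):
    """One diffusion training step on a [B, D] batch of cached activations."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    loss = nn.functional.mse_loss(model(q_sample(x0, t, noise), t), noise)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def apply_prior(model, x_steered, t_start=200):
    """Renoise a steered activation to t_start, then run the reverse
    process back to t=0, pulling it toward the learned distribution."""
    b = x_steered.shape[0]
    t = torch.full((b,), t_start - 1, dtype=torch.long)
    x = q_sample(x_steered, t, torch.randn_like(x_steered))
    for i in reversed(range(t_start)):
        eps = model(x, torch.full((b,), i, dtype=torch.long))
        alpha, ab = 1.0 - betas[i], alphas_bar[i]
        x = (x - betas[i] / (1 - ab).sqrt() * eps) / alpha.sqrt()
        if i > 0:                       # add noise except at the final step
            x = x + betas[i].sqrt() * torch.randn_like(x)
    return x

# Usage (assuming `acts` is an [N, D] tensor of cached activations):
#   model = Denoiser(); opt = torch.optim.Adam(model.parameters(), lr=1e-4)
#   for x0 in acts.split(256): train_step(model, opt, x0)
#   h_clean = apply_prior(model, h_steered)
```

In use, one would cache residual-stream activations from a frozen language model, train the denoiser on them, and call `apply_prior` on a steered activation (e.g., `h + alpha * steering_vector`) before writing it back into the model, with the goal, per the abstract, of improving the fluency of steered generations.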

Community

Grace Luo (paper author · paper submitter)

Check out our codebase; everything’s ready to go! The code even runs on Nvidia RTX 4090s.

Page: http://generative-latent-prior.github.io
Code: https://github.com/g-luo/generative_latent_prior
Paper: https://arxiv.org/abs/2602.06964

Models citing this paper: 6
Datasets citing this paper: 5
Spaces citing this paper: 0
Collections including this paper: 0