Understanding Multi-View Transformers – DUSt3R 512 Pointmap Probes

This repository provides pretrained probes for analyzing the internal representations of multi-view transformers, specifically DUSt3R (the final checkpoint, trained at resolution 512 with a DPT output head).
The probes decode 3D pointmaps from intermediate transformer features, enabling a layer-wise study of geometric reasoning.

This work accompanies the paper:

Understanding Multi-View Transformers
ICCV 2025 E2E3D Workshop


Model Description

  • Backbone: DUSt3R (ViT-Large, frozen)
  • Probe type: 5-layer MLP, one per probed transformer layer
  • Task: Decode per-pixel 3D pointmaps from transformer features
  • Input: Two RGB images (B, 3, H, W) normalized to [-1, 1]
  • Output: One prediction per probed transformer layer
    • pts3d: (B, H, W, 3) 3D pointmap
    • conf: (B, H, W) confidence map
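The probe design above can be sketched as a small per-layer MLP mapping per-pixel features to a 3D point plus a confidence value. Below is a minimal, self-contained numpy sketch under assumed shapes (feature dimension 1024 and hidden width 512 are hypothetical, as is the `1 + exp` confidence activation); the released probes live in `src.models.probes.PointmapProbes`.

```python
import numpy as np

class PointmapProbeSketch:
    """Toy 5-layer MLP probe: per-pixel features -> (pts3d, conf).

    Shapes and activations here are illustrative assumptions,
    not the released implementation.
    """

    def __init__(self, feat_dim=1024, hidden=512, seed=0):
        rng = np.random.default_rng(seed)
        # 5 weight layers: feat_dim -> hidden x4 -> 4 outputs (x, y, z, conf)
        dims = [feat_dim, hidden, hidden, hidden, hidden, 4]
        self.weights = [rng.standard_normal((a, b)) * 0.01
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def __call__(self, feats):
        # feats: (B, H, W, feat_dim) per-pixel transformer features
        x = feats
        for i, (W, b) in enumerate(zip(self.weights, self.biases)):
            x = x @ W + b
            if i < len(self.weights) - 1:
                x = np.maximum(x, 0.0)  # ReLU between hidden layers
        pts3d = x[..., :3]              # (B, H, W, 3) pointmap
        conf = 1.0 + np.exp(x[..., 3])  # (B, H, W), assumed conf >= 1
        return {"pts3d": pts3d, "conf": conf}

probe = PointmapProbeSketch()
out = probe(np.zeros((1, 8, 8, 1024)))
print(out["pts3d"].shape, out["conf"].shape)  # (1, 8, 8, 3) (1, 8, 8)
```

One such probe is trained per probed transformer layer, so the per-layer outputs can be compared directly.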

Usage

import requests
from PIL import Image
import torchvision.transforms as T
from src.models.probes import PointmapProbes

# Load the frozen DUSt3R backbone together with its pretrained per-layer probes
model, probes = PointmapProbes.load_backbone_and_probe(
    "jgaubil/und3rstand-dust3r-512-dpt"
)
model.eval()
probes.eval()

view1_path = "https://raw.githubusercontent.com/JulienGaubil/und3rstand/main/assets/samples/example_view1.jpg"
view2_path = "https://raw.githubusercontent.com/JulienGaubil/und3rstand/main/assets/samples/example_view2.jpg"
# Preprocess to 512x512 and map pixel values to [-1, 1]
transform = T.Compose([
    T.Resize(512),
    T.CenterCrop(512),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
# Download the two example views and apply the transform
view1_images = transform(
    Image.open(requests.get(view1_path, stream=True).raw).convert("RGB")
).unsqueeze(0)
view2_images = transform(
    Image.open(requests.get(view2_path, stream=True).raw).convert("RGB")
).unsqueeze(0)

# Extract intermediate transformer features for both views,
# then decode a pointmap prediction at each probed layer
feat_list = model(view1_images, view2_images)
outputs = probes(feat_list)

# One (pred1, pred2) pair per probed layer, in layer order
for layer_id, (pred1, pred2) in zip(model.probed_layers.layer_ids, outputs):
    print(f"{layer_id}: pts3d={pred1['pts3d'].shape}, conf={pred1['conf'].shape}")
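The per-layer confidence maps can be used to discard unreliable points before visualization or evaluation. Below is a minimal numpy sketch on dummy predictions; the `filter_by_confidence` helper and the threshold value of 1.5 are illustrative assumptions, not part of the released code.

```python
import numpy as np

def filter_by_confidence(pts3d, conf, threshold=1.5):
    """Keep only 3D points whose confidence exceeds the threshold.

    pts3d: (B, H, W, 3), conf: (B, H, W); returns an (N, 3) point cloud.
    """
    mask = conf > threshold  # (B, H, W) boolean mask
    return pts3d[mask]       # flattens the kept points to (N, 3)

# Dummy arrays standing in for pred1["pts3d"] and pred1["conf"]
pts3d = np.random.randn(1, 4, 4, 3)
conf = np.ones((1, 4, 4))
conf[0, :2] = 3.0  # mark the top two rows as confident
points = filter_by_confidence(pts3d, conf)
print(points.shape)  # (8, 3): 2 rows x 4 columns pass the threshold
```

Running this filter per probed layer makes it easy to see at which depth of the transformer confident geometry first emerges.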

Citation

@inproceedings{stary2025understanding,
  title={{Understanding Multi-View Transformers}},
  author={Star{\'y}, Michal and Gaubil, Julien and Tewari, Ayush and Sitzmann, Vincent},
  booktitle={ICCV 2025 E2E3D Workshop},
  year={2025}
}