Linear segmentation probe on the canvas features of canvit/canvitb16-add-vpe-pretrain-g128px-s512px-in21k-dv3b16-2026-02-02.
uv add "canvit-pytorch @ git+https://github.com/m2b3/CanViT-PyTorch.git"
import torch
from canvit_pytorch.probes import SegmentationProbe
probe = SegmentationProbe.from_pretrained("canvit/probe-ade20k-40k-s512-c12-in21k").eval()
# [B, H, W, D] canvas features from a CanViT forward pass
features = torch.randn(1, 12, 12, 1024)
with torch.inference_mode():
    logits = probe(features)  # [B, num_classes, H, W]
assert logits.shape == (1, 150, 12, 12)
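
The probe emits logits at canvas resolution (12 × 12 here), so a full-resolution prediction needs an upsampling step. The following is a minimal sketch of one common way to do that, assuming bilinear interpolation to the 512 px scene size followed by a per-pixel argmax. The helper name `logits_to_label_map` is hypothetical and not part of `canvit_pytorch`, and random logits stand in for real probe output.

```python
import torch
import torch.nn.functional as F


def logits_to_label_map(logits: torch.Tensor, size: int = 512) -> torch.Tensor:
    """Upsample [B, C, h, w] logits to [B, C, size, size] and take the argmax over classes.

    Hypothetical helper for illustration; bilinear upsampling is an assumption,
    not necessarily what the authors use for evaluation.
    """
    upsampled = F.interpolate(
        logits, size=(size, size), mode="bilinear", align_corners=False
    )
    return upsampled.argmax(dim=1)  # [B, size, size], int64 class indices


# Stand-in for the probe output: 150 classes on a 12 x 12 canvas grid.
logits = torch.randn(1, 150, 12, 12)
labels = logits_to_label_map(logits)
```

Interpolating the logits before the argmax, rather than upsampling the label map, tends to give smoother class boundaries.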
Architecture: LayerNorm → Dropout → BatchNorm → Conv1×1.
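
For readers who want to see how such a head fits together, here is a rough sketch of the layer order listed above in plain PyTorch. It is not the library's `SegmentationProbe` implementation: the class name `LinearSegProbeSketch`, the choice of `BatchNorm2d`, and the point at which the tensor is permuted to channels-first are all assumptions made for illustration.

```python
import torch
from torch import nn


class LinearSegProbeSketch(nn.Module):
    """Illustrative head: LayerNorm -> Dropout -> BatchNorm -> 1x1 conv.

    Hypothetical re-creation for explanation only; the real probe may differ.
    """

    def __init__(self, dim: int = 1024, num_classes: int = 150, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)          # normalizes over the feature dim D
        self.drop = nn.Dropout(dropout)
        self.bn = nn.BatchNorm2d(dim)          # assumed 2D batch norm over channels
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: [B, H, W, D] canvas features
        x = self.drop(self.norm(features))
        x = x.permute(0, 3, 1, 2)              # -> [B, D, H, W] for BatchNorm2d / Conv2d
        return self.head(self.bn(x))           # -> [B, num_classes, H, W]


sketch = LinearSegProbeSketch().eval()
with torch.inference_mode():
    out = sketch(torch.randn(2, 12, 12, 1024))
```

A 1×1 convolution applies the same linear classifier independently at every canvas position, which is what keeps this a linear probe.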
| Hyperparameter | Value |
|---|---|
| Scene size | 512 px |
| Canvas grid | 12 × 12 |
| Glimpse size | 128 px |
| Timesteps (T) | 10 |
| Training policy | R-IID |
| Optimizer | AdamW |
| Peak LR | |
| Weight decay | |
| LR schedule | 1,500-step warmup → cosine decay |
| Batch size | 16 |
| Max steps | 40,000 |
| Dropout | 0.1 |
| Augmentation | RandomResizedCrop scale [0.5, 2] + HFlip |
| Precision | bf16 (AMP) |
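
The learning-rate schedule in the table can be sketched as a multiplier on the peak LR. This version assumes a linear warmup from zero and a cosine decay to zero at the final step; the table specifies neither the warmup shape nor the final value, so both are assumptions, and the function name `lr_scale` is hypothetical.

```python
import math

WARMUP_STEPS = 1_500   # from the table
MAX_STEPS = 40_000     # from the table


def lr_scale(step: int) -> float:
    """Multiplier applied to the peak LR at a given optimizer step.

    Assumes linear warmup from 0 and cosine decay to 0 (not confirmed by the card).
    """
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

A function of this shape can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` on top of an AdamW optimizer configured with the peak LR.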