ACT Model - Standard

Action Chunking Transformer for MetaWorld Shelf-Place-v3 Task

Model Description

This is the standard ACT model, trained on the MetaWorld shelf-place-v3 task. It is the baseline for the Modified ACT variant listed under Model Comparison.

Architecture

  • Backbone: ResNet18 (ImageNet pretrained)
  • Hidden Dimension: 512
  • Feedforward Dimension: 3200
  • Encoder Layers: 4
  • Decoder Layers: 7
  • Attention Heads: 8
  • Action Chunk Size: 100
  • Query Frequency: 100
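
With a chunk size of 100 and a query frequency of 100, the policy is queried once every 100 environment steps and the whole predicted chunk is executed open loop. The sketch below illustrates that control loop under stated assumptions: `policy` and `step` are hypothetical callables standing in for the ACT model and the MetaWorld environment, and no temporal ensembling is applied. It is not the evaluation code used for this checkpoint.

```python
from typing import Callable, List, Sequence

CHUNK_SIZE = 100       # "Action Chunk Size" from the list above
QUERY_FREQUENCY = 100  # "Query Frequency" from the list above


def run_chunked_rollout(
    policy: Callable[[Sequence[float]], List[List[float]]],
    step: Callable[[List[float]], Sequence[float]],
    first_obs: Sequence[float],
    max_steps: int,
) -> int:
    """Run one episode and return how many times the policy was queried.

    `policy` and `step` are hypothetical stand-ins, not part of this repo.
    """
    obs = first_obs
    chunk: List[List[float]] = []
    num_queries = 0
    for t in range(max_steps):
        if t % QUERY_FREQUENCY == 0:
            # Re-plan: predict the next CHUNK_SIZE actions from the current observation.
            chunk = policy(obs)
            num_queries += 1
        # Execute the action at this offset inside the current chunk.
        action = chunk[t % QUERY_FREQUENCY]
        obs = step(action)
    return num_queries


def dummy_policy(obs: Sequence[float]) -> List[List[float]]:
    # Placeholder: a chunk of all-zero 4-D actions.
    return [[0.0, 0.0, 0.0, 0.0] for _ in range(CHUNK_SIZE)]


def dummy_step(action: List[float]) -> Sequence[float]:
    # Placeholder: the environment always returns the same observation.
    return [0.0, 0.0, 0.0, 0.0]


queries = run_chunked_rollout(dummy_policy, dummy_step, [0.0] * 4, max_steps=500)
print(queries)
```

Because the chunk is executed without re-planning, any mismatch between the observed start state and the states seen in training is not corrected for 100 steps.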

Training

  • Dataset: 50 demonstration episodes
  • Best Validation Loss: 0.1289
  • Optimizer: AdamW (lr=1e-5)
  • Loss: KL Divergence (weight=10) + L1 Action Loss
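
The loss line above can be read as `L1 + 10 * KL`. The sketch below spells that out in plain Python for a single sample. It assumes the KL term is taken between a diagonal Gaussian posterior (parameterized by `mu` and `logvar`) and a standard normal prior, as is common for CVAE-style models; the function names are illustrative, and the actual training code may reduce over batch and time dimensions differently.

```python
import math
from typing import Sequence

KL_WEIGHT = 10.0  # "weight=10" from the list above


def l1_action_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and demonstrated actions."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions.

    Assumption: standard normal prior with a diagonal Gaussian posterior.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def act_loss(
    pred: Sequence[float],
    target: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
) -> float:
    """Total loss: L1 action loss plus the weighted KL term."""
    return l1_action_loss(pred, target) + KL_WEIGHT * kl_to_standard_normal(mu, logvar)


# With mu = 0 and logvar = 0 the posterior equals the prior, so only L1 remains.
loss = act_loss([0.5, -0.5], [0.0, 0.0], mu=[0.0, 0.0], logvar=[0.0, 0.0])
print(loss)
```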

Performance

  • Training Loss: Converged properly
  • Validation Loss: 0.1289
  • Success Rate: 0% (caused by a lack of diversity in the training data; see Important Notes)

Important Notes

⚠️ Known Issue: This model achieves 0% success in evaluation despite low training loss.

Root Cause: The training data was collected from a single fixed initial state. The model fits that one scenario closely but does not generalize to the randomized initial states used in evaluation.

Solution: Retrain on demonstrations collected from varied initial states.
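
One way to catch this problem before training is to measure how much the first observation varies across episodes. The following is a minimal diagnostic sketch, not part of this repository: the function names, the `min_std` threshold, and the toy data are all illustrative assumptions.

```python
from statistics import pstdev
from typing import List, Sequence


def initial_state_spread(initial_states: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension standard deviation of the first observation across episodes."""
    return [pstdev(dim_values) for dim_values in zip(*initial_states)]


def has_diverse_starts(
    initial_states: Sequence[Sequence[float]], min_std: float = 1e-3
) -> bool:
    """True if at least one state dimension varies by more than `min_std`.

    `min_std` is an arbitrary illustrative threshold.
    """
    return any(s > min_std for s in initial_state_spread(initial_states))


# Toy data: 50 episodes that all start from the same state (the failure case here).
fixed_starts = [[0.0, 0.6, 0.02] for _ in range(50)]
# Toy data: three episodes whose first two state dimensions differ.
varied_starts = [[0.0, 0.6, 0.02], [0.1, 0.7, 0.02], [-0.1, 0.5, 0.02]]

print(has_diverse_starts(fixed_starts))
print(has_diverse_starts(varied_starts))
```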

Model Comparison

Model          Val Loss   Improvement
Standard ACT   0.1289     baseline
Modified ACT   0.0931     27.8% better
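
The improvement figure is the relative reduction in validation loss with respect to the standard model, which can be checked directly from the two values above:

```python
standard_val_loss = 0.1289
modified_val_loss = 0.0931

# Relative reduction in validation loss, measured against the standard model.
relative_improvement = (standard_val_loss - modified_val_loss) / standard_val_loss
print(f"{relative_improvement:.1%}")
```

This compares validation loss only. It says nothing about task success rate.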

Usage

import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="aryannzzz/act-metaworld-shelf-standard",
    filename="best_model.pth"
)

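# Load the checkpoint on CPU. weights_only=False unpickles arbitrary Python
# objects, so only load checkpoints you trust.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)

# Construct `model` with your ACT implementation (not included here), then:
# model.load_state_dict(checkpoint['model_state_dict'])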

Files

  • best_model.pth: Model checkpoint (contains model_state_dict, optimizer_state_dict, and training stats)
  • norm_stats.npz: Normalization statistics (state_mean, state_std, action_mean, action_std)
  • config.json: Model configuration
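
The statistics in norm_stats.npz are presumably meant to be applied around the model: states normalized before inference and predicted actions mapped back to environment units. The sketch below assumes plain z-score normalization with non-zero standard deviations; the card does not state the exact scheme, and the helper names are illustrative. With the real file, `stats` would come from `np.load("norm_stats.npz")`, whereas here a small stand-in dictionary with the same keys is used.

```python
import numpy as np


def normalize_state(state, stats):
    """Z-score a raw state using state_mean / state_std (assumed scheme)."""
    return (np.asarray(state, dtype=float) - stats["state_mean"]) / stats["state_std"]


def denormalize_action(action, stats):
    """Map a normalized model output back to environment action units."""
    return np.asarray(action, dtype=float) * stats["action_std"] + stats["action_mean"]


# Stand-in statistics with the same keys as norm_stats.npz (values are made up).
stats = {
    "state_mean": np.array([1.0, 1.0]),
    "state_std": np.array([1.0, 2.0]),
    "action_mean": np.array([0.5, 0.5]),
    "action_std": np.array([2.0, 2.0]),
}

normalized = normalize_state([1.0, 2.0], stats)
action = denormalize_action([0.0, 1.0], stats)
print(normalized, action)
```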

Citation

Based on "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware" (Zhao et al., RSS 2023)

@inproceedings{zhao2023learning,
  title={Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author={Zhao, Tony Z and others},
  booktitle={Robotics: Science and Systems (RSS)},
  year={2023}
}

License

MIT

Contact

For questions or issues, please open an issue in the repository.
