Garment Image Quality Scorer + Feature Extractor
1. What Was Done
Trained a dual-head model on a MobileNetV3-Small backbone:
- Quality head: Predicts an image quality score in [0, 1] for garment photos
- Embedding head: Produces a 128-dim feature vector for garment matching
The model helps the Vestir app select the best representative image when a user uploads multiple photos of the same garment, and group similar garments together.
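Below is a minimal sketch of how the two outputs could drive that selection and grouping. It assumes each photo has already been passed through the model. The helper names (`pick_best`, `group_by_similarity`), the greedy grouping strategy, and the 0.8 cosine threshold are hypothetical and not part of the Vestir app or this model. Toy 3-dim vectors stand in for the 128-dim embeddings.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (the embedding head already does this)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for unit-length vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def pick_best(quality_scores: Sequence[float]) -> int:
    """Index of the photo with the highest predicted quality score."""
    return max(range(len(quality_scores)), key=lambda i: quality_scores[i])


def group_by_similarity(embeddings: Sequence[Sequence[float]],
                        threshold: float = 0.8) -> List[List[int]]:
    """Hypothetical greedy grouping: each item joins the first group whose first
    member is at least `threshold` cosine-similar, otherwise it starts a new group."""
    groups: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for group in groups:
            if cosine(embeddings[group[0]], emb) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy embeddings: the first two point in nearly the same direction.
embs = [l2_normalize(v) for v in ([1, 0, 0], [0.9, 0.1, 0], [0, 1, 0])]
print(pick_best([0.42, 0.91, 0.67]))  # 1
print(group_by_similarity(embs))      # [[0, 1], [2]]
```

Because the embedding head L2-normalizes its output, cosine similarity reduces to a dot product, which keeps pairwise comparisons cheap.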
2. Starting Model
- Backbone: torchvision.models.mobilenet_v3_small (ImageNet-1K pretrained, 576-dim features)
- Quality head: 576 -> 128 -> 1 (Sigmoid) - predicts [0, 1] quality score
- Embedding head: 576 -> 256 -> 128 (L2-normalized) - for similarity matching
- Total params: ~1.2M
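Here is a back-of-the-envelope check of the ~1.2M figure. It assumes each `a -> b` step is an `nn.Linear` with bias and that any activations between them are parameter-free. The backbone count starts from torchvision's published 2,542,856 parameters for `mobilenet_v3_small` and subtracts its ImageNet classifier (`Linear(576, 1024)` and `Linear(1024, 1000)`), which a 576-dim feature extractor does not use.

```python
# Assumptions (not confirmed by the card): every "a -> b" step is a Linear layer
# with bias, and activations/normalization between them add no parameters.


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def mlp_params(dims):
    """Sum over consecutive layer sizes, e.g. [576, 128, 1]."""
    return sum(linear_params(a, b) for a, b in zip(dims, dims[1:]))


# torchvision's full mobilenet_v3_small minus its classifier layers.
backbone = 2_542_856 - linear_params(576, 1024) - linear_params(1024, 1000)
quality_head = mlp_params([576, 128, 1])
embedding_head = mlp_params([576, 256, 128])
total = backbone + quality_head + embedding_head

print(backbone, quality_head, embedding_head)          # 927008 73985 180608
print(f"total approx {total / 1e6:.2f}M params")       # total approx 1.18M params
print(f"FP32 weights approx {total * 4 / 2**20:.1f} MiB")  # FP32 weights approx 4.5 MiB
```

At 4 bytes per FP32 parameter this comes to roughly 4.5 MiB, consistent with the size listed for `model.onnx`.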
3. Training Dataset
- Source: ashraq/fashion-product-images-small (HuggingFace)
- Synthetic quality labels: Computed from measurable image properties, namely sharpness (variance of the Laplacian), brightness balance, and contrast (pixel standard deviation)
- Augmentation: Degraded copies of the originals (blur, darkening, added noise), assigned lower quality scores
- Train: 3,000 samples (originals + degraded), Val: 500 samples
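Here is a minimal pure-Python sketch of the labelling idea, assuming a grayscale image with values in [0, 255]. The card names the three properties but not how they are normalized or combined. `SHARPNESS_REF`, `CONTRAST_REF`, `WEIGHTS`, the 127.5 brightness midpoint, and the `box_blur`/`darken` helpers are all hypothetical choices. The degraded copies score lower because blurring reduces Laplacian variance and contrast, while darkening pulls the mean away from mid-gray and shrinks contrast.

```python
import statistics
from typing import List

Image = List[List[float]]  # grayscale rows, values in [0, 255]

# Hypothetical normalization constants and weights; the card does not publish them.
SHARPNESS_REF = 2000.0        # Laplacian variance treated as "fully sharp"
CONTRAST_REF = 64.0           # pixel std-dev treated as "full contrast"
WEIGHTS = (0.5, 0.25, 0.25)   # sharpness, brightness, contrast


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    h, w = len(img), len(img[0])
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return statistics.pvariance(responses)


def quality_label(img: Image) -> float:
    """Weighted mix of sharpness, brightness balance and contrast, in [0, 1]."""
    pixels = [p for row in img for p in row]
    sharpness = min(laplacian_variance(img) / SHARPNESS_REF, 1.0)
    brightness = 1.0 - abs(statistics.fmean(pixels) - 127.5) / 127.5
    contrast = min(statistics.pstdev(pixels) / CONTRAST_REF, 1.0)
    ws, wb, wc = WEIGHTS
    return ws * sharpness + wb * brightness + wc * contrast


def darken(img: Image, factor: float = 0.4) -> Image:
    """Scale every pixel down (a 'darkness' degradation)."""
    return [[p * factor for p in row] for row in img]


def box_blur(img: Image) -> Image:
    """3x3 mean filter; out-of-bounds neighbours are clamped to the border."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
         for x in range(w)]
        for y in range(h)
    ]


# Toy 8x8 image: vertical stripes two pixels wide.
original = [[64.0 if (x // 2) % 2 == 0 else 192.0 for x in range(8)] for _ in range(8)]
for name, img in [("original", original), ("blurred", box_blur(original)),
                  ("darkened", darken(original))]:
    print(f"{name:9s} {quality_label(img):.3f}")
```

Noise is left out of the sketch. Added noise raises Laplacian variance, so a sharpness term alone would reward it, and the card does not say how noisy copies were penalized.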
4. Validation / Testing
- Primary metric: Spearman rank correlation between predicted and target (synthetic) quality scores
- Secondary metric: Mean Absolute Error (MAE) of quality predictions
- Spearman correlation: 0.9972
- MAE: 0.0091
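The two metrics can be computed as shown below. This is a generic stdlib implementation, not the card's evaluation script. Spearman's rho is computed as the Pearson correlation of average ranks, so ties are handled. The scores are made up for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(pred: Sequence[float], target: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(pred), average_ranks(target))


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Made-up scores for illustration only (not the card's validation data).
target = [0.90, 0.35, 0.60, 0.10, 0.75]
pred = [0.88, 0.40, 0.58, 0.12, 0.70]
print(f"Spearman: {spearman(pred, target):.4f}")  # Spearman: 1.0000
print(f"MAE: {mae(pred, target):.4f}")            # MAE: 0.0320
```

In this toy example every prediction is off by 0.02 to 0.05, yet Spearman is 1.0 because the ordering is preserved. MAE captures the offset. For picking the best photo among several uploads, only the ordering matters.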
5. Baseline Performance
The app had no prior quality-scoring model; it simply used the first uploaded image.
6. What Changed to Improve
- Learned quality assessment instead of arbitrary first-image selection
- Dual-head architecture: one shared backbone feeds both the quality and similarity heads, so a single forward pass produces both outputs
- Synthetic quality labels based on measurable image properties
- Degradation augmentation teaches the model to distinguish good from bad images
7. Training Progress
| Step | Training loss | Learning rate |
|---|---|---|
| 50 | 0.0090 | 9.72e-04 |
| 500 | 0.0024 | 3.56e-06 |
| 1000 | 0.0018 | 9.45e-04 |
| 1500 | 0.0014 | 1.22e-04 |
| 2000 | 0.0012 | 7.93e-04 |
| 2317 (final) | 0.0011 | 8.11e-06 |
Model Files
- model.onnx: Full-precision (FP32) ONNX export (4.5 MB)
- model_int8.onnx: INT8-quantized ONNX export (1.3 MB), intended for browser deployment
- pytorch_model.pt: PyTorch state dict
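The card does not say how `model_int8.onnx` was produced. As a generic illustration of what INT8 quantization does, here is asymmetric per-tensor affine quantization in pure Python. Each FP32 value maps to an 8-bit integer via a scale and a zero point. The weights are made up, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit range


def quant_params(values: Sequence[float]) -> Tuple[float, int]:
    """Scale and zero point mapping [min, max] onto [QMIN, QMAX].
    The range is widened to include 0 so that 0.0 maps exactly to an integer."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # avoid a zero scale for all-zero tensors
    zero_point = round(QMIN - lo / scale)
    return scale, zero_point


def quantize(values: Sequence[float], scale: float, zero_point: int) -> List[int]:
    """Round to the nearest quantization step, shift, and clamp to int8."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate FP32 values."""
    return [(x - zero_point) * scale for x in q]


weights = [-0.51, -0.02, 0.0, 0.13, 0.77]  # made-up FP32 weights
scale, zp = quant_params(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max error {max_err:.4f} (scale/2 = {scale / 2:.4f})")
```

In this example the round-trip error stays within half a quantization step (`scale / 2`). Storing one byte per weight instead of four drives the size reduction: ~1.2M parameters at one byte each is roughly 1.2 MB before scales, zero points, and graph metadata are added.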
Training Details
- Hardware: NVIDIA GTX 1050 Ti (4GB VRAM)
- Optimizer: AdamW (lr=1e-3, wd=0.01)
- Loss: MSE on quality scores
- Image size: 224x224
- Batch size: 32
- Training time: ~5 minutes
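To make `wd=0.01` concrete: AdamW decouples weight decay from the gradient-based update. Below is a scalar sketch of one step in the form `torch.optim.AdamW` applies it. It assumes PyTorch's default `betas=(0.9, 0.999)` and `eps=1e-8`, since the card lists only the learning rate and weight decay. The `adamw_step` helper is hypothetical.

```python
import math
from typing import Tuple


def adamw_step(p: float, g: float, m: float, v: float, t: int,
               lr: float = 1e-3, wd: float = 0.01,
               betas: Tuple[float, float] = (0.9, 0.999),
               eps: float = 1e-8) -> Tuple[float, float, float]:
    """One AdamW update for a single scalar parameter; t is the 1-based step."""
    b1, b2 = betas
    p *= 1 - lr * wd                # decoupled weight decay, independent of g
    m = b1 * m + (1 - b1) * g       # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g   # second-moment (uncentred variance) estimate
    m_hat = m / (1 - b1 ** t)       # bias corrections for zero-initialised moments
    v_hat = v / (1 - b2 ** t)
    p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


p, m, v = adamw_step(p=1.0, g=0.5, m=0.0, v=0.0, t=1)
print(f"{p:.6f}")
```

On the first step the bias-corrected ratio is about sign(g), so the parameter moves by roughly `lr` regardless of gradient scale. The decay term separately shrinks it by a fraction `lr * wd = 1e-5`.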