Garment Image Quality Scorer + Feature Extractor

1. What Was Done

Trained a dual-head model on a MobileNetV3-Small backbone with two outputs:

  1. Quality head: Predicts an image quality score (0-1) for garment photos
  2. Embedding head: Produces a 128-dim feature vector for garment matching

This model helps the Vestir app select the best representative image when a user uploads multiple photos of the same garment, and group similar garments together.
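A minimal sketch of how the two outputs could drive those features, assuming the model has already produced one quality score and one L2-normalized 128-dim embedding per photo. The helper names `pick_best_image` and `group_by_similarity`, the greedy grouping strategy, and the 0.8 similarity threshold are all hypothetical, not taken from the Vestir app.

```python
import numpy as np


def pick_best_image(quality_scores):
    """Index of the photo with the highest predicted quality score."""
    return int(np.argmax(np.asarray(quality_scores)))


def group_by_similarity(embeddings, threshold=0.8):
    """Greedy grouping of L2-normalized embeddings by cosine similarity.

    Hypothetical strategy: each item joins the first existing group whose
    anchor (its first member) has cosine similarity >= threshold; otherwise
    it starts a new group. The threshold is an illustrative value.
    """
    emb = np.asarray(embeddings, dtype=np.float32)
    anchors = []  # index of the first member of each group
    groups = []
    for i, vec in enumerate(emb):
        for g, a in enumerate(anchors):
            # For unit-length vectors, the dot product equals cosine similarity.
            if float(vec @ emb[a]) >= threshold:
                groups[g].append(i)
                break
        else:
            anchors.append(i)
            groups.append([i])
    return groups
```

For example, `pick_best_image([0.42, 0.91, 0.67])` picks the second photo. Because the embeddings are unit-length, cosine similarity reduces to a dot product, which keeps grouping cheap enough to run client-side.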

2. Starting Model

  • Backbone: torchvision.models.mobilenet_v3_small (ImageNet-1K pretrained, 576-dim features)
  • Quality head: 576 -> 128 -> 1 (Sigmoid) - predicts [0, 1] quality score
  • Embedding head: 576 -> 256 -> 128 (L2-normalized) - for similarity matching
  • Total params: ~1.2M

3. Training Dataset

  • Source: ashraq/fashion-product-images-small (HuggingFace)
  • Synthetic quality labels: Computed from image properties (sharpness via Laplacian variance, brightness balance, contrast via std-dev)
  • Augmentation: Created degraded versions (blur, darkness, noise) with lower quality scores
  • Train: 3,000 samples (originals + degraded), Val: 500 samples
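The card does not publish the exact labeling formula, so the following is only a sketch of one way to combine the three listed properties and produce degraded copies. The constants `SHARPNESS_REF`, `CONTRAST_REF`, `WEIGHTS`, and `DEGRADED_PENALTY`, the saturating `x / (x + ref)` mapping, the grayscale input, and the degradation parameters are all hypothetical.

```python
import numpy as np

# Hypothetical constants: the card does not specify the real ones.
SHARPNESS_REF = 0.01         # Laplacian variance at which the sharpness term is 0.5
CONTRAST_REF = 0.1           # pixel std-dev at which the contrast term is 0.5
WEIGHTS = (0.5, 0.25, 0.25)  # sharpness, brightness balance, contrast
DEGRADED_PENALTY = 0.6       # cap for degraded copies, relative to the clean original


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; higher means more fine detail."""
    lap = (gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2] + gray[1:-1, 2:]
           - 4.0 * gray[1:-1, 1:-1])
    return float(lap.var())


def quality_score(gray):
    """Synthetic quality label in [0, 1] for a 2-D grayscale image with values in [0, 1]."""
    lv = laplacian_variance(gray)
    sharpness = lv / (lv + SHARPNESS_REF)
    brightness = 1.0 - 2.0 * abs(float(gray.mean()) - 0.5)  # 1.0 at mid-grey, 0.0 at black/white
    std = float(gray.std())
    contrast = std / (std + CONTRAST_REF)
    w_s, w_b, w_c = WEIGHTS
    return w_s * sharpness + w_b * brightness + w_c * contrast


def box_blur(gray, k=5):
    """k x k mean filter with edge padding (same output shape as input)."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    out = np.zeros_like(gray)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return out / (k * k)


def darken(gray, factor=0.3):
    return gray * factor


def add_noise(gray, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    return np.clip(gray + rng.normal(0.0, sigma, gray.shape), 0.0, 1.0)


def degraded_label(original, degraded):
    # Noise raises Laplacian variance, so recomputing the score alone could
    # reward a noisy copy; capping at a fraction of the clean score keeps
    # every degraded label below its source.
    return min(quality_score(degraded), DEGRADED_PENALTY * quality_score(original))
```

The cap in `degraded_label` reflects a real pitfall of Laplacian-variance sharpness: additive noise inflates it, so blur and darkness lower the recomputed score while noise can raise it.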

4. Validation / Testing

  • Primary metric: Spearman rank correlation between predicted and synthetic ground-truth quality scores
  • Secondary metric: Mean Absolute Error (MAE) of quality predictions
  • Spearman correlation: 0.9972
  • MAE: 0.0091
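Both metrics can be computed with NumPy alone, as in this sketch. It computes Spearman correlation as the Pearson correlation of average ranks (ties share the mean of their positions), which matches the standard definition. The function names are illustrative; `scipy.stats.spearmanr` gives the same value if SciPy is available.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks; tied values receive the average of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(pred, target):
    """Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(pred), average_ranks(target))[0, 1])


def mae(pred, target):
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))
```

For instance, predictions `[0.1, 0.4, 0.35, 0.8]` against targets `[0.2, 0.3, 0.5, 0.9]` swap one adjacent pair, giving a Spearman correlation of 0.8 and an MAE of 0.1125. Spearman only checks ordering, which is what best-image selection needs; MAE checks calibration of the absolute scores.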

5. Baseline Performance

No quality scoring model previously existed in the app; it simply used the first uploaded image.

6. What Changed to Improve

  • Learned quality assessment replaces arbitrary first-image selection
  • Dual-head architecture shares one backbone between the quality and similarity heads, so a single forward pass produces both outputs
  • Synthetic quality labels based on measurable image properties
  • Degradation augmentation teaches the model to distinguish good from bad images

7. Training Progress

Step            Train loss (MSE)    Learning rate
50              0.0090              9.72e-04
500             0.0024              3.56e-06
1000            0.0018              9.45e-04
1500            0.0014              1.22e-04
2000            0.0012              7.93e-04
2317 (final)    0.0011              8.11e-06

8. Model Files

  • model.onnx: Full precision ONNX (4.5 MB)
  • model_int8.onnx: INT8 quantized (1.3 MB) - for browser deployment
  • pytorch_model.pt: PyTorch state dict
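A sketch of running either ONNX file from Python with `onnxruntime`, under stated assumptions: the input is a single already-resized 224x224 RGB image, normalization uses the standard ImageNet mean/std (the card gives the size but not the normalization), and the output order (quality vs. embedding) is unknown, so the function returns all outputs. The names `preprocess` and `run_onnx` are illustrative.

```python
import numpy as np

# Assumption: ImageNet normalization and NCHW float32 input at 224x224.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(rgb_uint8):
    """(224, 224, 3) uint8 RGB array -> (1, 3, 224, 224) float32 array."""
    x = rgb_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(x, (2, 0, 1))[np.newaxis, ...].astype(np.float32)


def run_onnx(model_path, rgb_uint8):
    """Run the exported model and return every output as a NumPy array.

    Inspect session.get_outputs() to learn which index holds the quality
    score and which holds the embedding; the order is not documented here.
    """
    import onnxruntime as ort  # imported lazily so preprocess() works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: preprocess(rgb_uint8)})
```

Usage would look like `outputs = run_onnx("model_int8.onnx", image)`. The same preprocessing must be reproduced in the browser (for example with onnxruntime-web) for the INT8 model to see the inputs it was exported for.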

9. Training Details

  • Hardware: NVIDIA GTX 1050 Ti (4GB VRAM)
  • Optimizer: AdamW (lr=1e-3, wd=0.01)
  • Loss: MSE on quality scores
  • Image size: 224x224
  • Batch size: 32
  • Training time: ~5 minutes
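A minimal training-step sketch using the listed optimizer settings and loss. It assumes the model returns a `(quality, embedding)` tuple with quality shaped `(N,)`; `make_optimizer` and `train_step` are hypothetical helpers, and no learning-rate scheduler is included because none is specified above. Because the listed loss covers only quality scores, the embedding head receives no gradient in this sketch.

```python
import torch
import torch.nn as nn


def make_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # Hyperparameters as listed under Training Details.
    return torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)


def train_step(model, optimizer, images, target_quality) -> float:
    """One optimization step with MSE on the quality output only.

    Assumes model(images) returns (quality, embedding) with quality shaped (N,).
    """
    model.train()
    optimizer.zero_grad()
    quality, _embedding = model(images)  # embedding is unused by this loss
    loss = nn.functional.mse_loss(quality, target_quality)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With batch size 32 and 224x224 inputs, `images` would be a `(32, 3, 224, 224)` tensor and `target_quality` a `(32,)` tensor of synthetic labels.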