YOLOv4-tiny-416 INT8 (ONNX, MIT)

Post-training INT8 quantization of YOLOv4-tiny (Bochkovskiy et al., 2020), exported to ONNX QOperator format. Calibrated on 1,000 COCO val2017 images.

Files

File Size SHA-256
yolov4-tiny-416_float.onnx 24,230,209 B eea691d460fd3eb5c1a250b4e5f822784cd44e11aaa77a24299b0952b9f4fc9f
yolov4-tiny-416_int8_qop.onnx 6,113,440 B c30c8f0a33b3a0edc13a2ca21726a288228e1448b3c38940f9da0c7d8cee4760
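
The digests above can be checked after download. The following is a minimal sketch, assuming both files sit in the current directory; the helper names `sha256_of` and `verify_downloads` are illustrative and not part of this repository.

```python
import hashlib
import os

# Expected digests copied from the Files table above.
EXPECTED_SHA256 = {
    "yolov4-tiny-416_float.onnx":
        "eea691d460fd3eb5c1a250b4e5f822784cd44e11aaa77a24299b0952b9f4fc9f",
    "yolov4-tiny-416_int8_qop.onnx":
        "c30c8f0a33b3a0edc13a2ca21726a288228e1448b3c38940f9da0c7d8cee4760",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file to True/False (digest match) or None (file absent)."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = os.path.join(directory, name)
        if os.path.exists(path):
            results[name] = sha256_of(path) == expected
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(name, labels[ok])
```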

Architecture

Layers 21 Conv2D, 3 MaxPool, 1 Upsample, 11 Route, 2 YOLO heads
Activation LeakyReLU (α = 0.1) on 19/21 convs; remaining 2 are linear (pre-head)
Input 1×3×416×416, RGB, [0, 1], NCHW, letterbox-padded with 114
Output 2 raw conv tensors at strides 16 and 32 (decoder external)
Anchors (10,14), (23,27), (37,58), (81,82), (135,169), (344,319)
Quantization Per-tensor INT8 (W symmetric, A asymmetric); bias INT32
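
The input and output geometry listed above can be worked out with a few lines of arithmetic. This is a sketch under stated assumptions: the padding is assumed to be centered (the card only gives the padding value), and each head is assumed to carry 3 anchors × (5 + 80) = 255 channels in NCHW layout, which is the usual darknet convention but is not stated here. Function names are illustrative.

```python
def letterbox_geometry(width, height, target=416):
    """Scale and padding that fit a width x height image into a target square."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    # Assumption: padding is split evenly; any odd pixel goes right/bottom.
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        "pad": (left, top, pad_x - left, pad_y - top),  # left, top, right, bottom
    }


def head_shapes(target=416, strides=(16, 32), anchors_per_head=3, num_classes=80):
    """Expected NCHW shape of each raw head tensor (assumed channel layout)."""
    channels = anchors_per_head * (5 + num_classes)
    return [(1, channels, target // s, target // s) for s in strides]


geometry = letterbox_geometry(640, 480)
print(geometry["resized"], geometry["pad"])
print(head_shapes())
```

The padded border would be filled with 114 and the whole image divided by 255 before being fed to the network, as stated in the Input row.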

Performance

Metric FP32 INT8 Reference (AlexeyAB)
AP @ IoU=0.5:0.95 0.2076 0.1629 0.217
AP @ IoU=0.5 0.3914 0.3573 0.402
AP_small 0.070 0.054 n/a
AP_medium 0.239 0.190 n/a
AP_large 0.325 0.254 n/a
Size 23.11 MiB 5.83 MiB n/a

The INT8 mAP drop of about 4.5 AP points (0.2076 → 0.1629) is larger than for full YOLOv4-Leaky because the tiny architecture has only 21 conv layers, and per-tensor quantization (symmetric weights, asymmetric activations) leaves little headroom to absorb rounding error in such a small network. The trade-off is roughly 4× compression (23.11 → 5.83 MiB) and a much smaller compute footprint, suitable for edge / FPGA deployments.
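
To make the per-tensor scheme concrete, here is a minimal sketch of symmetric weight quantization and asymmetric activation quantization with a single scale per tensor. It is not ONNX Runtime's implementation: the signed INT8 activation range, the plain min/max range estimate, and all function names are assumptions chosen for illustration.

```python
def quantize_symmetric(values, qmax=127):
    """Per-tensor symmetric INT8: one scale, zero point fixed at 0."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale, 0


def quantize_asymmetric(values, qmin=-128, qmax=127):
    """Per-tensor asymmetric INT8: one scale plus an integer zero point."""
    lo = min(min(values), 0.0)  # the range must contain 0 so that 0.0 is exact
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integers back to real values."""
    return [(x - zero_point) * scale for x in q]


# A single large weight stretches the shared scale, so small weights in the
# same tensor are represented with only a handful of integer levels.
weights = [0.01, -0.02, 0.03, 2.0]
q, scale, zp = quantize_symmetric(weights)
restored = dequantize(q, scale, zp)
print(q, scale)
print([abs(a - b) for a, b in zip(weights, restored)])
```

With one scale per tensor, the rounding error on every element is bounded by half the scale, which is set by the largest value in the tensor.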

Evaluation protocol

Dataset MS COCO val2017 (5,000 images, 36,781 annotated objects, 80 classes)
Annotations instances_val2017.json from annotations_trainval2017.zip (CC BY 4.0)
Tool pycocotools.cocoeval.COCOeval (bbox IoU type)
Score threshold 0.001 (kept low so the precision-recall curve is populated down to low confidence)
NMS greedy, per-class, IoU threshold 0.45
Detections per image top-100 (matches params.maxDets[2])
Image preprocessing letterbox to 416×416, padding value 114, RGB, [0, 1], NCHW
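
The post-processing settings above (score threshold 0.001, greedy per-class NMS at IoU 0.45, top-100 per image) can be sketched in plain Python. This is an illustration rather than the repository's decoder: the box format (x1, y1, x2, y2), the tie-breaking, and the names `iou` and `postprocess` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, score_thr=0.001, iou_thr=0.45, max_dets=100):
    """Filter (box, score, class_id) tuples: threshold, per-class NMS, top-k."""
    by_class = {}
    for det in detections:
        if det[1] >= score_thr:
            by_class.setdefault(det[2], []).append(det)

    kept = []
    for dets in by_class.values():
        # Greedy NMS: visit boxes by descending score and keep a box only if
        # it does not overlap an already kept box of the same class too much.
        dets.sort(key=lambda d: d[1], reverse=True)
        survivors = []
        for det in dets:
            if all(iou(det[0], s[0]) <= iou_thr for s in survivors):
                survivors.append(det)
        kept.extend(survivors)

    # Keep the highest-scoring detections across all classes.
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_dets]
```

Because suppression is per class, two heavily overlapping boxes with different class labels both survive.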

Calibration protocol (for the INT8 model)

Dataset MS COCO val2017 (1,000 images sampled)
Sampling uniform random with random.Random(42).sample(...) (deterministic)
Preprocessing identical to evaluation (letterbox 416, padding 114, RGB, /255, NCHW)
Quantizer onnxruntime.quantization.quantize_static (MIT)
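
The sampling and quantizer steps above might look roughly like the sketch below. Several details are assumptions, not taken from the repository's script: sorting the file names before sampling (so the draw does not depend on directory listing order), the signed INT8 activation type, and the helper names. The `reader` argument stands for a user-supplied `CalibrationDataReader` that yields preprocessed images.

```python
import random


def sample_calibration_images(image_names, k=1000, seed=42):
    """Draw k images deterministically, independent of the input order."""
    # Assumption: sort first so the same seed always selects the same files.
    return random.Random(seed).sample(sorted(image_names), k)


def quantize_to_int8(reader,
                     float_path="yolov4-tiny-416_float.onnx",
                     int8_path="yolov4-tiny-416_int8_qop.onnx"):
    """Run static post-training quantization; needs onnxruntime installed."""
    # Imported lazily so the sampling helper works without onnxruntime.
    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

    quantize_static(
        float_path,
        int8_path,
        calibration_data_reader=reader,
        quant_format=QuantFormat.QOperator,  # QOperator export, as stated above
        per_channel=False,                   # per-tensor scales
        activation_type=QuantType.QInt8,     # assumption: signed activations
        weight_type=QuantType.QInt8,
    )
```

Fixing both the seed and the input order is what makes the selected subset reproducible across machines.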

Visual comparison (FP32 vs INT8)

Side-by-side detections on COCO val2017 / classic darknet test images. Left: FP32 ONNX. Right: INT8 ONNX (same input, same Python decoder).

Image pairs: dog, bus, traffic, market, parking, kitchen, skaters, dining.

Reproducibility

python quantize_float_to_int8.py
python inference.py --onnx yolov4-tiny-416_int8_qop.onnx

The quantization script reproduces the INT8 model from yolov4-tiny-416_float.onnx closely, but not necessarily bit-for-bit: differences in calibration sampling order may shift activation scales by a few LSBs.

Provenance

AlexeyAB/darknet  yolov4-tiny.weights        public domain (YOLO License v2)
        │
        │  parse_config + load_weights from gwinndr/YOLOv4-Pytorch (MIT, used as tool)
        │  + DarknetRaw wrapper to capture pre-YoloLayer outputs
        ▼
yolov4-tiny-416_float.onnx                    MIT (this repository)
        │
        │  onnxruntime.quantization.quantize_static (MIT, used as tool)
        │  + COCO val2017 calibration (CC BY 4.0, 1,000 images)
        ▼
yolov4-tiny-416_int8_qop.onnx                 MIT (this repository)

No Vitis-AI or Apache-2.0 components are bundled. Tools (PyTorch, ONNX Runtime, gwinndr) are used to produce the artifacts but are not redistributed. See NOTICE.md for full attribution.

Citation

@article{bochkovskiy2020yolov4,
  author  = {Bochkovskiy, Alexey and Wang, Chien-Yao and Liao, Hong-Yuan Mark},
  title   = {YOLOv4: Optimal Speed and Accuracy of Object Detection},
  journal = {arXiv:2004.10934},
  year    = {2020}
}

Author of the INT8 derivative: Pablo Mendoza (@thefalley), 2026.
