Real-ESRGAN 128x128 for 4x Single Image Super-Resolution on AMD AI PC NPU
The Real Enhanced Super-Resolution Generative Adversarial Networks (Real-ESRGAN) model takes a low-resolution input image and creates a high-resolution, or "super-resolution," image. This version of the Real-ESRGAN model has been re-trained from scratch with reduced feature channels and fewer stacked blocks for improved efficiency. It was then quantized from FP32 to INT8 and optimized to run on the AMD AI PC NPU. The model scales its input image up by 4x.
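As a rough illustration of what INT8 quantization does, here is a generic affine-quantization sketch. This is not the Ryzen AI quantizer itself, and the actual model uses a u8s8 scheme (uint8 activations with int8 weights); the sketch only shows the basic scale/zero-point round trip for uint8:

```python
import numpy as np

def quantize_u8(x: np.ndarray):
    """Affine-quantize a float32 array to uint8 (illustrative only)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for constant inputs
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize_u8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map uint8 codes back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.rand(4, 4).astype(np.float32)
q, scale, zp = quantize_u8(x)
x_hat = dequantize_u8(q, scale, zp)
# Round-trip error is bounded by about one quantization step
assert np.max(np.abs(x - x_hat)) <= scale + 1e-6
```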
The "128x128" in the title means that this model operates on 128x128 tiles, but an input image of almost any size can be upscaled by 4x. The inference pipeline tiles the input image into overlapping patches sized to the ONNX model's expected input resolution, runs inference on each tile, and then stitches the results back together. A model with a larger tile size lowers the stitching overhead and may exhibit fewer boundary artifacts. Figure 1 shows an example of a 4x scaled image.
Figure 1: Input 320x480 scaled up by 4x to 1280x1920 with Real-ESRGAN model running on AMD AI PC NPU. Source: EDSR Benchmark dataset (edsr_benchmark\B100\HR\108005.png).
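The tile-and-stitch pipeline described above can be sketched as follows. Here `upscale_tile` is a stand-in for the actual ONNX inference call, and the stitching simply overwrites the overlap region rather than blending it, which the real pipeline handles more carefully:

```python
import numpy as np

TILE, SCALE, OVERLAP = 128, 4, 8

def upscale_tile(tile: np.ndarray) -> np.ndarray:
    """Stand-in for ONNX model inference: nearest-neighbor 4x upscale."""
    return tile.repeat(SCALE, axis=0).repeat(SCALE, axis=1)

def upscale_image(img: np.ndarray) -> np.ndarray:
    """Tile the image with overlap, upscale each tile, and stitch the results."""
    h, w = img.shape[:2]
    out = np.zeros((h * SCALE, w * SCALE) + img.shape[2:], dtype=img.dtype)
    step = TILE - OVERLAP
    for y in range(0, h, step):
        for x in range(0, w, step):
            # Clamp the tile window so it never runs past the image border
            y0 = min(y, max(h - TILE, 0))
            x0 = min(x, max(w - TILE, 0))
            tile = img[y0:y0 + TILE, x0:x0 + TILE]
            sr = upscale_tile(tile)
            out[y0 * SCALE:(y0 + tile.shape[0]) * SCALE,
                x0 * SCALE:(x0 + tile.shape[1]) * SCALE] = sr
    return out

img = np.random.rand(200, 300).astype(np.float32)
assert upscale_image(img).shape == (800, 1200)
```

Images smaller than one tile are handled by the clamping step, which shrinks the window to the image itself.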
The original model and architecture can be found on GitHub: xinntao/Real-ESRGAN.
Wang et al. (2018) introduced ESRGAN with the "Residual-in-Residual Dense Block (RRDB), without batch normalization as the basic network building unit." Figure 2 shows their original ESRGAN model architecture and Figure 3 shows their updated Real-ESRGAN (Wang et al., 2021) architecture.
Figure 2: ESRGAN architecture with the Residual-in-Residual Dense Block (RRDB) and batch normalization (BN) layers removed. Image from Fig. 4 of Wang et al. (2018).
Figure 3: Real-ESRGAN architecture "adopts the same generator network as that in ESRGAN. For the scale factor of ×2 and ×1, it first employs a pixel-unshuffle operation to reduce spatial size and re-arrange information to the channel dimension." Image from Fig. 4 of Wang et al. (2021).
| Model Details | Description |
|---|---|
| Person or organization developing model | Yixuan Liu (AMD), Hongwei Qin (AMD), Benjamin Consolvo (AMD) |
| Model date | January 2026 |
| Model version | 1 |
| Model type | Super-Resolution (Image-to-Image) |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | This version of the Real-ESRGAN model has been re-trained from scratch with reduced feature channels and fewer stacked blocks for improved efficiency. |
| License | Apache 2.0 |
| Where to send questions or comments about the model | Community Tab and AMD Developer Community Discord |
Intended Use
| Intended Use | Description |
|---|---|
| Primary intended uses | The model can be used to create high-resolution images from low-resolution images. The model has been converted to ONNX format and quantized for optimized performance on AMD AI PC NPUs. |
| Primary intended users | Anyone using or evaluating super-resolution models on AMD AI PCs. |
| Out-of-scope uses | This model is not intended for generating misinformation or disinformation, impersonating others, facilitating or inciting harassment or violence, or any use that could lead to the violation of human rights. |
How to Use
Hardware Prerequisites
Before getting started, make sure you meet the minimum hardware and OS requirements:
| Series | Codename | Abbreviation | Launch Year | Windows 11 | Linux |
|---|---|---|---|---|---|
| Ryzen AI Max PRO 300 Series | Strix Halo | STX | 2025 | ✔️ | |
| Ryzen AI PRO 300 Series | Strix Point / Krackan Point | STX/KRK | 2025 | ✔️ | |
| Ryzen AI Max 300 Series | Strix Halo | STX | 2025 | ✔️ | |
| Ryzen AI 300 Series | Strix Point | STX | 2025 | ✔️ | |
| Ryzen PRO 200 Series | Hawk Point | HPT | 2025 | ✔️ | |
| Ryzen 200 Series | Hawk Point | HPT | 2025 | ✔️ | |
| Ryzen PRO 8000 Series | Hawk Point | HPT | 2024 | ✔️ | |
| Ryzen 8000 Series | Hawk Point | HPT | 2024 | ✔️ | |
| Ryzen PRO 7000 Series | Phoenix | PHX | 2023 | ✔️ | |
| Ryzen 7000 Series | Phoenix | PHX | 2023 | ✔️ | |
Getting Started
Follow the instructions here to download necessary NPU drivers and Ryzen AI software: Ryzen AI SW Installation Instructions. Please allow for around 30 minutes to install all of the necessary components of Ryzen AI SW.
Activate the previously installed conda environment from Ryzen AI (RAI) SW, and set the RAI environment variable to your installation path. Substitute the correct RAI version number for `v.v.v`, such as `1.7.0`.
conda activate ryzen-ai-v.v.v
$Env:RYZEN_AI_INSTALLATION_PATH = 'C:/Program Files/RyzenAI/v.v.v/'
- Clone the Hugging Face model repository:
git clone https://hf.co/amd/realesrgan-128x128-tiles-amdnpu
Alternatively, you can use the Hugging Face Hub API to download all of the files with Python:
from huggingface_hub import snapshot_download
snapshot_download("amd/realesrgan-128x128-tiles-amdnpu")
- Install the necessary packages into the existing conda environment:
pip install -r requirements.txt
- Data Preparation (optional: for evaluation).
Download the EDSR benchmark dataset and extract it into the `datasets/` directory. Note that you may need to run this script twice, as it appears to fail on the first attempt.
python download_edsr_benchmark.py
Download and extract the DIV2K validation set:
python download_div2k.py
The datasets/ directory should look like this:
datasets
βββDIV2K_valid_HR
βββDIV2K_valid_LR_bicubic/X4
βββedsr_benchmark
βββ B100
βββ HR
βββ 3096.png
βββ ...
βββ LR_bicubic/X4
βββ 3096x4.png
βββ ...
βββ Set5
βββ HR
βββ baby.png
βββ ...
βββ LR_bicubic/X4
βββ babyx4.png
βββ ...
- Run inference on a single image or a folder of images. For example, for one image, run
python onnx_inference.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --input .\datasets\edsr_benchmark\B100\HR\108005.png --out-dir outputs --device npu
Arguments:
--onnx: The ONNX model file path.
--input: Accepts either a single image file path or a directory path. If it's a file, the script will process that image only. If it's a directory, the script will recursively scan for .png, .jpg, and .jpeg files and process all of them.
--out-dir: Output directory where the restored images will be saved.
--device: Accepts "npu" or "cpu". The NPU will attempt to use the VitisAIExecutionProvider; the CPU will attempt to use the CPUExecutionProvider. Note that to use the NPU, the updated NPU drivers and Ryzen AI SW must first be installed.
The model has already been compiled and cached under `modelcachekey_realesrgan_nchw_128x128_u8s8`; if this folder is not present, the model will be recompiled before inference runs.
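The recursive image collection behind `--input` can be pictured roughly like this (a sketch, not the script's actual code):

```python
from pathlib import Path

EXTS = {".png", ".jpg", ".jpeg"}

def collect_images(input_path: str) -> list[Path]:
    """Return a single image file, or all image files found recursively in a directory."""
    p = Path(input_path)
    if p.is_file():
        return [p]
    return sorted(f for f in p.rglob("*") if f.suffix.lower() in EXTS)
```

For example, `collect_images("datasets/edsr_benchmark/B100/HR")` would gather every PNG/JPEG under that directory tree.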
- Evaluate the accuracy of the model on benchmark datasets (optional).
Eval on Set14. Enabling the -clean option will remove generated SR images.
python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/Set14/HR --lq-dir datasets/edsr_benchmark/Set14/LR_bicubic/X4 --out-dir outputs/u8s8-Set14 --device npu -clean
The output will be a set of accuracy metrics: PSNR, MS_SSIM, SSIM, and FID, in JSON format as below:
{
"onnx": "onnx-models/realesrgan_nchw_128x128_u8s8.onnx",
"psnr": 23.327783584594727,
"ms_ssim": 0.8939759698835051,
"ssim": 0.6422613034676046,
"fid": 138.82432193927548
}
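PSNR, the first metric above, is computed from the mean squared error between the restored image and the ground truth: PSNR = 10·log10(MAX² / MSE). A minimal sketch for 8-bit images follows; the evaluation script may additionally crop borders or convert color spaces before measuring:

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((8, 8), dtype=np.uint8)
b = np.full((8, 8), 16, dtype=np.uint8)  # uniform error of 16 -> MSE = 256
assert abs(psnr(a, b) - 10 * np.log10(255 ** 2 / 256)) < 1e-9
```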
The following are example scripts to run evaluation on the other datasets:
Eval on B100:
python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/B100/HR --lq-dir datasets/edsr_benchmark/B100/LR_bicubic/X4 --out-dir outputs/u8s8-B100 --device npu -clean
Eval on Urban100:
python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/Urban100/HR --lq-dir datasets/edsr_benchmark/Urban100/LR_bicubic/X4 --out-dir outputs/u8s8-Urban100 --device npu -clean
Eval on DIV2K:
python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/DIV2K_valid_HR --lq-dir datasets/DIV2K_valid_LR_bicubic/X4 --out-dir outputs/u8s8-DIV2K --device npu -clean
Evaluation Data
Datasets:
The AMD ONNX model results were evaluated on the DIV2K and EDSR benchmark datasets (B100, Urban100, Set14, Set5) using peak signal-to-noise ratio (PSNR), multi-scale structural similarity (MS-SSIM), and Fréchet Inception Distance (FID) (see Table 1).
The original Real-ESRGAN model from Wang et al. (2021) was evaluated on RealSR-Canon, RealSR-Nikon, DRealSR, DPED-iphone, OST300, ImageNet val, and ADE20K val (see Table 2).
Figure 4 shows their perceptual quality results as compared to other state-of-the-art models.
Figure 4: "Qualitative comparisons on several representative real-world samples with upsampling scale factor of 4. Our Real-ESRGAN outperforms previous approaches in both removing artifacts and restoring texture details. Real-ESRGAN+ (trained with sharpened ground-truths) can further boost visual sharpness. Other methods may either fail to remove overshoot (the 1st sample) and complicated artifacts (the 2nd sample), or fail to restore realistic and natural textures for various scenes (the 3rd, 4th, 5th samples)". Image and caption from Fig. 7 of Wang et al. (2021).
The original ESRGAN model from Wang et al. (2018) was evaluated "on widely used benchmark datasets – Set5, Set14, BSD100, Urban100, and the PIRM self-validation dataset that is provided in the PIRM-SR Challenge."
Motivation: We evaluate the model's performance against industry-standard datasets using quantitative measures (see Quantitative Analyses).
Training Data
Both Real-ESRGAN and the original ESRGAN model were trained on 3 image datasets:
- DIV2K: a set of 800 2K-resolution images for image restoration tasks.
- Flickr2K: 2,650 2K-resolution images collected on the Flickr website.
- OutdoorSceneTraining (OST): 10,324 1K- to 2K-resolution images of outdoor scenes.
However, the Real-ESRGAN training data were synthetically generated through a preprocessing workflow that degrades the images with blur, downsampling, noise, and compression.
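A highly simplified stand-in for such a degradation pipeline is sketched below (box blur, 4x block-average downsampling, additive Gaussian noise). The paper's actual degradations use varied blur kernels, resampling modes, noise models, and JPEG compression, all omitted here:

```python
import numpy as np

def degrade(hr: np.ndarray, scale: int = 4, noise_sigma: float = 5.0) -> np.ndarray:
    """Blur -> downsample -> add noise (illustrative, float image in [0, 255])."""
    # 3x3 box blur via padded neighborhood averaging
    padded = np.pad(hr, 1, mode="edge")
    blurred = sum(padded[dy:dy + hr.shape[0], dx:dx + hr.shape[1]]
                  for dy in range(3) for dx in range(3)) / 9.0
    # Downsample by averaging scale x scale blocks
    h, w = (hr.shape[0] // scale) * scale, (hr.shape[1] // scale) * scale
    lr = blurred[:h, :w].reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    # Additive Gaussian noise, clipped back to the valid range
    lr = lr + np.random.normal(0.0, noise_sigma, lr.shape)
    return np.clip(lr, 0, 255)

hr = np.random.rand(128, 128) * 255
lr = degrade(hr)
assert lr.shape == (32, 32)
```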
Real-ESRGAN is fine-tuned from ESRGAN for faster convergence over 400K iterations. "Real-ESRGAN is trained with a combination of L1 loss, perceptual loss and GAN loss, with weights {1, 1, 0.1}, respectively" (Wang et al., 2021). For more detailed information on training, including the learning-rate schedule, see their paper.
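The quoted weighting amounts to total = 1·L1 + 1·perceptual + 0.1·GAN. As a toy numeric check (in the real training loop these terms are tensors computed by the network, not scalars):

```python
def generator_loss(l1: float, perceptual: float, gan: float,
                   weights: tuple[float, float, float] = (1.0, 1.0, 0.1)) -> float:
    """Weighted sum of the three Real-ESRGAN generator loss terms."""
    w_l1, w_perc, w_gan = weights
    return w_l1 * l1 + w_perc * perceptual + w_gan * gan

# With the paper's weights, the GAN term contributes only a tenth of its raw value
assert abs(generator_loss(0.05, 0.30, 1.2) - 0.47) < 1e-9
```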
Quantitative Analyses
Table 1 shows the accuracy metrics for the AMD ONNX models of Real-ESRGAN.
| Model | Set5 PSNR(↑) | Set5 MS_SSIM(↑) | Set5 FID(↓) | Set14 PSNR(↑) | Set14 MS_SSIM(↑) | Set14 FID(↓) | B100 PSNR(↑) | B100 MS_SSIM(↑) | B100 FID(↓) | Urban100 PSNR(↑) | Urban100 MS_SSIM(↑) | Urban100 FID(↓) | DIV2K PSNR(↑) | DIV2K MS_SSIM(↑) | DIV2K FID(↓) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 128x128(fp32) | 23.43 | 0.9346 | 114.31 | 22.38 | 0.8928 | 141.12 | 23.17 | 0.8804 | 134.00 | 20.02 | 0.8813 | 52.44 | 23.96 | 0.9096 | 29.79 |
| 128x128(int8) | 23.99 | 0.9387 | 97.89 | 22.65 | 0.8942 | 137.35 | 23.37 | 0.8817 | 131.91 | 20.51 | 0.8861 | 49.88 | 24.26 | 0.9103 | 27.46 |
| 256x256(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 52.09 | 23.96 | 0.9098 | 29.32 |
| 256x256(int8) | 23.90 | 0.9386 | 101.03 | 22.62 | 0.8949 | 135.43 | 23.28 | 0.8821 | 128.82 | 20.44 | 0.8861 | 48.76 | 24.14 | 0.9099 | 27.33 |
| 512x512(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 51.97 | 23.97 | 0.9099 | 29.02 |
| 512x512(int8) | 23.37 | 0.9303 | 117.11 | 22.29 | 0.8921 | 138.18 | 23.05 | 0.8796 | 128.34 | 19.96 | 0.8773 | 49.70 | 23.79 | 0.9024 | 25.40 |
| 1024x1024(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 51.97 | 23.97 | 0.9099 | 28.98 |
| 1024x1024(int8) | 23.10 | 0.9249 | 113.23 | 22.10 | 0.8835 | 140.06 | 22.82 | 0.8692 | 130.24 | 19.80 | 0.8710 | 50.43 | 23.42 | 0.8932 | 27.59 |
Table 1: Model accuracy metrics for AMD AI PC FP32 and INT8 quantized models.
In their paper, Wang et al. (2021) compare Real-ESRGAN with several state-of-the-art methods using NIQE scores (Table 2).
| Dataset (NIQE ↓) | Bicubic | ESRGAN | DAN | RealSR | CDC | BSRGAN | Real-ESRGAN | Real-ESRGAN+ |
|---|---|---|---|---|---|---|---|---|
| RealSR-Canon | 6.1269 | 6.7715 | 6.5282 | 6.8692 | 6.1488 | 5.7489 | 4.5899 | 4.5314 |
| RealSR-Nikon | 6.3607 | 6.7480 | 6.6063 | 6.7390 | 6.3265 | 5.9920 | 5.0753 | 5.0247 |
| DRealSR | 6.5766 | 8.6335 | 7.0720 | 7.7213 | 6.6359 | 6.1362 | 4.9796 | 4.8458 |
| DPED-iphone | 6.0121 | 5.7363 | 6.1414 | 5.5855 | 6.2738 | 5.9906 | 5.4352 | 5.2631 |
| OST300 | 4.4440 | 3.5245 | 5.0232 | 4.5715 | 4.7441 | 4.1662 | 2.8659 | 2.8191 |
| ImageNet val | 7.4985 | 3.6474 | 6.0932 | 3.8303 | 7.0441 | 4.3528 | 4.8580 | 4.6448 |
| ADE20K val | 7.5239 | 3.6905 | 6.3839 | 3.4102 | 6.9219 | 3.9434 | 3.7886 | 3.5778 |
Table 2: "NIQE scores on several diverse testing datasets with real-world images. The lower, the better." From Table 1 in Wang et al. (2021).
Ethical Considerations
AMD is committed to conducting our business in a fair, ethical and honest manner and in compliance with all applicable laws, rules and regulations. You can find out more at the AMD Ethics and Compliance page.
Caveats and Recommendations
Wang et al. (2021) note that there are limitations with the Real-ESRGAN model, including aliasing, introduction of unpleasant artifacts, and the inability to remove complicated degradations.
Citation Details
@InProceedings{wang2021realesrgan,
author = {Xintao Wang and Liangbin Xie and Chao Dong and Ying Shan},
title = {Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data},
booktitle = {International Conference on Computer Vision Workshops (ICCVW)},
  year      = {2021}
}