πŸš€ Real-ESRGAN 128x128 for 4x Single Image Super-Resolution on AMD AI PC NPU

The Real Enhanced Super-Resolution Generative Adversarial Networks (Real-ESRGAN) model is an AI model that takes a low-resolution input image and creates a high-resolution, or "super-resolution", image. This version of the Real-ESRGAN model has been re-trained from scratch with reduced feature channels and fewer stacked blocks for improved efficiency. It was then quantized from FP32 to INT8 and optimized to run on the AMD AI PC NPU. The model scales its input image up by 4x.

The "128x128" in the title means that this model operates on 128x128 tiles, but an input image of almost any size can be upscaled by 4x. The inference pipeline splits the input image into overlapping patches matching the ONNX model's expected input resolution, runs inference on each tile, and then stitches the results back together. A model with a larger tile size would lower the stitching overhead and may produce fewer boundary artifacts. Figure 1 shows an example of a 4x scaled image.
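The tiling step described above can be sketched in a few lines. `tile_starts` is a hypothetical helper for intuition only, not code from this repository; it assumes a fixed tile size and a small fixed overlap:

```python
def tile_starts(length, tile=128, overlap=16):
    """Return the start offsets of overlapping tiles covering `length` pixels."""
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    # Add a final tile flush with the edge if the regular grid falls short.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

# A 320x480 input is covered by a grid of 128x128 tiles; the overlapping
# regions are blended when stitching the 4x outputs back together.
rows = tile_starts(320)  # vertical tile offsets
cols = tile_starts(480)  # horizontal tile offsets
```

The overlap gives the stitching step redundant pixels at tile borders, which is what suppresses visible seams in the upscaled output.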

| Input image | Output image |
| --- | --- |
| assets/input_tiger_320x480_108005.png | assets/output_tiger_4x_1280x1920_108005.png |

Figure 1: Input 320x480 scaled up by 4x to 1280x1920 with Real-ESRGAN model running on AMD AI PC NPU. Source: EDSR Benchmark dataset (edsr_benchmark\B100\HR\108005.png).

The original model and architecture can be found on GitHub: xinntao/Real-ESRGAN.

Wang et al. (2018) introduced ESRGAN with the "Residual-in-Residual Dense Block (RRDB), without batch normalization as the basic network building unit." Figure 2 shows their original ESRGAN model architecture and Figure 3 shows their updated Real-ESRGAN (Wang et al., 2021) architecture.

esrgan_architecture

Figure 2: ESRGAN architecture with Residual in Residual Dense Block (RRDB) and removed batched normalization (BN) layers. Image from Fig. 4 of Wang et al. (2018).

real-esrgan_architecture

Figure 3: Real-ESRGAN architecture "adopts the same generator network as that in ESRGAN. For the scale factor of Γ—2 and Γ—1, it first employs a pixel-unshuffle operation to reduce spatial size and re-arrange information to the channel dimension." Image from Fig. 4 of Wang et al. (2021).
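The pixel-unshuffle operation quoted above can be illustrated on a single-channel image. This toy helper is for intuition only and is not the library implementation; it trades an rΓ—r reduction in spatial size for rΒ² times as many channels:

```python
def pixel_unshuffle(img, r=2):
    """Rearrange an (h, w) image into r*r channels of shape (h/r, w/r).

    Each output channel collects the pixels at one (dy, dx) offset within
    every r x r block, so no information is lost, only re-arranged.
    """
    h, w = len(img), len(img[0])
    assert h % r == 0 and w % r == 0, "dimensions must be divisible by r"
    return [
        [[img[y * r + dy][x * r + dx] for x in range(w // r)]
         for y in range(h // r)]
        for dy in range(r)
        for dx in range(r)
    ]

# A 2x2 image becomes four 1x1 channels:
print(pixel_unshuffle([[1, 2], [3, 4]]))  # [[[1]], [[2]], [[3]], [[4]]]
```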

| Model Details | Description |
| --- | --- |
| Person or organization developing model | Yixuan Liu (AMD), Hongwei Qin (AMD), Benjamin Consolvo (AMD) |
| Model date | January 2026 |
| Model version | 1 |
| Model type | Super-Resolution (Image-to-Image) |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | This version of the Real-ESRGAN model has been re-trained from scratch with reduced feature channels and fewer stacked blocks for improved efficiency. |
| License | Apache 2.0 |
| Where to send questions or comments about the model | Community Tab and AMD Developer Community Discord |

⚑ Intended Use

| Intended Use | Description |
| --- | --- |
| Primary intended uses | The model can be used to create high-resolution images from low-resolution images. The model has been converted to ONNX format and quantized for optimized performance on AMD AI PC NPUs. |
| Primary intended users | Anyone using or evaluating super-resolution models on AMD AI PCs. |
| Out-of-scope uses | This model is not intended for generating misinformation or disinformation, impersonating others, facilitating or inciting harassment or violence, or any use that could lead to the violation of human rights. |

How to Use

πŸ“ Hardware Prerequisites

Before getting started, make sure you meet the minimum hardware and OS requirements:

| Series | Codename | Abbreviation | Launch Year | Windows 11 | Linux |
| --- | --- | --- | --- | --- | --- |
| Ryzen AI Max PRO 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI PRO 300 Series | Strix Point / Krackan Point | STX/KRK | 2025 | β˜‘οΈ | |
| Ryzen AI Max 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI 300 Series | Strix Point | STX | 2025 | β˜‘οΈ | |
| Ryzen PRO 200 Series | Hawk Point | HPT | 2025 | β˜‘οΈ | |
| Ryzen 200 Series | Hawk Point | HPT | 2025 | β˜‘οΈ | |
| Ryzen PRO 8000 Series | Hawk Point | HPT | 2024 | β˜‘οΈ | |
| Ryzen 8000 Series | Hawk Point | HPT | 2024 | β˜‘οΈ | |
| Ryzen PRO 7000 Series | Phoenix | PHX | 2023 | β˜‘οΈ | |
| Ryzen 7000 Series | Phoenix | PHX | 2023 | β˜‘οΈ | |

Getting Started

  1. Follow the Ryzen AI SW Installation Instructions to download the necessary NPU drivers and Ryzen AI software. Allow around 30 minutes to install all of the necessary components of Ryzen AI SW.

  2. Activate the previously installed conda environment from Ryzen AI (RAI) SW, and set the RAI environment variable to your installation path. Substitute the correct RAI version number for v.v.v, such as 1.7.0.

conda activate ryzen-ai-v.v.v
$Env:RYZEN_AI_INSTALLATION_PATH = 'C:/Program Files/RyzenAI/v.v.v/'
  3. Clone the Hugging Face model repository:
git clone https://hf.co/amd/realesrgan-128x128-tiles-amdnpu

Alternatively, you can use the Hugging Face Hub API to download all of the files with Python:

from huggingface_hub import snapshot_download
snapshot_download("amd/realesrgan-128x128-tiles-amdnpu")
  4. Install the necessary packages into the existing conda environment:
pip install -r requirements.txt
  5. Data Preparation (optional: for evaluation). Download the EDSR benchmark dataset and extract it into the datasets/ directory. Note that you may need to run this script twice, as it can fail on the first attempt.
python download_edsr_benchmark.py

Download and extract the DIV2K validation set:

python download_div2k.py

The datasets/ directory should look like this:

datasets
β”œβ”€β”€ DIV2K_valid_HR
β”œβ”€β”€ DIV2K_valid_LR_bicubic/X4
└── edsr_benchmark
    β”œβ”€β”€ B100
    β”‚   β”œβ”€β”€ HR
    β”‚   β”‚   β”œβ”€β”€ 3096.png
    β”‚   β”‚   └── ...
    β”‚   └── LR_bicubic/X4
    β”‚       β”œβ”€β”€ 3096x4.png
    β”‚       └── ...
    └── Set5
        β”œβ”€β”€ HR
        β”‚   β”œβ”€β”€ baby.png
        β”‚   └── ...
        └── LR_bicubic/X4
            β”œβ”€β”€ babyx4.png
            └── ...
  6. Run inference on a single image or a folder of images. For example, for one image, run:
python onnx_inference.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --input .\datasets\edsr_benchmark\B100\HR\108005.png --out-dir outputs --device npu

Arguments:

--onnx: The ONNX model file path.

--input: Accepts either a single image file path or a directory path. If it's a file, the script will process that image only. If it's a directory, the script will recursively scan for .png, .jpg, and .jpeg files and process all of them.

--out-dir: Output directory where the restored images will be saved.

--device: Accepts "npu" or "cpu". The NPU will attempt to use the VitisAIExecutionProvider; the CPU will attempt to use the CPUExecutionProvider. Note that to use the NPU, the updated NPU drivers and Ryzen AI SW must first be installed.

The model has already been compiled and cached under modelcachekey_realesrgan_nchw_128x128_u8s8, but if this folder is not present, the model will be recompiled and then inference can be run.
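The --device switch described above maps to ONNX Runtime execution providers. The following sketch is illustrative, not the script's actual implementation; the helper name is hypothetical, and keeping the CPU provider in the NPU list lets ONNX Runtime fall back gracefully if the Vitis AI provider cannot initialize:

```python
def providers_for(device):
    """Map a --device flag to an ONNX Runtime execution-provider list."""
    if device == "npu":
        # CPU provider kept as fallback in case the NPU provider is unavailable
        # (e.g. missing NPU drivers or Ryzen AI SW).
        return ["VitisAIExecutionProvider", "CPUExecutionProvider"]
    if device == "cpu":
        return ["CPUExecutionProvider"]
    raise ValueError(f"unknown device: {device}")

# Typical use with onnxruntime (assumes Ryzen AI SW is installed):
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "onnx-models/realesrgan_nchw_128x128_u8s8.onnx",
#       providers=providers_for("npu"),
#   )
```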

  7. Evaluate the accuracy of the model on benchmark datasets (optional).

Eval on Set14. Enabling the -clean option will remove generated SR images.

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/Set14/HR --lq-dir datasets/edsr_benchmark/Set14/LR_bicubic/X4 --out-dir outputs/u8s8-Set14 --device npu -clean

The output will be a set of accuracy metrics: PSNR, MS_SSIM, SSIM, and FID, in JSON format as below:

{
  "onnx": "onnx-models/realesrgan_nchw_128x128_u8s8.onnx",
  "psnr": 23.327783584594727,
  "ms_ssim": 0.8939759698835051,
  "ssim": 0.6422613034676046,
  "fid": 138.82432193927548
}
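For reference, PSNR (the first metric in the JSON above) is derived from the mean squared error between the restored and ground-truth images. This is a minimal pure-Python sketch for 8-bit images, not the evaluation script's implementation:

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-shape 2-D pixel grids."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Two 2x2 "images" differing by a constant offset of 10 (MSE = 100):
print(round(psnr([[0, 10], [20, 30]], [[10, 20], [30, 40]]), 2))  # 28.13
```

Higher PSNR means the output is numerically closer to the ground truth, which is why the arrow in the results tables points up for PSNR and down for FID.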

The following are example scripts to run evaluation on the other datasets:

Eval on B100:

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/B100/HR --lq-dir datasets/edsr_benchmark/B100/LR_bicubic/X4 --out-dir outputs/u8s8-B100 --device npu -clean

Eval on Urban100:

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/Urban100/HR --lq-dir datasets/edsr_benchmark/Urban100/LR_bicubic/X4 --out-dir outputs/u8s8-Urban100 --device npu -clean

Eval on DIV2K:

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/DIV2K_valid_HR --lq-dir datasets/DIV2K_valid_LR_bicubic/X4 --out-dir outputs/u8s8-DIV2K --device npu -clean

πŸ”§ Evaluation Data

Datasets:

The AMD ONNX model results were evaluated with the DIV2K and EDSR (B100, Urban100, Set14, Set5) datasets on peak signal-to-noise ratio (PSNR), multi-scale structural similarity (MS-SSIM), and FrΓ©chet Inception Distance (FID) (see Table 1).

The original Real-ESRGAN model from Wang et al. (2021) was evaluated on RealSR-Canon, RealSR-Nikon, DRealSR, DPED-iphone, OST300, ImageNet val, and ADE20K val (see Table 2).

Figure 4 shows their perceptual quality results as compared to other state-of-the-art models.

wangetal2021_fig7

Figure 4: "Qualitative comparisons on several representative real-world samples with upsampling scale factor of 4. Our Real-ESRGAN outperforms previous approaches in both removing artifacts and restoring texture details. Real-ESRGAN+ (trained with sharpened ground-truths) can further boost visual sharpness. Other methods may either fail to remove overshoot (the 1st sample) and complicated artifacts (the 2nd sample), or fail to restore realistic and natural textures for various scenes (the 3rd, 4th, 5th samples)". Image and caption from Fig. 7 of Wang et al. (2021).

The original ESRGAN model from Wang et al. (2018) is evaluated "on widely used benchmark datasets – Set5, Set14, BSD100, Urban100, and the PIRM self-validation dataset that is provided in the PIRM-SR Challenge."

Motivation: We evaluate the model's performance against industry-standard datasets using quantitative measures (see Quantitative Analyses).

πŸ“š Training Data

Both Real-ESRGAN and the original ESRGAN model were trained on 3 image datasets:

  1. DIV2K: a set of 800 2K-resolution images for image restoration tasks.
  2. Flickr2K: 2,650 2K-resolution images collected on the Flickr website.
  3. OutdoorSceneTraining (OST): 10,324 1K- to 2K-resolution images of outdoor scenes.

However, the Real-ESRGAN data were synthetically generated through a preprocessing workflow to degrade the images involving blur, downsampling, noise, and compression.
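A heavily simplified, first-order version of that degradation workflow can be sketched on a 1-D signal; Real-ESRGAN actually applies a more elaborate second-order pipeline, so this toy helper only conveys the ordering of the steps:

```python
import random

def degrade(pixels, scale=4, noise_sigma=2.0, seed=0):
    """Toy blur -> downsample -> noise -> quantize degradation (1-D).

    Illustrative stand-in for the synthetic degradation used to create
    low-resolution training inputs from high-resolution images.
    """
    n = len(pixels)
    # Blur: 3-tap box filter with edge clamping.
    blurred = [
        (pixels[max(i - 1, 0)] + pixels[i] + pixels[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]
    # Downsample by the super-resolution scale factor.
    small = blurred[::scale]
    # Additive Gaussian noise (seeded for reproducibility).
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0, noise_sigma) for p in small]
    # Quantize back to 8-bit values, a crude stand-in for compression loss.
    return [min(255, max(0, round(p))) for p in noisy]

lr = degrade(list(range(0, 256, 8)))  # 32 HR samples -> 8 LR samples
```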

Real-ESRGAN is fine-tuned from ESRGAN for faster convergence, for 400K iterations with a learning rate of 1Γ—10⁻⁴. "RealESRGAN is trained with a combination of L1 loss, perceptual loss and GAN loss, with weights {1,1,0.1}, respectively" (Wang et al., 2021). For more detailed information on training, see their paper.
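The weighted loss combination quoted above can be written out directly; this one-liner is illustrative only, with placeholder loss values rather than anything computed by the training code:

```python
def total_loss(l1, perceptual, gan, weights=(1.0, 1.0, 0.1)):
    """Combine the three Real-ESRGAN training losses with the paper's weights."""
    w1, wp, wg = weights
    return w1 * l1 + wp * perceptual + wg * gan

# Hypothetical per-batch values: L1 = 0.05, perceptual = 0.2, GAN = 0.5,
# giving 1*0.05 + 1*0.2 + 0.1*0.5 = 0.3.
print(round(total_loss(0.05, 0.2, 0.5), 6))  # 0.3
```

The small GAN weight keeps adversarial training from overpowering the pixel-level and perceptual terms.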

πŸ“ Quantitative Analyses

Table 1 shows the accuracy metrics for the AMD ONNX models of Real-ESRGAN.

| Model | Set5 PSNR(↑) | Set5 MS-SSIM(↑) | Set5 FID(↓) | Set14 PSNR(↑) | Set14 MS-SSIM(↑) | Set14 FID(↓) | B100 PSNR(↑) | B100 MS-SSIM(↑) | B100 FID(↓) | Urban100 PSNR(↑) | Urban100 MS-SSIM(↑) | Urban100 FID(↓) | DIV2K PSNR(↑) | DIV2K MS-SSIM(↑) | DIV2K FID(↓) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 128x128(fp32) | 23.43 | 0.9346 | 114.31 | 22.38 | 0.8928 | 141.12 | 23.17 | 0.8804 | 134.00 | 20.02 | 0.8813 | 52.44 | 23.96 | 0.9096 | 29.79 |
| 128x128(int8) | 23.99 | 0.9387 | 97.89 | 22.65 | 0.8942 | 137.35 | 23.37 | 0.8817 | 131.91 | 20.51 | 0.8861 | 49.88 | 24.26 | 0.9103 | 27.46 |
| 256x256(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 52.09 | 23.96 | 0.9098 | 29.32 |
| 256x256(int8) | 23.90 | 0.9386 | 101.03 | 22.62 | 0.8949 | 135.43 | 23.28 | 0.8821 | 128.82 | 20.44 | 0.8861 | 48.76 | 24.14 | 0.9099 | 27.33 |
| 512x512(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 51.97 | 23.97 | 0.9099 | 29.02 |
| 512x512(int8) | 23.37 | 0.9303 | 117.11 | 22.29 | 0.8921 | 138.18 | 23.05 | 0.8796 | 128.34 | 19.96 | 0.8773 | 49.70 | 23.79 | 0.9024 | 25.40 |
| 1024x1024(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 51.97 | 23.97 | 0.9099 | 28.98 |
| 1024x1024(int8) | 23.10 | 0.9249 | 113.23 | 22.10 | 0.8835 | 140.06 | 22.82 | 0.8692 | 130.24 | 19.80 | 0.8710 | 50.43 | 23.42 | 0.8932 | 27.59 |

Table 1: Model accuracy metrics for AMD AI PC FP32 and INT8 quantized models.

In their paper, Wang et al. (2021) compare Real-ESRGAN with several state-of-the-art methods (Table 2) with NIQE scores.

| Dataset | Bicubic | ESRGAN | DAN | RealSR | CDC | BSRGAN | Real-ESRGAN | Real-ESRGAN+ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RealSR-Canon (↓) | 6.1269 | 6.7715 | 6.5282 | 6.8692 | 6.1488 | 5.7489 | 4.5899 | 4.5314 |
| RealSR-Nikon (↓) | 6.3607 | 6.7480 | 6.6063 | 6.7390 | 6.3265 | 5.9920 | 5.0753 | 5.0247 |
| DRealSR (↓) | 6.5766 | 8.6335 | 7.0720 | 7.7213 | 6.6359 | 6.1362 | 4.9796 | 4.8458 |
| DPED-iphone (↓) | 6.0121 | 5.7363 | 6.1414 | 5.5855 | 6.2738 | 5.9906 | 5.4352 | 5.2631 |
| OST300 (↓) | 4.4440 | 3.5245 | 5.0232 | 4.5715 | 4.7441 | 4.1662 | 2.8659 | 2.8191 |
| ImageNet val (↓) | 7.4985 | 3.6474 | 6.0932 | 3.8303 | 7.0441 | 4.3528 | 4.8580 | 4.6448 |
| ADE20K val (↓) | 7.5239 | 3.6905 | 6.3839 | 3.4102 | 6.9219 | 3.9434 | 3.7886 | 3.5778 |

Table 2: "NIQE scores on several diverse testing datasets with real-world images. The lower, the better." From Table 1 in Wang et al. (2021).

βš“ Ethical Considerations

AMD is committed to conducting our business in a fair, ethical and honest manner and in compliance with all applicable laws, rules and regulations. You can find out more at the AMD Ethics and Compliance page.

⚠️ Caveats and Recommendations

Wang et al. (2021) note that there are limitations with the Real-ESRGAN model, including aliasing, introduction of unpleasant artifacts, and the inability to remove complicated degradations.

πŸ“Œ Citation Details

@InProceedings{wang2021realesrgan,
    author    = {Xintao Wang and Liangbin Xie and Chao Dong and Ying Shan},
    title     = {Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data},
    booktitle = {International Conference on Computer Vision Workshops (ICCVW)},
    year      = {2021}
}