πŸš€ Real-ESRGAN 128x128 for 4x Single Image Super-Resolution on AMD AI PC NPU

The Real Enhanced Super-Resolution Generative Adversarial Networks (Real-ESRGAN) model is an AI model that takes a low-resolution input image and creates a high-resolution, or "super-resolution", image. This version of the Real-ESRGAN model has been re-trained from scratch with reduced feature channels and fewer stacked blocks for improved efficiency. It was then quantized from FP32 to INT8 and optimized to run on the AMD AI PC NPU. The model scales its input image up by 4x.

The "128x128" in the title means that this model operates on 128x128 tiles, but an input image of almost any size can be upscaled by 4x. The inference pipeline splits the input image into overlapping patches matching the ONNX model's expected input resolution, runs inference on each tile, and then stitches the results back together. A model with a larger tile size would lower the stitching overhead and may produce fewer boundary artifacts. Figure 1 shows an example of a 4x scaled image.
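The tiling step described above can be sketched in a few lines. `tile_starts` is a hypothetical helper for intuition only, not code from this repository; it assumes a fixed tile size and a small fixed overlap:

```python
def tile_starts(length, tile=128, overlap=16):
    """Return the start offsets of overlapping tiles covering `length` pixels."""
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    # Add a final tile flush with the edge if the regular grid falls short.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

# A 320x480 input is covered by a grid of 128x128 tiles; the overlapping
# regions are blended when stitching the 4x outputs back together.
rows = tile_starts(320)  # vertical tile offsets
cols = tile_starts(480)  # horizontal tile offsets
```

The overlap gives the stitching step redundant pixels at tile borders, which is what suppresses visible seams in the upscaled output.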

| Input image | Output image |
| --- | --- |
| assets/input_tiger_320x480_108005.png | assets/output_tiger_4x_1280x1920_108005.png |

Figure 1: Input 320x480 scaled up by 4x to 1280x1920 with Real-ESRGAN model running on AMD AI PC NPU. Source: EDSR Benchmark dataset (edsr_benchmark\B100\HR\108005.png).

The original model and architecture can be found on GitHub: xinntao/Real-ESRGAN.

Wang et al. (2018) introduced ESRGAN with the "Residual-in-Residual Dense Block (RRDB), without batch normalization as the basic network building unit." Figure 2 shows their original ESRGAN model architecture and Figure 3 shows their updated Real-ESRGAN (Wang et al., 2021) architecture.

esrgan_architecture

Figure 2: ESRGAN architecture with Residual in Residual Dense Block (RRDB) and removed batched normalization (BN) layers. Image from Fig. 4 of Wang et al. (2018).

real-esrgan_architecture

Figure 3: Real-ESRGAN architecture "adopts the same generator network as that in ESRGAN. For the scale factor of Γ—2 and Γ—1, it first employs a pixel-unshuffle operation to reduce spatial size and re-arrange information to the channel dimension." Image from Fig. 4 of Wang et al. (2021).
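The pixel-unshuffle operation quoted above can be illustrated on a single-channel image. This toy helper is for intuition only and is not the library implementation; it trades an rΓ—r reduction in spatial size for rΒ² times as many channels:

```python
def pixel_unshuffle(img, r=2):
    """Rearrange an (h, w) image into r*r channels of shape (h/r, w/r).

    Each output channel collects the pixels at one (dy, dx) offset within
    every r x r block, so no information is lost, only re-arranged.
    """
    h, w = len(img), len(img[0])
    assert h % r == 0 and w % r == 0, "dimensions must be divisible by r"
    return [
        [[img[y * r + dy][x * r + dx] for x in range(w // r)]
         for y in range(h // r)]
        for dy in range(r)
        for dx in range(r)
    ]

# A 2x2 image becomes four 1x1 channels:
print(pixel_unshuffle([[1, 2], [3, 4]]))  # [[[1]], [[2]], [[3]], [[4]]]
```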

| Model Details | Description |
| --- | --- |
| Person or organization developing model | Yixuan Liu (AMD), Hongwei Qin (AMD), Benjamin Consolvo (AMD) |
| Model date | January 2026 |
| Model version | 1 |
| Model type | Super-Resolution (Image-to-Image) |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | This version of the Real-ESRGAN model has been re-trained from scratch with reduced feature channels and fewer stacked blocks for improved efficiency. |
| License | Apache 2.0 |
| Where to send questions or comments about the model | Community Tab and AMD Developer Community Discord |

⚑ Intended Use

| Intended Use | Description |
| --- | --- |
| Primary intended uses | The model can be used to create high-resolution images from low-resolution images. The model has been converted to ONNX format and quantized for optimized performance on AMD AI PC NPUs. |
| Primary intended users | Anyone using or evaluating super-resolution models on AMD AI PCs. |
| Out-of-scope uses | This model is not intended for generating misinformation or disinformation, impersonating others, facilitating or inciting harassment or violence, or any use that could lead to the violation of human rights. |

How to Use

πŸ“ Hardware Prerequisites

Before getting started, make sure you meet the minimum hardware and OS requirements:

| Series | Codename | Abbreviation | Launch Year | Windows 11 | Linux |
| --- | --- | --- | --- | --- | --- |
| Ryzen AI Max PRO 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI PRO 300 Series | Strix Point / Krackan Point | STX/KRK | 2025 | β˜‘οΈ | |
| Ryzen AI Max 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI 300 Series | Strix Point | STX | 2025 | β˜‘οΈ | |
| Ryzen PRO 200 Series | Hawk Point | HPT | 2025 | β˜‘οΈ | |
| Ryzen 200 Series | Hawk Point | HPT | 2025 | β˜‘οΈ | |
| Ryzen PRO 8000 Series | Hawk Point | HPT | 2024 | β˜‘οΈ | |
| Ryzen 8000 Series | Hawk Point | HPT | 2024 | β˜‘οΈ | |
| Ryzen PRO 7000 Series | Phoenix | PHX | 2023 | β˜‘οΈ | |
| Ryzen 7000 Series | Phoenix | PHX | 2023 | β˜‘οΈ | |

Getting Started

  1. Follow the Ryzen AI SW Installation Instructions to download the necessary NPU drivers and Ryzen AI software. Allow around 30 minutes to install all of the necessary components of Ryzen AI SW.

  2. Activate the previously installed conda environment from Ryzen AI (RAI) SW, and set the RAI environment variable to your installation path. Substitute the correct RAI version number for v.v.v, such as 1.7.0.

conda activate ryzen-ai-v.v.v
$Env:RYZEN_AI_INSTALLATION_PATH = 'C:/Program Files/RyzenAI/v.v.v/'
  3. Clone the Hugging Face model repository:
git clone https://hf.co/amd/realesrgan-128x128-tiles-amdnpu

Alternatively, you can use the Hugging Face Hub API to download all of the files with Python:

from huggingface_hub import snapshot_download
snapshot_download("amd/realesrgan-128x128-tiles-amdnpu")
  4. Install the necessary packages into the existing conda environment:
pip install -r requirements.txt
  5. Data Preparation (optional: for evaluation). Download the EDSR benchmark dataset and extract it into the datasets/ directory. Note that you may need to run this script twice, as it can fail on the first attempt.
python download_edsr_benchmark.py

Download and extract the DIV2K validation set:

python download_div2k.py

The datasets/ directory should look like this:

datasets
β”œβ”€β”€ DIV2K_valid_HR
β”œβ”€β”€ DIV2K_valid_LR_bicubic/X4
└── edsr_benchmark
    β”œβ”€β”€ B100
    β”‚   β”œβ”€β”€ HR
    β”‚   β”‚   β”œβ”€β”€ 3096.png
    β”‚   β”‚   └── ...
    β”‚   └── LR_bicubic/X4
    β”‚       β”œβ”€β”€ 3096x4.png
    β”‚       └── ...
    └── Set5
        β”œβ”€β”€ HR
        β”‚   β”œβ”€β”€ baby.png
        β”‚   └── ...
        └── LR_bicubic/X4
            β”œβ”€β”€ babyx4.png
            └── ...
  6. Run inference on a single image or a folder of images. For example, for one image, run:
python onnx_inference.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --input .\datasets\edsr_benchmark\B100\HR\108005.png --out-dir outputs --device npu

Arguments:

--onnx: The ONNX model file path.

--input: Accepts either a single image file path or a directory path. If it's a file, the script will process that image only. If it's a directory, the script will recursively scan for .png, .jpg, and .jpeg files and process all of them.

--out-dir: Output directory where the restored images will be saved.

--device: Accepts "npu" or "cpu". The NPU will attempt to use the VitisAIExecutionProvider; the CPU will attempt to use the CPUExecutionProvider. Note that to use the NPU, the updated NPU drivers and Ryzen AI SW must first be installed.

The model has already been compiled and cached under modelcachekey_realesrgan_nchw_128x128_u8s8, but if this folder is not present, the model will be recompiled and then inference can be run.
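The --device switch described above maps to ONNX Runtime execution providers. The following sketch is illustrative, not the script's actual implementation; the helper name is hypothetical, and keeping the CPU provider in the NPU list lets ONNX Runtime fall back gracefully if the Vitis AI provider cannot initialize:

```python
def providers_for(device):
    """Map a --device flag to an ONNX Runtime execution-provider list."""
    if device == "npu":
        # CPU provider kept as fallback in case the NPU provider is unavailable
        # (e.g. missing NPU drivers or Ryzen AI SW).
        return ["VitisAIExecutionProvider", "CPUExecutionProvider"]
    if device == "cpu":
        return ["CPUExecutionProvider"]
    raise ValueError(f"unknown device: {device}")

# Typical use with onnxruntime (assumes Ryzen AI SW is installed):
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "onnx-models/realesrgan_nchw_128x128_u8s8.onnx",
#       providers=providers_for("npu"),
#   )
```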

  7. Evaluate the accuracy of the model on benchmark datasets (optional).

Eval on Set14. Enabling the -clean option will remove generated SR images.

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/Set14/HR --lq-dir datasets/edsr_benchmark/Set14/LR_bicubic/X4 --out-dir outputs/u8s8-Set14 --device npu -clean

The output will be a set of accuracy metrics: PSNR, MS_SSIM, SSIM, and FID, in JSON format as below:

{
  "onnx": "onnx-models/realesrgan_nchw_128x128_u8s8.onnx",
  "psnr": 23.327783584594727,
  "ms_ssim": 0.8939759698835051,
  "ssim": 0.6422613034676046,
  "fid": 138.82432193927548
}
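For reference, PSNR (the first metric in the JSON above) is derived from the mean squared error between the restored and ground-truth images. This is a minimal pure-Python sketch for 8-bit images, not the evaluation script's implementation:

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-shape 2-D pixel grids."""
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Two 2x2 "images" differing by a constant offset of 10 (MSE = 100):
print(round(psnr([[0, 10], [20, 30]], [[10, 20], [30, 40]]), 2))  # 28.13
```

Higher PSNR means the output is numerically closer to the ground truth, which is why the arrow in the results tables points up for PSNR and down for FID.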

The following are example scripts to run evaluation on the other datasets:

Eval on B100:

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/B100/HR --lq-dir datasets/edsr_benchmark/B100/LR_bicubic/X4 --out-dir outputs/u8s8-B100 --device npu -clean

Eval on Urban100:

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/edsr_benchmark/Urban100/HR --lq-dir datasets/edsr_benchmark/Urban100/LR_bicubic/X4 --out-dir outputs/u8s8-Urban100 --device npu -clean

Eval on DIV2K:

python onnx_eval.py --onnx onnx-models/realesrgan_nchw_128x128_u8s8.onnx --hq-dir datasets/DIV2K_valid_HR --lq-dir datasets/DIV2K_valid_LR_bicubic/X4 --out-dir outputs/u8s8-DIV2K --device npu -clean

πŸ”§ Evaluation Data

Datasets:

The AMD ONNX model results were evaluated with the DIV2K and EDSR (B100, Urban100, Set14, Set5) datasets on peak signal-to-noise ratio (PSNR), multi-scale structural similarity (MS-SSIM), and FrΓ©chet Inception Distance (FID) (see Table 1).

The original Real-ESRGAN model from Wang et al. (2021) was evaluated on RealSR-Canon, RealSR-Nikon, DRealSR, DPED-iphone, OST300, ImageNet val, and ADE20K val (see Table 2).

Figure 4 shows their perceptual quality results as compared to other state-of-the-art models.

wangetal2021_fig7

Figure 4: "Qualitative comparisons on several representative real-world samples with upsampling scale factor of 4. Our Real-ESRGAN outperforms previous approaches in both removing artifacts and restoring texture details. Real-ESRGAN+ (trained with sharpened ground-truths) can further boost visual sharpness. Other methods may either fail to remove overshoot (the 1st sample) and complicated artifacts (the 2nd sample), or fail to restore realistic and natural textures for various scenes (the 3rd, 4th, 5th samples)". Image and caption from Fig. 7 of Wang et al. (2021).

The original ESRGAN model from Wang et al. (2018) is evaluated "on widely used benchmark datasets – Set5, Set14, BSD100, Urban100, and the PIRM self-validation dataset that is provided in the PIRM-SR Challenge."

Motivation: We evaluate the model's performance against industry-standard datasets using quantitative measures (see Quantitative Analyses).

πŸ“š Training Data

Both Real-ESRGAN and the original ESRGAN model were trained on 3 image datasets:

  1. DIV2K: a set of 800 2K-resolution images for image restoration tasks.
  2. Flickr2K: 2,650 2K-resolution images collected on the Flickr website.
  3. OutdoorSceneTraining (OST): 10,324 1K- to 2K-resolution images of outdoor scenes.

However, the Real-ESRGAN data were synthetically generated through a preprocessing workflow to degrade the images involving blur, downsampling, noise, and compression.
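A heavily simplified, first-order version of that degradation workflow can be sketched on a 1-D signal; Real-ESRGAN actually applies a more elaborate second-order pipeline, so this toy helper only conveys the ordering of the steps:

```python
import random

def degrade(pixels, scale=4, noise_sigma=2.0, seed=0):
    """Toy blur -> downsample -> noise -> quantize degradation (1-D).

    Illustrative stand-in for the synthetic degradation used to create
    low-resolution training inputs from high-resolution images.
    """
    n = len(pixels)
    # Blur: 3-tap box filter with edge clamping.
    blurred = [
        (pixels[max(i - 1, 0)] + pixels[i] + pixels[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]
    # Downsample by the super-resolution scale factor.
    small = blurred[::scale]
    # Additive Gaussian noise (seeded for reproducibility).
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0, noise_sigma) for p in small]
    # Quantize back to 8-bit values, a crude stand-in for compression loss.
    return [min(255, max(0, round(p))) for p in noisy]

lr = degrade(list(range(0, 256, 8)))  # 32 HR samples -> 8 LR samples
```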

Real-ESRGAN is fine-tuned from ESRGAN for faster convergence, for 400K iterations with a learning rate of 1Γ—10⁻⁴. "RealESRGAN is trained with a combination of L1 loss, perceptual loss and GAN loss, with weights {1,1,0.1}, respectively" (Wang et al., 2021). For more detailed information on training, see their paper.
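The weighted loss combination quoted above can be written out directly; this one-liner is illustrative only, with placeholder loss values rather than anything computed by the training code:

```python
def total_loss(l1, perceptual, gan, weights=(1.0, 1.0, 0.1)):
    """Combine the three Real-ESRGAN training losses with the paper's weights."""
    w1, wp, wg = weights
    return w1 * l1 + wp * perceptual + wg * gan

# Hypothetical per-batch values: L1 = 0.05, perceptual = 0.2, GAN = 0.5,
# giving 1*0.05 + 1*0.2 + 0.1*0.5 = 0.3.
print(round(total_loss(0.05, 0.2, 0.5), 6))  # 0.3
```

The small GAN weight keeps adversarial training from overpowering the pixel-level and perceptual terms.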

πŸ“ Quantitative Analyses

Table 1 shows the accuracy metrics for the AMD ONNX models of Real-ESRGAN.

| Model | Set5 PSNR(↑) | Set5 MS-SSIM(↑) | Set5 FID(↓) | Set14 PSNR(↑) | Set14 MS-SSIM(↑) | Set14 FID(↓) | B100 PSNR(↑) | B100 MS-SSIM(↑) | B100 FID(↓) | Urban100 PSNR(↑) | Urban100 MS-SSIM(↑) | Urban100 FID(↓) | DIV2K PSNR(↑) | DIV2K MS-SSIM(↑) | DIV2K FID(↓) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 128x128(fp32) | 23.43 | 0.9346 | 114.31 | 22.38 | 0.8928 | 141.12 | 23.17 | 0.8804 | 134.00 | 20.02 | 0.8813 | 52.44 | 23.96 | 0.9096 | 29.79 |
| 128x128(int8) | 23.99 | 0.9387 | 97.89 | 22.65 | 0.8942 | 137.35 | 23.37 | 0.8817 | 131.91 | 20.51 | 0.8861 | 49.88 | 24.26 | 0.9103 | 27.46 |
| 256x256(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 52.09 | 23.96 | 0.9098 | 29.32 |
| 256x256(int8) | 23.90 | 0.9386 | 101.03 | 22.62 | 0.8949 | 135.43 | 23.28 | 0.8821 | 128.82 | 20.44 | 0.8861 | 48.76 | 24.14 | 0.9099 | 27.33 |
| 512x512(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 51.97 | 23.97 | 0.9099 | 29.02 |
| 512x512(int8) | 23.37 | 0.9303 | 117.11 | 22.29 | 0.8921 | 138.18 | 23.05 | 0.8796 | 128.34 | 19.96 | 0.8773 | 49.70 | 23.79 | 0.9024 | 25.40 |
| 1024x1024(fp32) | 23.44 | 0.9348 | 112.65 | 22.40 | 0.8932 | 139.71 | 23.21 | 0.8809 | 133.87 | 20.01 | 0.8815 | 51.97 | 23.97 | 0.9099 | 28.98 |
| 1024x1024(int8) | 23.10 | 0.9249 | 113.23 | 22.10 | 0.8835 | 140.06 | 22.82 | 0.8692 | 130.24 | 19.80 | 0.8710 | 50.43 | 23.42 | 0.8932 | 27.59 |

Table 1: Model accuracy metrics for AMD AI PC FP32 and INT8 quantized models.

In their paper, Wang et al. (2021) compare Real-ESRGAN with several state-of-the-art methods (Table 2) with NIQE scores.

| Dataset | Bicubic | ESRGAN | DAN | RealSR | CDC | BSRGAN | Real-ESRGAN | Real-ESRGAN+ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RealSR-Canon (↓) | 6.1269 | 6.7715 | 6.5282 | 6.8692 | 6.1488 | 5.7489 | 4.5899 | 4.5314 |
| RealSR-Nikon (↓) | 6.3607 | 6.7480 | 6.6063 | 6.7390 | 6.3265 | 5.9920 | 5.0753 | 5.0247 |
| DRealSR (↓) | 6.5766 | 8.6335 | 7.0720 | 7.7213 | 6.6359 | 6.1362 | 4.9796 | 4.8458 |
| DPED-iphone (↓) | 6.0121 | 5.7363 | 6.1414 | 5.5855 | 6.2738 | 5.9906 | 5.4352 | 5.2631 |
| OST300 (↓) | 4.4440 | 3.5245 | 5.0232 | 4.5715 | 4.7441 | 4.1662 | 2.8659 | 2.8191 |
| ImageNet val (↓) | 7.4985 | 3.6474 | 6.0932 | 3.8303 | 7.0441 | 4.3528 | 4.8580 | 4.6448 |
| ADE20K val (↓) | 7.5239 | 3.6905 | 6.3839 | 3.4102 | 6.9219 | 3.9434 | 3.7886 | 3.5778 |

Table 2: "NIQE scores on several diverse testing datasets with real-world images. The lower, the better." From Table 1 in Wang et al. (2021).

βš“ Ethical Considerations

AMD is committed to conducting our business in a fair, ethical and honest manner and in compliance with all applicable laws, rules and regulations. You can find out more at the AMD Ethics and Compliance page.

⚠️ Caveats and Recommendations

Wang et al. (2021) note that there are limitations with the Real-ESRGAN model, including aliasing, introduction of unpleasant artifacts, and the inability to remove complicated degradations.

πŸ“Œ Citation Details

@InProceedings{wang2021realesrgan,
    author    = {Xintao Wang and Liangbin Xie and Chao Dong and Ying Shan},
    title     = {Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data},
    booktitle = {International Conference on Computer Vision Workshops (ICCVW)},
    year      = {2021}
}