
inference_realesrgan.py

Command-line script for upsampling images using Real-ESRGAN models.

Usage

python inference_realesrgan.py -i inputs -o results -n RealESRGAN_x4plus

Arguments

-i, --input
str
default:"inputs"
Input image or folder path
-n, --model_name
str
default:"RealESRGAN_x4plus"
Model name to use for inference. Available models:
  • RealESRGAN_x4plus - General 4x upsampling
  • RealESRNet_x4plus - 4x upsampling without GAN
  • RealESRGAN_x4plus_anime_6B - Anime images 4x
  • RealESRGAN_x2plus - General 2x upsampling
  • realesr-animevideov3 - Anime video 4x
  • realesr-general-x4v3 - General purpose 4x with denoise control
-o, --output
str
default:"results"
Output folder path
-dn, --denoise_strength
float
default:"0.5"
Denoise strength, from 0 (weak denoising, keeps noise) to 1 (strong denoising). Only used by the realesr-general-x4v3 model
-s, --outscale
float
default:"4"
The final upsampling scale of the image
--model_path
str
default:"None"
Optional path to a custom model file. Usually not needed, since pretrained models are downloaded automatically
--suffix
str
default:"out"
Suffix of the restored image filename
-t, --tile
int
default:"0"
Tile size for processing. 0 means no tiling. Use tiling to avoid out-of-memory errors with large images
--tile_pad
int
default:"10"
Tile padding size to reduce border artifacts
--pre_pad
int
default:"0"
Pre-padding size at each border
--face_enhance
bool
Use GFPGAN to enhance faces in the image (flag, no value needed)
--fp32
bool
Use fp32 (full) precision during inference. Default is fp16 (half precision)
--alpha_upsampler
str
default:"realesrgan"
The upsampler for alpha channels (transparency). Options: realesrgan | bicubic
--ext
str
default:"auto"
Output image extension. Options: auto | jpg | png. With auto, the output uses the same extension as the input
-g, --gpu-id
int
default:"None"
GPU device to use. Can be 0, 1, 2, etc. for multi-GPU systems
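For the realesr-general-x4v3 model, the -dn option works by network interpolation: the script blends two sets of pretrained weights (a denoising variant and a plain variant) key by key. A minimal sketch of that blending, using plain dicts of floats in place of real checkpoints (the key name is hypothetical):

```python
def interpolate_weights(weights_a, weights_b, w):
    """Network interpolation: blend two checkpoints key by key,
    giving weight w to checkpoint A and (1 - w) to checkpoint B."""
    return {k: w * weights_a[k] + (1 - w) * weights_b[k] for k in weights_a}

# Example: blend two (toy) checkpoints at strength 0.25.
blended = interpolate_weights({"conv.weight": 1.0}, {"conv.weight": 0.0}, 0.25)
```

In the real script the two checkpoints are loaded from disk and blended per tensor, but the arithmetic is the same per-key weighted average shown here.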

Examples

# Upscale a single image
python inference_realesrgan.py -i input.jpg -o output_folder -n RealESRGAN_x4plus
Memory Management: If you encounter CUDA out of memory errors, try using the --tile option with a smaller tile size (e.g., -t 200 or -t 400). Tiling processes the image in smaller chunks at the cost of slightly slower processing.
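The tiling strategy described above can be sketched as follows: the image is cut into tile-by-tile chunks, each chunk is upscaled with tile_pad pixels of surrounding context (to suppress border seams), and the results are pasted back together. This is a simplified illustration, not the library's implementation; the upscale_fn stub below stands in for the real model.

```python
import numpy as np

def upscale_tiled(img, upscale_fn, scale, tile=200, tile_pad=10):
    """Process img (H, W, C) in tile x tile chunks, each padded with
    tile_pad pixels of context, then paste the upscaled tiles back.
    Only one padded tile is resident at a time, bounding memory use."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            # Padded input window, clamped to the image border.
            py0, px0 = max(y0 - tile_pad, 0), max(x0 - tile_pad, 0)
            py1, px1 = min(y1 + tile_pad, h), min(x1 + tile_pad, w)
            up = upscale_fn(img[py0:py1, px0:px1])
            # Crop the padding away, in upscaled coordinates.
            oy, ox = (y0 - py0) * scale, (x0 - px0) * scale
            out[y0 * scale:y1 * scale, x0 * scale:x1 * scale] = \
                up[oy:oy + (y1 - y0) * scale, ox:ox + (x1 - x0) * scale]
    return out
```

With a purely local upscaler (e.g. nearest-neighbor) the tiled result matches the full-image result exactly; with a learned model, the padding keeps tile borders close to seamless.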
File Formats: RGBA images (with transparency) are automatically saved as PNG regardless of the --ext setting to preserve the alpha channel.
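The extension rule above (--ext auto follows the input's extension, but alpha forces PNG) reduces to a small decision function. A sketch, with a hypothetical helper name:

```python
def choose_extension(input_name, ext_arg, has_alpha):
    """Pick the save extension: '--ext auto' keeps the input's extension,
    but any image with an alpha channel is written as PNG regardless,
    since JPEG cannot store transparency."""
    if has_alpha:
        return "png"
    if ext_arg == "auto":
        return input_name.rsplit(".", 1)[-1].lower()
    return ext_arg
```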

inference_realesrgan_video.py

Command-line script for upsampling videos using Real-ESRGAN models. Optimized for anime videos.

Usage

python inference_realesrgan_video.py -i video.mp4 -o results -n realesr-animevideov3

Arguments

-i, --input
str
default:"inputs"
Input video, image, or folder path
-n, --model_name
str
default:"realesr-animevideov3"
Model name to use for inference. Available models:
  • realesr-animevideov3 - Optimized for anime videos (default)
  • RealESRGAN_x4plus_anime_6B - Anime images/videos 4x
  • RealESRGAN_x4plus - General 4x upsampling
  • RealESRNet_x4plus - 4x upsampling without GAN
  • RealESRGAN_x2plus - General 2x upsampling
  • realesr-general-x4v3 - General purpose 4x with denoise control
-o, --output
str
default:"results"
Output folder path
-dn, --denoise_strength
float
default:"0.5"
Denoise strength, from 0 (weak denoising, keeps noise) to 1 (strong denoising). Only used by the realesr-general-x4v3 model
-s, --outscale
float
default:"4"
The final upsampling scale of the video
--suffix
str
default:"out"
Suffix of the restored video filename
-t, --tile
int
default:"0"
Tile size for processing. 0 means no tiling. Use tiling to avoid out-of-memory errors
--tile_pad
int
default:"10"
Tile padding size to reduce border artifacts
--pre_pad
int
default:"0"
Pre-padding size at each border
--face_enhance
bool
Use GFPGAN to enhance faces in the video (flag, no value needed). Note: automatically disabled for anime models
--fp32
bool
Use fp32 (full) precision during inference. Default is fp16 (half precision)
--fps
float
default:"None"
FPS of the output video. If not specified, uses the input video’s FPS
--ffmpeg_bin
str
default:"ffmpeg"
Path to the ffmpeg binary
--extract_frame_first
bool
Extract frames to disk before processing (flag, no value needed). Can be useful for certain workflows
--num_process_per_gpu
int
default:"1"
Number of processes to spawn per GPU for parallel processing
--alpha_upsampler
str
default:"realesrgan"
The upsampler for alpha channels (transparency). Options: realesrgan | bicubic
--ext
str
default:"auto"
Image extension when processing image folders. Options: auto | jpg | png

Examples

# Upscale anime video with default settings
python inference_realesrgan_video.py -i anime_video.mp4 -o results/
Performance Warning: If you are generating videos larger than 4K resolution, processing will be very slow due to I/O speed limitations. It is highly recommended to decrease the --outscale value.
Multi-GPU Processing: The script automatically detects available GPUs and can process video segments in parallel. Use --num_process_per_gpu to control how many processes run per GPU. For example, with 2 GPUs and --num_process_per_gpu 2, a total of 4 processes will run in parallel.
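The worker arithmetic above (processes = GPUs × --num_process_per_gpu) amounts to splitting the video's frames into contiguous segments and assigning each segment a GPU. A simplified sketch of that bookkeeping (the helper name is hypothetical, not the script's API):

```python
def split_frames(total_frames, num_gpus, num_process_per_gpu):
    """Divide frames into contiguous [start, end) segments, one per
    worker process, distributing any remainder across the first workers.
    Each segment also records the GPU it should run on."""
    workers = num_gpus * num_process_per_gpu
    base, extra = divmod(total_frames, workers)
    segments, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        segments.append({"gpu_id": i % num_gpus, "start": start, "end": end})
        start = end
    return segments
```

For example, 100 frames on 2 GPUs with --num_process_per_gpu 2 yields four segments of 25 frames, two per GPU.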
FLV Format: If the input is a .flv file, the script automatically converts it to .mp4 using ffmpeg before processing.
Audio Preservation: The script automatically preserves the audio track from the input video in the output video.
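One common way to carry audio over (not necessarily the script's exact internals) is to mux the original audio stream into the upscaled video with ffmpeg stream copy, so nothing is re-encoded. A sketch that builds such a command (file names are placeholders):

```python
def mux_audio_cmd(upscaled, original, out, ffmpeg_bin="ffmpeg"):
    """Build an ffmpeg command that takes video from the upscaled file
    and audio from the original, copying both streams without
    re-encoding. The trailing '?' makes the audio mapping optional, so
    silent inputs do not cause an error."""
    return [ffmpeg_bin, "-y", "-i", upscaled, "-i", original,
            "-map", "0:v:0", "-map", "1:a?", "-c", "copy", out]
```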

Helper Classes

The video inference script includes two helper classes:

Reader

Handles reading frames from videos, images, or folders. Supports streaming from video files using ffmpeg. Methods:
  • get_resolution() - Returns (height, width) of the input
  • get_fps() - Returns the FPS of the video
  • get_audio() - Returns the audio stream
  • get_frame() - Returns the next frame
  • close() - Closes the stream reader

Writer

Handles writing frames to output video using ffmpeg with H.264 encoding. Methods:
  • write_frame(frame) - Writes a frame to the output video
  • close() - Finalizes and closes the output video
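A Reader/Writer pair like the one above typically pipes raw RGB frames through ffmpeg's stdin/stdout. The sketch below builds plausible commands for each side; it illustrates the pipe layout, not the script's exact argument list.

```python
def reader_cmd(path, ffmpeg_bin="ffmpeg"):
    """Decode a video to raw RGB24 frames on stdout; the caller then
    reads height * width * 3 bytes per frame from the pipe."""
    return [ffmpeg_bin, "-i", path, "-f", "rawvideo",
            "-pix_fmt", "rgb24", "-"]

def writer_cmd(path, width, height, fps, ffmpeg_bin="ffmpeg"):
    """Encode raw RGB24 frames arriving on stdin into an H.264 video.
    yuv420p output keeps the file playable in common players."""
    return [ffmpeg_bin, "-y", "-f", "rawvideo", "-pix_fmt", "rgb24",
            "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
            "-c:v", "libx264", "-pix_fmt", "yuv420p", path]
```

In practice each command would be launched with subprocess.Popen, with get_frame() reading fixed-size chunks from the reader's stdout and write_frame() writing frame bytes to the writer's stdin.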