Why does bilinear scaling of images with PIL and PyTorch produce different results?

Question:

In order to feed an image to a PyTorch network, I first need to downscale it to some fixed size. At first I did this using the PIL.Image.resize() method with the interpolation mode set to BILINEAR. Then I thought it would be more convenient to first convert a batch of images to a PyTorch tensor and then use the torch.nn.functional.interpolate() function to scale the whole tensor at once on a GPU ('bilinear' interpolation mode as well). This led to a decrease in model accuracy, because during inference the type of scaling (torch) was now different from the one used during training (PIL). After that, I compared the two downscaling methods visually and found that they produce different results; the Pillow downscaling looks smoother. Do these methods perform different operations under the hood, even though both are bilinear? If so, I am also curious whether there is a way to achieve the same result as Pillow image scaling with torch tensor scaling.

Original image (the well-known Lenna image)

Pillow scaled image

Torch scaled image

Mean channel absolute difference map

Demo code:

import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F
from torchvision import transforms
import matplotlib.pyplot as plt

pil_to_torch = transforms.ToTensor()  # HWC uint8 in [0, 255] -> CHW float in [0.0, 1.0]
res_shape = (128, 128)


pil_img = Image.open('Lenna.png')
torch_img = pil_to_torch(pil_img)

# Note: PIL's resize takes (width, height) while F.interpolate takes
# (height, width); the square target hides that difference here.
pil_image_scaled = pil_img.resize(res_shape, Image.BILINEAR)
torch_img_scaled = F.interpolate(torch_img.unsqueeze(0), res_shape,
                                 mode='bilinear', align_corners=False).squeeze(0)

pil_image_scaled_on_torch = pil_to_torch(pil_image_scaled)
relative_diff = torch.abs((pil_image_scaled_on_torch - torch_img_scaled) / pil_image_scaled_on_torch).mean().item()
print('relative pixel diff:', relative_diff)

pil_image_scaled_numpy = pil_image_scaled_on_torch.cpu().numpy().transpose([1, 2, 0])
torch_img_scaled_numpy = torch_img_scaled.cpu().numpy().transpose([1, 2, 0])
plt.imsave('pil_scaled.png', pil_image_scaled_numpy)
plt.imsave('torch_scaled.png', torch_img_scaled_numpy)
plt.imsave('mean_diff.png', np.abs(pil_image_scaled_numpy - torch_img_scaled_numpy).mean(-1))

Python 3.6.6, requirements:

cycler==0.10.0
kiwisolver==1.1.0
matplotlib==3.2.1
numpy==1.18.2
Pillow==7.0.0
pyparsing==2.4.6
python-dateutil==2.8.1
six==1.14.0
torch==1.4.0
torchvision==0.5.0
Asked By: defias


Answers:

"Bilinear interpolation" is an interpolation method.

But downscaling an image is not necessarily only accomplished using interpolation.

It is possible to simply resample the image at a lower sampling rate, using an interpolation method to compute new samples that don't coincide with the old ones. But this leads to aliasing: higher-frequency components in the image that cannot be represented at the lower sampling density have their energy "aliased" onto lower frequencies, so new low-frequency components appear in the image after resampling.

To avoid aliasing, some libraries apply a low-pass filter (removing the high frequencies that cannot be represented at the lower sampling rate) before resampling. The subsampling algorithm in these libraries does much more than just interpolate.

The difference you see is because these two libraries take different approaches: one tries to avoid aliasing by low-pass filtering, the other doesn't.

To obtain the same results in Torch as in Pillow, you need to explicitly low-pass filter the image yourself before resampling. To get identical results, you will have to figure out exactly how Pillow filters the image; there are different methods and different possible parameter settings, and looking at the source code is the best way to find out exactly what they do. A rough illustration of the low-pass idea follows below.
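For example, here is a minimal sketch of the idea using a Gaussian low-pass filter. This is not Pillow's actual filter (Pillow's BILINEAR downscale effectively widens the support of a triangle filter with the scale factor), so the result will be similar but not identical; the function name lowpass_then_resize and the sigma heuristic are invented for this example:

import torch
import torch.nn.functional as F

def lowpass_then_resize(img, out_size, sigma=None):
    """img: (C, H, W) float tensor; out_size: (out_h, out_w)."""
    c, h, w = img.shape
    scale = h / out_size[0]          # assumes roughly uniform scaling
    if sigma is None:
        sigma = 0.5 * scale          # heuristic width, not taken from Pillow's source
    radius = max(1, int(3 * sigma))
    x = torch.arange(-radius, radius + 1, dtype=img.dtype)
    k = torch.exp(-(x ** 2) / (2 * sigma ** 2))
    k = k / k.sum()
    # Separable Gaussian blur: filter along width, then along height,
    # one channel at a time (groups=c).
    k_row = k.view(1, 1, 1, -1).repeat(c, 1, 1, 1)
    k_col = k.view(1, 1, -1, 1).repeat(c, 1, 1, 1)
    out = img.unsqueeze(0)
    out = F.conv2d(out, k_row, padding=(0, radius), groups=c)
    out = F.conv2d(out, k_col, padding=(radius, 0), groups=c)
    return F.interpolate(out, out_size, mode='bilinear',
                         align_corners=False).squeeze(0)

# Usage with the tensors from the question:
# approx = lowpass_then_resize(torch_img, res_shape)  # much closer to Pillow than a plain interpolate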

Answered By: Cris Luengo

From the documentation: using torch.nn.functional.interpolate with antialias=True gives results that match Pillow's implementation.
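For example, here is a sketch of the resize from the question using that flag. Note that antialias was added to F.interpolate in a PyTorch release much newer than the 1.4.0 pinned in the question, and it is only supported for the 'bilinear' and 'bicubic' modes:

import torch.nn.functional as F

torch_img_scaled = F.interpolate(
    torch_img.unsqueeze(0),   # add a batch dimension: (1, C, H, W)
    size=res_shape,
    mode='bilinear',
    align_corners=False,
    antialias=True,           # low-pass filter before resampling, as Pillow does
).squeeze(0)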

Answered By: ernestchu