High-performance (Python) library for reading TIFF files?

Question:

I am using code to read a .tiff file in order to calculate a fractal dimension. My code looks like this:

import matplotlib.pyplot as plt

raster = plt.imread('xyz.tif')

for i in range(x1, x2):
    for j in range(y1, y2):
        pixel = raster[i][j]

This works, but I have to read a lot of pixels, so I would like it to be fast and, ideally, to minimize electricity usage given current events. Is there a better library than matplotlib for this purpose? For example, would a library specialized for matrix operations, such as pandas, help? Additionally, would another language such as C perform better than Python?

Asked By: HAL


Answers:

I am not sure which library is the fastest, but I have had very good experiences with Pillow:

from PIL import Image
raster = Image.open('xyz.tif')

Then you can convert it to a NumPy array:

import numpy
pixels = numpy.array(raster)

I would need to see the rest of your code to recommend any other libraries. As for the language, C or C++ would likely perform better, since they are compiled, lower-level languages. How much you gain depends on how complex your operations are and how much data you need to process; C++ programs have been reported to be 10-200x faster, with the gap growing as the calculations become more complex. I hope this helps. If you have any further questions, just ask.
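Once the image is a NumPy array, much of the per-pixel cost in the question's nested loops can often be avoided by slicing the region of interest and operating on it as a whole. The sketch below is only an illustration under assumptions: it uses a synthetic 128x128 uint16 array in place of xyz.tif, a sum as a stand-in for the real fractal-dimension calculation, and hypothetical helper names (region_sum_loop, region_sum_slice).

```python
import numpy as np


def region_sum_loop(raster, x1, x2, y1, y2):
    """Visit every pixel with nested Python loops, as in the question."""
    total = 0
    for i in range(x1, x2):
        for j in range(y1, y2):
            total += int(raster[i, j])
    return total


def region_sum_slice(raster, x1, x2, y1, y2):
    """Compute the same sum on a sliced view, letting NumPy do the looping."""
    region = raster[x1:x2, y1:y2]  # a view, no pixel data is copied
    return int(region.sum(dtype=np.int64))


# Synthetic stand-in for an image loaded with numpy.array(Image.open(...)).
rng = np.random.default_rng(0)
raster = rng.integers(0, 65536, size=(128, 128), dtype=np.uint16)

loop_total = region_sum_loop(raster, 10, 50, 20, 60)
slice_total = region_sum_slice(raster, 10, 50, 20, 60)
```

Both functions return the same value; whether the sliced form helps in practice depends on whether the actual per-pixel operation can be expressed as an array operation.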

Answered By: Jan Hrubec

Edit: @cgohlke (in the comments) and others have found that cv2 is slower than tifffile for large and/or compressed images. It is best to test the different options on data that is realistic for your application.

I have found cv2 to be the fastest library for this. Reading 5000 128×128 uint16 TIFF images (files is the list of their paths) gives the following results:

import time
import matplotlib.pyplot as plt
t0 = time.time()
for file in files:
    raster = plt.imread(file)
print(f'{time.time()-t0:.2f} s')

1.52 s

import time
import numpy as np
from PIL import Image
t0 = time.time()
for file in files:
    im = np.array(Image.open(file))
print(f'{time.time()-t0:.2f} s')

1.42 s

import time
import tifffile
t0 = time.time()
for file in files:
    im = tifffile.imread(file)
print(f'{time.time()-t0:.2f} s')

1.25 s

import time
import cv2
t0 = time.time()
for file in files:
    im = cv2.imread(file, cv2.IMREAD_UNCHANGED)
print(f'{time.time()-t0:.2f} s')

0.20 s

cv2 is a computer vision library written in C++, which, as the other answer mentioned, is much faster than pure Python. Note the cv2.IMREAD_UNCHANGED flag: without it, cv2 converts monochrome images to 8-bit, 3-channel (BGR) images.
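In line with the edit note at the top of this answer, it may be worth repeating the comparison on your own files. The following is a minimal sketch of a reusable timing harness, not a definitive benchmark: time_reader and compare_readers are hypothetical helper names, and it takes the best of a few passes to reduce noise from disk caching.

```python
import time


def time_reader(read, files, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` full passes."""
    best = float('inf')
    for _ in range(repeats):
        t0 = time.perf_counter()
        for f in files:
            read(f)
        best = min(best, time.perf_counter() - t0)
    return best


def compare_readers(readers, files, repeats=3):
    """Time each reader in the {name: callable} dict; return (name, seconds), fastest first."""
    results = [(name, time_reader(read, files, repeats))
               for name, read in readers.items()]
    return sorted(results, key=lambda item: item[1])


# Example wiring (assumes cv2 and tifffile are installed and `files` is a list of paths):
# import cv2, tifffile
# readers = {
#     'tifffile': tifffile.imread,
#     'cv2': lambda f: cv2.imread(f, cv2.IMREAD_UNCHANGED),
# }
# for name, seconds in compare_readers(readers, files):
#     print(f'{name}: {seconds:.2f} s')
```

The first pass over a set of files usually includes disk reads while later passes hit the operating system's cache, so the best-of-N figure mostly reflects decoding speed rather than I/O.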

Answered By: trygvrad