High performance (python) library for reading tiff files?
Question:
I am using code to read a .tiff file in order to calculate a fractal dimension. My code looks like this:
import matplotlib.pyplot as plt
raster = plt.imread('xyz.tif')
for i in range(x1, x2):
    for j in range(y1, y2):
        pixel = raster[i][j]
This works, but I have to read a lot of pixels, so I would like this to be fast and, ideally, to minimize electricity usage given current events. Is there a better library than matplotlib for this purpose? For example, could a library specialized for matrix operations, such as pandas, help? Additionally, would another language such as C perform better than Python?
Answers:
I am not sure which library is the fastest but I have very good experience with Pillow:
from PIL import Image
raster = Image.open('xyz.tif')
Then you can convert it to a NumPy array:
import numpy
pixels = numpy.array(raster)
I would need to see the rest of your code to recommend any other libraries. As for the language, C or C++ would generally perform better because they are compiled, low-level languages. How much better depends on how complex your operations are and how much data you need to process: C++ programs have been reported to be 10-200x faster than pure Python, with the gap growing as the calculations get more complex. Hope this helps. If you have any further questions, just ask.
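Before switching languages, it may be worth checking whether the nested loop itself can go. Once the pixels are in a NumPy array, a rectangular region can usually be taken with one slice and processed as a whole. This is only a minimal sketch: the small generated array stands in for the array loaded from the TIFF, and the bounds and the threshold of 20 are made-up values, not taken from the question.

```python
import numpy as np

# Stand-in for the array returned by an image reader (hypothetical data).
raster = np.arange(48, dtype=np.uint16).reshape(6, 8)

# Hypothetical window bounds, playing the role of x1, x2, y1, y2 in the question.
x1, x2, y1, y2 = 1, 4, 2, 6

# One slice replaces the nested loops: the first axis is i, the second is j.
window = raster[x1:x2, y1:y2]

# Whole-window operations run in compiled code instead of a Python loop.
total = window.sum()
above = np.count_nonzero(window > 20)
```

The slice is a view into the original array rather than a copy, so taking it is cheap even for large images. Whether your fractal dimension calculation can be written as whole-array operations like this depends on the rest of the code.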
Edit: @cgohlke in the comments and others have found that cv2 is slower than tifffile for large and/or compressed images. It is best to test the different options on realistic data for your application.
I have found cv2 to be the fastest library for this. Reading 5000 128×128 uint16 TIFF images gives the following results:
import time
import matplotlib.pyplot as plt
t0 = time.time()
for file in files:
    raster = plt.imread(file)
print(f'{time.time()-t0:.2f} s')
1.52 s
import time
import numpy as np
from PIL import Image
t0 = time.time()
for file in files:
    im = np.array(Image.open(file))
print(f'{time.time()-t0:.2f} s')
1.42 s
import time
import tifffile
t0 = time.time()
for file in files:
    im = tifffile.imread(file)
print(f'{time.time()-t0:.2f} s')
1.25 s
import time
import cv2
t0 = time.time()
for file in files:
    im = cv2.imread(file, cv2.IMREAD_UNCHANGED)
print(f'{time.time()-t0:.2f} s')
0.20 s
cv2 is a computer vision library written in C++, which, as the other answer mentioned, is generally much faster than pure Python. Note the cv2.IMREAD_UNCHANGED flag: without it, cv2 will convert monochrome images to 8-bit, 3-channel color (in BGR order).
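If you want to repeat this comparison on your own files, a small helper along these lines may be convenient. It is only a sketch: time_reader and its repeats parameter are names made up for this example, and it uses time.perf_counter instead of the time.time calls in the timings above.

```python
import time
from typing import Callable, Iterable


def time_reader(reader: Callable[[str], object], files: Iterable[str], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds for reading every file once."""
    files = list(files)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        for file in files:
            reader(file)  # the result is discarded; only the read time matters here
        best = min(best, time.perf_counter() - t0)
    return best
```

You would call it as time_reader(tifffile.imread, files), or as time_reader(lambda f: cv2.imread(f, cv2.IMREAD_UNCHANGED), files) for cv2. Keep in mind that repeated reads are usually served from the operating system's file cache, so the numbers mostly reflect decoding speed rather than disk speed.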