Why does the image size differ when vertical vs horizontal?

Question:

I tried to create a random image with PIL, as per the example:

import numpy
from PIL import Image

a = numpy.random.rand(48,84)
img = Image.fromarray(a.astype('uint8')).convert('1')
print(len(img.tobytes()))

This particular code will output 528.
When we flip the dimensions of the numpy array:

a = numpy.random.rand(84,48)

The output we get is 504.
Why is that?

I was expecting the byte count to be the same, since the numpy arrays contain the same number of elements.

Asked By: pgilfc


Answers:

When you call tobytes() on the boolean array*, the data is packed row by row, with each row padded out to a whole number of bytes. In your second example, each row of img contains 48 booleans, so each row fits exactly into 6 bytes (48 bits): 6 bytes * 84 rows = 504 bytes. In your first example, however, there are 84 pixels per row, which is not divisible by 8, so the encoder represents each row with 11 bytes (88 bits), i.e. 4 extra bits of padding per row: 11 bytes * 48 rows = 528 bytes.
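A minimal check of that arithmetic, reusing the shapes from the question (the expected byte count is ceil(columns / 8) * rows):

import math

import numpy
from PIL import Image

# Reproduce both cases from the question and compare the actual
# encoding size against ceil(cols / 8) * rows.
for rows, cols in [(48, 84), (84, 48)]:
    a = numpy.random.rand(rows, cols)
    img = Image.fromarray(a.astype('uint8')).convert('1')
    expected = math.ceil(cols / 8) * rows  # padded bytes per row * row count
    print((rows, cols), len(img.tobytes()), expected)
    # prints (48, 84) 528 528 and then (84, 48) 504 504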

If you test a range of input shapes for a 2D boolean array, you will find that when the number of elements per row is divisible by 8, the total number of bytes in the encoding equals width * height / 8. When the row length is not divisible by 8, the encoding contains more bytes, because each row has to be padded with between 1 and 7 bits.
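A quick sketch of that experiment (the seed and the shape range here are arbitrary choices, not anything Pillow requires):

import math

import numpy
from PIL import Image

rng = numpy.random.default_rng(0)
for _ in range(20):
    # Random 2D shape; every encoding should be ceil(cols / 8) * rows bytes.
    rows, cols = (int(n) for n in rng.integers(1, 200, size=2))
    img = Image.fromarray(rng.random((rows, cols)).astype('uint8')).convert('1')
    assert len(img.tobytes()) == math.ceil(cols / 8) * rows
print('all shapes match ceil(cols / 8) * rows')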

In summary: ideally, we would store eight boolean values per byte, but the row length is not always divisible by 8, and the encoder serializes the array row by row.

Edit for clarification: *the PIL.Image object in mode "1" (a binary, or "bilevel", image) effectively represents a boolean array. When converting to mode "1", the original image (in this case, the numpy array a) is reduced to two levels, black and white (Pillow applies dithering by default during this conversion).
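For reference, a small sketch of that binarization; passing dither=Image.Dither.NONE (the Dither enum assumes Pillow >= 9.1, older versions use Image.NONE) disables the default Floyd-Steinberg dithering, so the conversion becomes a plain threshold at 128:

import numpy
from PIL import Image

# One gray value below the midpoint and one above it.
a = numpy.array([[10, 200]], dtype='uint8')

# Without dithering, conversion to mode '1' is a simple threshold.
img = Image.fromarray(a).convert('1', dither=Image.Dither.NONE)

print(img.getpixel((0, 0)))  # 0   (10 < 128 -> black)
print(img.getpixel((1, 0)))  # 255 (200 >= 128 -> white)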

Answered By: Noah