How can I write a bytes object to a file from integers of different byte lengths

Question:

I’m new to programming and I want to write bytes from integers of different lengths to a file, e.g.:

    my_list = [1, 2, 3200, 60, 72000]
    with open(Path, "wb") as f:
        for i in my_list:
            f.write(i.to_bytes(3, "big"))

To read this file I use:

    s = 3
    with open(Path, "rb") as f:
        rb = f.read()
    print([int.from_bytes(rb[i:i+s], "big") for i in range(0, len(rb), s)])

but this creates 3 bytes for every integer, which is not a good method. Is there a way to write a dynamic number of bytes for each int, and to read them back as well? Thank you in advance.
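
For reference, the minimal byte length of each integer can be computed with int.bit_length(); the difficulty is that a reader of the file then has no way to know where one value ends and the next begins. A quick sketch (the names are illustrative):

    my_list = [1, 2, 3200, 60, 72000]
    sizes = [max(1, (n.bit_length() + 7) // 8) for n in my_list]
    print(sizes)  # [1, 1, 2, 1, 3] -- but nothing in the file marks these boundaries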

Asked By: chamso


Answers:

    import gzip
    from struct import pack, unpack
    from pickle import loads, dumps
    from pathlib import Path
    from random import randint, seed

There is a difference between using the struct and pickle modules to create a file in binary format:

  • the struct module converts values, as laid out in a C struct, to bytes objects according to a fixed format string given as a parameter. If one integer needs 4 bytes to store it, integers that would fit in 1 or 2 bytes take 4 bytes too (see the sketch after this list). This method is static and only helps for values of the same byte length, and you need to know the format for unpacking. It works only with plain values, and packing and unpacking are fast.

  • the pickle module, in contrast, implements recursive objects (objects containing references to themselves) and object sharing, which makes it Python-specific (it can represent pointer sharing). It allows serializing and de-serializing complex objects and storing/reading/transferring them across different platforms. Because the protocol is detected automatically, there is no need to know the protocol when de-serializing (https://docs.python.org/3/library/pickle.html). It takes less space only if the majority of the numbers are small compared with a few long ones, and it takes more time for pickling and unpickling, but it's robust.
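
As a quick illustration of that fixed width (a standalone sketch):

    from struct import pack

    # With the 4-byte 'i' format every value occupies 4 bytes,
    # no matter how small it is:
    print(len(pack('<3i', 1, 2, 70000)))  # 12 bytes for 3 integers

The comparison below writes the same random list with both modules, gzip-compressed, and prints the resulting file sizes: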

    p_struct = Path('/tmp/my_file_st.bin.gz')
    p_pickle = Path('/tmp/my_file_pck.bin.gz')
    seed('stackoverflow')
    my_list = [randint(0, 16**4) for i in range(1000)]

    with gzip.open(p_struct, "wb") as f:
        f.write(pack(f'<{1000}i', *my_list))
    print(f'st_size: {p_struct.stat().st_size/1000} kB')

    with gzip.open(p_pickle, "wb") as f:
        f.write(dumps(my_list))
    print(f'pk_size: {p_pickle.stat().st_size/1000} kB')

    with gzip.open(p_struct, "rb") as rb:
        print(unpack(f'<{1000}i', rb.read()))
    with gzip.open(p_pickle, "rb") as rb:
        print(loads(rb.read()))
    

We can check the return values: pickle returns the list that was given, while struct returns a tuple of the values. Different cases call for different implementations.
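
A minimal round-trip check of that difference (a standalone sketch):

    from struct import pack, unpack
    from pickle import dumps, loads

    values = [1, 2, 3]
    print(unpack('<3i', pack('<3i', *values)))  # (1, 2, 3) -- a tuple
    print(loads(dumps(values)))                 # [1, 2, 3] -- the original list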

Answered By: chamso

If you want to store the values in a format that isn’t specific to Python, then using a binary file with raw bytes is good.

If you want a variable number of bytes to represent the integers, to save space, then there are other ways to do this. For example, you could use the gzip format.
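
Another possibility is a hand-rolled length-prefixed encoding: each value is preceded by a single byte giving its length. A minimal sketch, assuming no integer needs more than 255 bytes (write_prefixed and read_prefixed are illustrative names, not library functions):

    import io

    def write_prefixed(f, numbers):
        # One length byte, then the value's big-endian bytes
        # (assumes no value needs more than 255 bytes).
        for n in numbers:
            b = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
            f.write(len(b).to_bytes(1, "big"))
            f.write(b)

    def read_prefixed(f):
        numbers = []
        while (size := f.read(1)):
            numbers.append(int.from_bytes(f.read(size[0]), "big"))
        return numbers

    buf = io.BytesIO()
    write_prefixed(buf, [1, 2, 3200, 60, 72000])
    buf.seek(0)
    print(read_prefixed(buf))  # [1, 2, 3200, 60, 72000]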

Here is an example that uses the Python gzip library to create a file of 1000 integers and compare it to a non-gzip file with the same content.

It also uses the Python struct library to convert between integers and bytes.

    import gzip
    import struct
    from pathlib import Path
    import random

    path = Path('/tmp/my_file.bin')
    path_z = Path('/tmp/my_file.bin.gz')
    random.seed('stackoverflow')
    data_len = 1000
    my_list = [random.randint(0, 16**4) for i in range(data_len)]
    print(f"created list: {my_list[:4]}...")

    with open(path, "wb") as f:
        data = struct.pack(f'>{data_len}I', *my_list)
        f.write(data)

    with open(path, "rb") as f:
        rb = f.read()
        read_list = struct.unpack(f'>{data_len}I', rb)
    print(f'Normal list: {read_list[:4]}...')
    bin_file_size = path.stat().st_size
    print(f"Normal Size: {bin_file_size} [bytes]")

    with gzip.open(path_z, "wb") as f:
        data = struct.pack(f'>{data_len}I', *my_list)
        f.write(data)

    with gzip.open(path_z, "rb") as f:
        rb = f.read()
        read_list = struct.unpack(f'>{data_len}I', rb)
    print(f'gzip list: {read_list[:4]}...')
    gzip_file_size = path_z.stat().st_size
    print(f"gzip Size: {gzip_file_size} [bytes]")
    print(f"shrunk to {gzip_file_size / bin_file_size * 100} %")

Which gave the following output:

    $ python3 bytes_file.py
    created list: [36238, 568, 20603, 3324]...
    Normal list: (36238, 568, 20603, 3324)...
    Normal Size: 4000 [bytes]
    gzip list: (36238, 568, 20603, 3324)...
    gzip Size: 2804 [bytes]
    shrunk to 70.1 %

These files are still readable by other programs:

    $ od -A d --endian=big -t u4 --width=4 --read-bytes 16 /tmp/my_file.bin
    0000000      36238
    0000004        568
    0000008      20603
    0000012       3324

And also the gzip file:

    $ gunzip -c /tmp/my_file.bin.gz | od -A d --endian=big -t u4 --width=4 --read-bytes 16
    0000000      36238
    0000004        568
    0000008      20603
    0000012       3324
Answered By: ukBaz