Fastest way to pack a list of floats into bytes in python

Question:

I have a list of, say, 100k floats and I want to convert it into a bytes buffer.

import struct

def pack_floats(floatList):
    buf = bytes()
    for val in floatList:
        buf += struct.pack('f', val)
    return buf

This is quite slow. How can I make it faster using only the standard Python 3.x libraries?

Asked By: MxLDevs


Answers:

Just tell struct how many floats you have and pack them all in one call. 100k floats take about 1/100th of a second on my slow laptop.

import random
import struct

floatlist = [random.random() for _ in range(10**5)]
buf = struct.pack('%sf' % len(floatlist), *floatlist)
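A minimal sketch of the same single-call idea, assuming you want a fixed little-endian layout (the '<' prefix) instead of native byte order; pack_floats_le is a hypothetical helper name. If you pack many lists of the same length, a precompiled struct.Struct parses the format string only once.

```python
import random
import struct


def pack_floats_le(values):
    """Hypothetical helper: pack floats as little-endian 32-bit floats in one call."""
    # '<' = little-endian, standard sizes, no padding. The repeat count
    # keeps the format string short no matter how many values there are.
    return struct.pack(f'<{len(values)}f', *values)


floatlist = [random.random() for _ in range(10**5)]
buf = pack_floats_le(floatlist)
print(len(buf))  # 400000 (4 bytes per float)

# Precompiled variant for repeated packing of same-length lists.
packer = struct.Struct(f'<{len(floatlist)}f')
assert packer.pack(*floatlist) == buf
```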
Answered By: agf

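Most of the slowness comes from repeatedly appending to a bytestring. Because bytes objects are immutable, every += copies the whole bytestring built so far, so the total copying grows quadratically with the number of floats. Instead, you should use b''.join():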

import struct

def pack_floats(floatList):
    packed = [struct.pack('f', val) for val in floatList]
    return b''.join(packed)
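If you want to check the difference on your own machine, here is a small timing harness, assuming CPython and the standard timeit module; pack_concat and pack_join are hypothetical names. No timings are quoted because they depend on the interpreter and hardware.

```python
import random
import struct
import timeit


def pack_concat(values):
    # Repeated += creates a new, longer bytes object on every iteration.
    buf = b''
    for val in values:
        buf += struct.pack('f', val)
    return buf


def pack_join(values):
    # Build all the 4-byte pieces first, then copy them once with join().
    return b''.join([struct.pack('f', val) for val in values])


floatList = [random.random() for _ in range(10**4)]
assert pack_concat(floatList) == pack_join(floatList)

for fn in (pack_concat, pack_join):
    seconds = timeit.timeit(lambda: fn(floatList), number=3)
    print(f'{fn.__name__}: {seconds:.3f} s for 3 runs')
```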
Answered By: craigds

As with strings, using .join() is faster than repeatedly concatenating. E.g.:

import struct
floatList = [5.4, 3.5, 7.3, 6.8, 4.6]
b = b''.join(struct.pack('f', val) for val in floatList)

Results in:

b'\xcd\xcc\xac@\x00\x00`@\x9a\x99\xe9@\x9a\x99\xd9@33\x93@'
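To go the other way, struct.unpack with the same count reads the values back. A minimal sketch, assuming native single-precision 'f' as in the answer: the round trip is exact for 3.5, but values such as 5.4 come back rounded to the nearest 32-bit float, so compare with a tolerance rather than ==.

```python
import struct

floatList = [5.4, 3.5, 7.3, 6.8, 4.6]
b = b''.join(struct.pack('f', val) for val in floatList)

# Unpack the same number of native single-precision floats.
restored = struct.unpack(f'{len(floatList)}f', b)
print(restored)

# Each value is within float32 rounding error of the original.
for original, back in zip(floatList, restored):
    assert abs(original - back) < 1e-6
```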
Answered By: Gareth Latty

That should work:

return struct.pack('f' * len(floatList), *floatList)
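'f' * len(floatList) and a repeat count such as '3f' describe the same layout and produce identical bytes; the repeat-count form just keeps the format string a few characters long instead of one character per float. A quick check, using made-up sample values:

```python
import struct

floatList = [1.0, 2.5, -3.25]

# 'fff' and '3f' both mean three native single-precision floats.
long_fmt = 'f' * len(floatList)
short_fmt = f'{len(floatList)}f'
assert struct.calcsize(long_fmt) == struct.calcsize(short_fmt) == 12
assert struct.pack(long_fmt, *floatList) == struct.pack(short_fmt, *floatList)
```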
Answered By: katzenversteher

You can use ctypes and keep your data in a double array (or float array) exactly as you would in C, instead of keeping it in a list. This is fairly low-level, but it is worth considering if you need great performance and your list has a fixed size.

You can create the Python equivalent of the C declaration

double array[100];

like this:

import ctypes
array = (ctypes.c_double * 100)()

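The expression ctypes.c_double * 100 yields a Python class for an array of 100 doubles; calling it creates a zero-filled instance. To write it to a file, note that in Python 3 ctypes arrays support the buffer protocol, so bytes(array) gives you their raw contents (Python 2 used the buffer() builtin for this):

>>> with open("bla.dat", "wb") as f:
...     f.write(bytes(array))
...
800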
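A self-contained Python 3 sketch of the write-and-read-back cycle, assuming a temporary directory instead of the answer's "bla.dat" in the current directory. The array is written through bytes(), and from_buffer_copy() rebuilds a new ctypes array from the raw file contents.

```python
import ctypes
import os
import tempfile

floatlist = [0.5, 1.25, -2.0, 3.75]

# A C-style double array initialised from the list (8 bytes per item).
array = (ctypes.c_double * len(floatlist))(*floatlist)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bla.dat')
    # bytes(array) copies the array's raw memory via the buffer protocol.
    with open(path, 'wb') as f:
        f.write(bytes(array))

    # Read the raw bytes back into a new array of the same type.
    with open(path, 'rb') as f:
        restored = (ctypes.c_double * len(floatlist)).from_buffer_copy(f.read())

print(list(restored))  # [0.5, 1.25, -2.0, 3.75]
```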

If your data is already in a Python list, packing it into a double array may or may not be faster than calling struct as in agf's accepted answer. I will leave measuring which is faster as homework, but all the code you need is this:

>>> import ctypes
>>> array = (ctypes.c_double * len(floatlist))(*floatlist)

To get its contents as a bytes object, just do bytes(array) (in Python 2 this was str(buffer(array))). The one drawback here is that you have to take care of the float size yourself (c_float vs c_double) and of the CPU's native byte order; the struct module can take care of this for you.
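The size issue is easy to get wrong: c_double corresponds to struct's 'd' (8 bytes) and c_float to 'f' (4 bytes). A minimal check, assuming a typical platform where C float and double are 4 and 8 bytes; plain ctypes arrays use native byte order, the same as struct's default mode, so use struct with '<' or '>' if you need a fixed order.

```python
import ctypes
import struct

floatlist = [123.45, -987.654, 1.23e-20]
n = len(floatlist)

doubles = (ctypes.c_double * n)(*floatlist)
singles = (ctypes.c_float * n)(*floatlist)
print(ctypes.sizeof(doubles), ctypes.sizeof(singles))  # 24 12

# Same bytes as struct in its default native mode.
assert bytes(doubles) == struct.pack(f'{n}d', *floatlist)
assert bytes(singles) == struct.pack(f'{n}f', *floatlist)
```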

The big win is that with a float array you can still use the elements as numbers, accessing them just as if it were a plain Python list, while having them readily available as a flat memory region through the buffer protocol.
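A short illustration of that dual use, with made-up values: elements are read and written like list items, and a memoryview exposes the same memory without copying.

```python
import ctypes
import struct

values = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)

# Index, assign, take len() and sum() just as with a list of floats.
values[0] = 10.0
print(values[0], len(values), sum(values))  # 10.0 4 19.0

# The same memory through the buffer protocol, no copy involved.
view = memoryview(values)
print(view.nbytes)  # 32

# The assignment above is visible in the raw bytes (native byte order).
assert bytes(values)[:8] == struct.pack('d', 10.0)
```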

Answered By: jsbueno

As you say that you really do want single-precision ‘f’ floats, you might like to try the array module (in the standard library since 1.x).

>>> mylist = []
>>> import array
>>> myarray = array.array('f')
>>> for guff in [123.45, -987.654, 1.23e-20]:
...    mylist.append(guff)
...    myarray.append(guff)
...
>>> mylist
[123.45, -987.654, 1.23e-20]
>>> myarray
array('f', [123.44999694824219, -987.6539916992188, 1.2299999609665927e-20])
>>> import struct
>>> mylistb = struct.pack(str(len(mylist)) + 'f', *mylist)
>>> myarrayb = myarray.tobytes()
>>> myarrayb == mylistb
True
>>> myarrayb
b'f\xe6\xf6B\xdb\xe9v\xc4&Wh\x1e'

This can save you a bag-load of memory, while still giving you a variable-length container with most of the list methods. The array.array approach takes 4 bytes per single-precision float. The list approach consumes a pointer to a Python float object (4 or 8 bytes) plus the size of that object; on a 32-bit CPython implementation, that is 16 bytes (24 on a 64-bit build):

>>> import sys
>>> sys.getsizeof(123.456)
16

Total: 20 bytes per item in the best case for a list (32 on a 64-bit build), and always 4 bytes per item for an array.array('f').
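To see the difference on your own interpreter, a rough sketch comparing the two containers for 1000 made-up values. sys.getsizeof reports CPython-specific sizes, so the exact totals depend on the build; only the 4-byte item size of an 'f' array is fixed.

```python
import array
import sys

floats = [float(i) for i in range(1000)]
packed = array.array('f', floats)

# Each element of an 'f' array is stored as a raw 4-byte C float.
print(packed.itemsize)        # 4
print(len(packed.tobytes()))  # 4000

# The list holds pointers; every float object is a separate allocation.
list_total = sys.getsizeof(floats) + sum(sys.getsizeof(x) for x in floats)
print(list_total, sys.getsizeof(packed))
```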

Answered By: John Machin

For an array of single-precision floats there are two options: struct or array.

In[103]: import random
import struct
from array import array

floatlist = [random.random() for _ in range(10**5)]

In[104]: %timeit struct.pack('%sf' % len(floatlist), *floatlist)
100 loops, best of 3: 2.86 ms per loop

In[105]: %timeit array('f', floatlist).tobytes()
100 loops, best of 3: 4.11 ms per loop

So struct is faster in this test (note that the array timing includes converting the list to an array).
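Outside IPython, the same comparison can be sketched with the timeit module; with_struct and with_array are hypothetical names. array.tostring() was removed in Python 3.9, and tobytes() is the same operation. No numbers are quoted since they vary by machine.

```python
import random
import struct
import timeit
from array import array

floatlist = [random.random() for _ in range(10**5)]


def with_struct():
    return struct.pack(f'{len(floatlist)}f', *floatlist)


def with_array():
    # Includes the list-to-array conversion, as in the %timeit above.
    return array('f', floatlist).tobytes()


# Both produce native single-precision floats, so the bytes match.
assert with_struct() == with_array()

for fn in (with_struct, with_array):
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f'{fn.__name__}: {best * 1e3:.2f} ms per call')
```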

Answered By: Sklavit

In my opinion the best way is to loop over the input, packing and writing each value as you go, e.g.:

import struct

file_i = "test.txt"
# Read one float per line and write each one as a 4-byte binary float.
with open(file_i, 'r') as f_i, open("test_bin_file", 'wb') as fd_out:
    for i, riga in enumerate(f_i):
        value = float(riga)
        print(i, value)
        fd_out.write(struct.pack('f', value))

To append to an existing file, open it in 'ab' mode instead of 'wb':

fd_out = open("test_bin_file", 'ab')
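Writing one 4-byte value per line means one write call per float. A minimal sketch of a batched alternative, assuming the input holds one float per line and fits in memory: collect everything in an array.array('f') and write it with tofile(). The temporary directory and sample input are hypothetical, added so the sketch runs anywhere.

```python
import array
import os
import tempfile

# Hypothetical scratch directory and sample input.
workdir = tempfile.mkdtemp()
file_i = os.path.join(workdir, "test.txt")
file_o = os.path.join(workdir, "test_bin_file")
with open(file_i, 'w') as f:
    f.write("1.5\n-2.25\n3.0\n")

# Parse every non-blank line into a 4-byte C float, all in one array.
with open(file_i, 'r') as f_i:
    values = array.array('f', (float(line) for line in f_i if line.strip()))

# One call writes the whole array instead of one write per value.
with open(file_o, 'wb') as fd_out:
    values.tofile(fd_out)
```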
Answered By: Roberto Marzocchi

A couple of answers suggest

import struct
buf = struct.pack(f'{len(floatlist)}f', *floatlist)

but the use of ‘*’ needlessly converts floatlist to a tuple before passing it to struct.pack. It is faster to avoid that by first creating an empty buffer and then populating it using slice assignment:

import ctypes
# c_float gives 4-byte floats like struct's 'f'; c_double would give 8-byte doubles.
buf = (ctypes.c_float * len(floatlist))()
buf[:] = floatlist

Other performance savings some people might be able to use:

  • You can reuse an existing buffer by just doing the assignment again, without having to create a new buffer.
  • You can modify parts of an existing buffer by assigning to the appropriate slice.
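A runnable sketch of both points, assuming c_float so the bytes match struct's 'f' format; the sample values are made up.

```python
import ctypes
import struct

floatlist = [0.5, 1.5, 2.5, 3.5]

# ctypes arrays are zero-filled on creation.
buf = (ctypes.c_float * len(floatlist))()
buf[:] = floatlist  # slice assignment copies the values in element by element
assert bytes(buf) == struct.pack(f'{len(floatlist)}f', *floatlist)

# Reuse the same buffer: overwrite only the last two elements in place.
buf[2:] = [10.0, 20.0]
print(list(buf))  # [0.5, 1.5, 10.0, 20.0]
```

The slice and the assigned sequence must have the same length; ctypes raises ValueError otherwise, so a buffer can only be refilled with a list of matching size.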
Answered By: Jonathan Hartley