How to output list of floats to a binary file in Python

Question:

I have a list of floating-point values in Python:

floats = [3.14, 2.7, 0.0, -1.0, 1.1]

I would like to write these values out to a binary file using IEEE 32-bit encoding. What is the best way to do this in Python? My list actually contains about 200 MB of data, so something “not too slow” would be best.

Since there are 5 values, I just want a 20-byte file as output.

Asked By: Frank Krueger

||

Answers:

have a look at struct.pack_into

Answered By: SilentGhost

See: Python’s struct module

import struct
s = struct.pack('f'*len(floats), *floats)
f = open('file','wb')
f.write(s)
f.close()
Answered By: thesamet

struct.pack() looks like what you need.

http://docs.python.org/library/struct.html

Answered By: joshdick

The array module in the standard library may be more suitable for this task than the struct module which everybody is suggesting. Performance with 200 MB of data should be substantially better with array.

If you’d like to take at a variety of options, try profiling on your system with something like this

Answered By: Alex Martelli

Alex is absolutely right, it’s more efficient to do it this way:

from array import array
output_file = open('file', 'wb')
float_array = array('d', [3.14, 2.7, 0.0, -1.0, 1.1])
float_array.tofile(output_file)
output_file.close()

And then read the array like that:

input_file = open('file', 'rb')
float_array = array('d')
float_array.fromstring(input_file.read())

array.array objects also have a .fromfile method which can be used for reading the file, if you know the count of items in advance (e.g. from the file size, or some other mechanism)

Answered By: Nadia Alramli

I’m not sure how NumPy will compare performance-wise for your application, but it may be worth investigating.

Using NumPy:

from numpy import array
a = array(floats,'float32')
output_file = open('file', 'wb')
a.tofile(output_file)
output_file.close()

results in a 20 byte file as well.

Answered By: Grant

I ran into a similar issue while inadvertently writing a 100+ GB csv file. The answers here were extremely helpful but, to get to the bottom of it, I profiled all of the solutions mentioned and then some. All profiling runs were done on a 2014 Macbook Pro with a SSD using python 2.7. From what I’m seeing, the struct approach is definitely the fastest from a performance point of view:

6.465 seconds print_approach    print list of floats
4.621 seconds csv_approach      write csv file
4.819 seconds csvgz_approach    compress csv output using gzip
0.374 seconds array_approach    array.array.tofile
0.238 seconds numpy_approach    numpy.array.tofile
0.178 seconds struct_approach   struct.pack method
Answered By: dino

My “answer” is really a comment on the various answers. I can’t comment since I don’t have 50 reputation.

If the file is to be read back by Python, then use the “pickle” module. This one tool can read and write many things in binary.

But the way the question is asked, saying “IEEE 32-bit encoding”, it sounded like the file will be read back in other languages. In that case the byte order should be specified. The problem is that most machines are x86, with little-endian byte order, but the number one data processing language is Java/JVM, using the big-endian byte order. So the Python “tofile()” will use C, which uses little endian due to the machine being little-endian, and then the data processing code on Java/JVM will decode using big endian, leading to error.

To work with JVM:

# convert to bytes, BIG endian, for use by Java
import struct
f = [3.14, 2.7, 0.0, -1.0, 1.1]
b = struct.pack('>'+'f'*len(f), *f)

with open("f.bin", "wb") as file:
    file.write(b)

On the Java side:

try(var stream = new DataInputStream(new FileInputStream("f.bin")))
{
    for(int i = 0; i < 5; i++)
        System.out.println(stream.readFloat());
}
catch(Exception ex) {}

The problem now is the Python 'f'*len(f) code – hopefully the interpreter doesn’t actually create a super long “ffffff…” string.

I would use numpy array and byteswap

import numpy, sys
f = numpy.array([3.14, 2.7, 0.0, -1.0, 1.1], dtype=numpy.float32)

if sys.byteorder == "little":
    f.byteswap().tofile("f.bin") # using BIG endian, for use by Java
else:
    f.tofile("f.bin")
Answered By: louis
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.