Python: bytearray vs array
Question:
What is the difference between array.array('B') and bytearray?
from array import array
a = array('B', 'abc')
b = bytearray('abc')
a[0] = 100
b[0] = 'd'
print a
print b
Are there any memory or speed differences? What is the preferred use case of each one?
Answers:
bytearray is the successor of Python 2.x’s string type. It’s basically the built-in byte array type. Unlike the original string type, it’s mutable.
The array module, on the other hand, was designed for building binary data structures to communicate with the outside world (for example, to read and write binary file formats). Unlike bytearray, it supports all kinds of array elements. It’s flexible.
So if you just need an array of bytes, bytearray should work fine. If you need flexible formats (say when the element type of the array needs to be determined at runtime), array.array is your friend.
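A minimal sketch of the "element type determined at runtime" case, written for Python 3. The make_buffer helper and the sample values are hypothetical names chosen for illustration; only array.array and bytearray themselves come from the standard library.

```python
from array import array

def make_buffer(typecode, values):
    """Hypothetical helper: build an array whose element type is picked at runtime."""
    return array(typecode, values)

# The same call yields arrays with different C element types.
unsigned_bytes = make_buffer('B', [1, 2, 255])    # unsigned char
signed_shorts = make_buffer('h', [-300, 0, 300])  # signed short
doubles = make_buffer('d', [0.5, 1.5])            # C double

# bytearray has exactly one element type: an integer in range(256).
raw = bytearray([1, 2, 255])

for arr in (unsigned_bytes, signed_shorts, doubles):
    print(arr.typecode, arr.itemsize, arr.tolist())

# For typecode 'B' the underlying bytes match the bytearray's contents.
print(unsigned_bytes.tobytes() == bytes(raw))
```

With typecode 'B' the two types hold the same data; the difference only shows once you need another element type.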
Without looking at the code, my guess would be that bytearray is probably faster since it doesn’t have to consider different element types. But it’s possible that array('B') returns a bytearray.
bytearray has all the usual str methods. You can think of it as a mutable str (bytes in Python 3), while array.array is geared to reading and writing files. ‘B’ is just a special case for array.array.
You can see there is quite a difference looking at the dir() of each:
>>> dir(bytearray)
['__add__', '__alloc__', '__class__', '__contains__', '__delattr__',
'__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__',
'__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append',
'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'extend',
'find', 'fromhex', 'index', 'insert', 'isalnum', 'isalpha', 'isdigit', 'islower',
'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans',
'partition', 'pop', 'remove', 'replace', 'reverse', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> dir(array)
['__add__', '__class__', '__contains__', '__copy__', '__deepcopy__',
'__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__',
'__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__',
'__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append',
'buffer_info', 'byteswap', 'count', 'extend', 'frombytes', 'fromfile',
'fromlist', 'fromstring', 'fromunicode', 'index', 'insert', 'itemsize', 'pop',
'remove', 'reverse', 'tobytes', 'tofile', 'tolist', 'tostring', 'tounicode',
'typecode']
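To make the dir() comparison concrete, here is a small Python 3 sketch (the variable names are just for illustration). It assumes nothing beyond the built-in bytearray and the array module.

```python
from array import array

text = bytearray(b'hello world')

# bytes-style methods work directly on a bytearray and return new objects.
upper = text.upper()
words = text.split()
starts = text.startswith(b'hello')

# In-place mutation, which an immutable bytes object does not allow.
text[0:5] = b'HELLO'

# array('B') can hold the same byte values but has no text-oriented methods.
numbers = array('B', b'hello world')
has_upper = hasattr(numbers, 'upper')

print(upper, words, starts, text, has_upper)
```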
You almost never need to use array.array yourself. It’s usually used for creating binary data for a binary file format or protocol, like the struct module.
bytearray is usually used for dealing with encoded text (e.g. UTF-8, ASCII), as opposed to Python 3’s str() or Python 2’s unicode(), which are used for Unicode text.
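A short Python 3 sketch of the encoded-text versus Unicode-text distinction. The sample string is an arbitrary choice for illustration.

```python
s = 'caf\u00e9'                     # Unicode text: 4 code points
encoded = bytearray(s, 'utf-8')     # encoded text: the last character takes 2 bytes

print(len(s), len(encoded))

# The encoded form is mutable; replace the 2-byte sequence with a plain ASCII 'e'.
encoded[3:] = b'e'

decoded = encoded.decode('utf-8')   # back to a str
print(decoded)
```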
Most of the time, you should be using either str() when dealing with text, or list and tuple when you need a collection of items, including numbers.
Python Patterns – An Optimization Anecdote is a good read which points to array.array('B') as being fast. Using the timing() function from that essay does show that array.array('B') is faster than bytearray():
#!/usr/bin/env python
# Written for Python 2: time.clock() was removed in Python 3.8 and
# array.tostring() in Python 3.9.
from array import array
from struct import pack
from timeit import timeit
from time import clock

def timing(f, n, a):
    # Call f(a) ten times per iteration to reduce loop overhead.
    start = clock()
    for i in range(n):
        f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
    finish = clock()
    return '%s\t%f' % (f.__name__, finish - start)

def time_array(addr):
    return array('B', addr)

def time_bytearray(addr):
    return bytearray(addr)

def array_tostring(addr):
    return array('B', addr).tostring()

def str_bytearray(addr):
    return str(bytearray(addr))

def struct_pack(addr):
    return pack('4B', *addr)

if __name__ == '__main__':
    count = 10000
    addr = '192.168.4.2'
    addr = tuple([int(i) for i in addr.split('.')])
    # Columns: timing() result, timeit via wrapper function, timeit on the bare expression.
    print('\t\ttiming\t\tfunc\t\tno func')
    print('%s\t%s\t%s' % (timing(time_array, count, addr),
        timeit('time_array((192,168,4,2))', number=count, setup='from __main__ import time_array'),
        timeit("array('B', (192,168,4,2))", number=count, setup='from array import array')))
    print('%s\t%s\t%s' % (timing(time_bytearray, count, addr),
        timeit('time_bytearray((192,168,4,2))', number=count, setup='from __main__ import time_bytearray'),
        timeit('bytearray((192,168,4,2))', number=count)))
    print('%s\t%s\t%s' % (timing(array_tostring, count, addr),
        timeit('array_tostring((192,168,4,2))', number=count, setup='from __main__ import array_tostring'),
        timeit("array('B', (192,168,4,2)).tostring()", number=count, setup='from array import array')))
    print('%s\t%s\t%s' % (timing(str_bytearray, count, addr),
        timeit('str_bytearray((192,168,4,2))', number=count, setup='from __main__ import str_bytearray'),
        timeit('str(bytearray((192,168,4,2)))', number=count)))
    print('%s\t%s\t%s' % (timing(struct_pack, count, addr),
        timeit('struct_pack((192,168,4,2))', number=count, setup='from __main__ import struct_pack'),
        timeit("pack('4B', *(192,168,4,2))", number=count, setup='from struct import pack')))
The timeit measurements actually show that array.array('B') is sometimes more than double the speed of bytearray().
I was interested specifically in the fastest way to pack an IP address into a four-byte string for sorting. It looks like neither str(bytearray(addr)) nor array('B', addr).tostring() comes close to the speed of pack('4B', *addr).
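For anyone repeating the comparison on Python 3, here is a rough sketch. It assumes bytes(bytearray(addr)) and tobytes() as the Python 3 stand-ins for str(bytearray(addr)) and tostring(). The timings depend on the machine and interpreter, so no result is claimed here.

```python
from array import array
from struct import pack
from timeit import timeit

addr = (192, 168, 4, 2)

# Three ways to get the same four bytes on Python 3.
via_struct = pack('4B', *addr)
via_bytearray = bytes(bytearray(addr))    # stand-in for str(bytearray(addr))
via_array = array('B', addr).tobytes()    # stand-in for .tostring()

# Time each expression against the names defined in this module.
statements = {
    'struct.pack': "pack('4B', *addr)",
    'bytearray': "bytes(bytearray(addr))",
    'array': "array('B', addr).tobytes()",
}
for name, stmt in statements.items():
    seconds = timeit(stmt, globals=globals(), number=10000)
    print('%-12s %f' % (name, seconds))
```

All three produce the same byte string, so the packed addresses sort the same way whichever one turns out fastest for you.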
From my test, both used almost the same amount of memory, but bytearray was about 1.5 times as fast as array when I created a large buffer to read and write.
from array import array
from time import time

s = time()

"""
map = array('B')
for i in xrange(256**4/8):
    map.append(0)
"""

#bytearray
map = bytearray()
for i in xrange(256**4/8):
    map.append(0)
print "init:", time() - s
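As a side note, if the goal is just a zero-filled buffer, neither type needs an append loop. A Python 3 sketch, with a much smaller size n picked arbitrarily so it runs quickly:

```python
from array import array

n = 10**6  # assumption: far smaller than the 256**4/8 used above

buf = bytearray(n)             # n zero bytes in one allocation
arr = array('B', bytes(n))     # zero-filled array built from a bytes object
arr2 = array('B', [0]) * n     # sequence repetition gives the same result

# Both support the same item assignment afterwards.
buf[10] = 0xFF
arr[10] = 0xFF

print(len(buf), len(arr), len(arr2), buf[10], arr[10])
```

This does not measure the read/write speed discussed above; it only avoids paying for millions of append() calls during setup.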
One difference that has not been mentioned is that the end-user string representations differ for bytearrays and arrays with type 'b'.
>>> import array
>>> arr = array.array('b', [104, 105])
>>> byte_arr = bytearray([104, 105])
>>> print(arr)
array('b', [104, 105])
>>> print(byte_arr)
bytearray(b'hi')
This goes in line with the notion that bytearray is supposed to be Python 3’s (mutable) “raw” string type and assumes its data represents characters.
edit:
Another notable difference is that array.array has a tofile method for efficiently dumping data to a file, which bytearray and bytes lack.
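A minimal Python 3 sketch of that difference. The file names are hypothetical and everything is written to a temporary directory. Note that a bytearray can still be handed to the file object's own write() method, so the gap is in the API rather than in what is possible.

```python
import os
import tempfile
from array import array

arr = array('B', [104, 105])
buf = bytearray([104, 105])

with tempfile.TemporaryDirectory() as tmp:
    arr_path = os.path.join(tmp, 'arr.bin')   # hypothetical file names
    buf_path = os.path.join(tmp, 'buf.bin')

    with open(arr_path, 'wb') as f:
        arr.tofile(f)       # the array writes its raw items itself
    with open(buf_path, 'wb') as f:
        f.write(buf)        # a bytearray goes through the file's write()

    with open(arr_path, 'rb') as f:
        arr_data = f.read()
    with open(buf_path, 'rb') as f:
        buf_data = f.read()

    # fromfile() is the matching reader on the array side.
    restored = array('B')
    with open(arr_path, 'rb') as f:
        restored.fromfile(f, 2)

print(arr_data, buf_data, restored)
```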