Element-wise string concatenation in numpy
Question:
Is this a bug?
import numpy as np
a1=np.array(['a','b'])
a2=np.array(['E','F'])
In [20]: np.add(a1, a2)
Out[20]: NotImplemented
I am trying to do element-wise string concatenation. I thought add() was the way to do it in NumPy, but it is clearly not working as expected.
Answers:
This can (and should) be done in pure Python, as numpy also uses the Python string manipulation functions internally:
>>> a1 = ['a','b']
>>> a2 = ['E','F']
>>> map(''.join, zip(a1, a2))
['aE', 'bF']
This can be done using numpy.core.defchararray.add (also exposed as numpy.char.add). Here is an example:
>>> import numpy as np
>>> a1 = np.array(['a', 'b'])
>>> a2 = np.array(['E', 'F'])
>>> np.core.defchararray.add(a1, a2)
array(['aE', 'bF'], dtype='<U2')
There are other useful string operations available for NumPy data types.
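As a rough illustration of those other operations, here is a minimal sketch using a few functions from the np.char namespace. The variable names are only for illustration, and the outputs are deliberately left out since the repr can differ between NumPy versions:

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Element-wise concatenation, the same operation as in the answer above
joined = np.char.add(a1, a2)

# Element-wise upper-casing of every string
upper = np.char.upper(joined)

# Element-wise repetition: each string repeated 3 times
repeated = np.char.multiply(a1, 3)

print(joined, upper, repeated)
```

These functions return ordinary ndarrays with a string dtype, so the results can be passed on to other NumPy code without conversion.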
Another solution is to convert the string arrays into arrays of Python objects, so that str.__add__ is called for each pair of elements:
>>> import numpy as np
>>> a = np.array(['a', 'b', 'c', 'd'], dtype=object)  # np.object was removed from recent NumPy; use the builtin object
>>> a + a
array(['aa', 'bb', 'cc', 'dd'], dtype=object)
This is not that slow (less than twice as slow as adding integer arrays).
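If the inputs already exist as fixed-width string arrays, the same idea can be sketched with astype, casting to object for the addition and back to a string dtype afterwards. This is a sketch under the assumption that a string-typed result is wanted; the names result_obj and result_str are illustrative only:

```python
import numpy as np

a1 = np.array(['a', 'b'])
a2 = np.array(['E', 'F'])

# Cast to object dtype so that + dispatches to Python's str concatenation
result_obj = a1.astype(object) + a2.astype(object)

# Optionally cast back to a fixed-width unicode dtype
result_str = result_obj.astype(str)

print(result_obj.dtype, result_str.dtype)
```

Keeping the result as an object array avoids the second copy, at the cost of storing a Python object per element.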
You can use the chararray subclass to perform array operations with strings:
a1 = np.char.array(['a', 'b'])
a2 = np.char.array(['E', 'F'])
a1 + a2
#chararray(['aE', 'bF'], dtype='|S2')
Another nice example, repeating each string element-wise:
b = np.array([2, 4])
a1*b
#chararray(['aa', 'bbbb'], dtype='|S4')
One more basic, elegant and fast solution:
In [11]: np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
Out[11]: array(['aE', 'bF'], dtype='<U2')
It is very fast for smaller arrays.
In [12]: %timeit np.array([x1 + x2 for x1,x2 in zip(a1,a2)])
3.67 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.core.defchararray.add(a1, a2)
6.27 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.char.array(a1) + np.char.array(a2)
22.1 µs ± 319 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
For larger arrays, the difference in timing is small.
In [15]: b1 = np.full(10000,'a')
In [16]: b2 = np.full(10000,'b')
In [189]: %timeit np.array([x1 + x2 for x1,x2 in zip(b1,b2)])
6.74 ms ± 66.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [188]: %timeit np.core.defchararray.add(b1, b2)
7.03 ms ± 419 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [187]: %timeit np.char.array(b1) + np.char.array(b2)
6.97 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
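Outside IPython, a comparison like the one above can be sketched with the standard timeit module. The function names and the repeat count of 20 are arbitrary choices for illustration, and the numbers will vary by machine and NumPy version:

```python
import timeit
import numpy as np

b1 = np.full(10000, 'a')
b2 = np.full(10000, 'b')

def with_comprehension():
    # Pure-Python loop over element pairs, then back to an array
    return np.array([x1 + x2 for x1, x2 in zip(b1, b2)])

def with_char_add():
    # NumPy's element-wise string concatenation
    return np.char.add(b1, b2)

for fn in (with_comprehension, with_char_add):
    # Average wall-clock time per call over 20 calls
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```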
Adding to Niklas B.'s answer: in Python 3, map returns a lazy map object rather than a list, so the snippet in that answer no longer shows the concatenated strings directly.
To fix this, wrap the call in list():
>>> a1 = ['a','b']
>>> a2 = ['E','F']
>>> list(map(''.join, zip(a1, a2))) # <--- See here we have added list()
['aE', 'bF']