Convert string to numpy array
Question:
Input:
mystr = "100110"
Desired output numpy array:
mynumpy == np.array([1, 0, 0, 1, 1, 0])
I have tried:
np.fromstring(mystr, dtype=int, sep='')
but the problem is that I can't split my string into its individual digits, so NumPy treats it as a single number. Any idea how to convert my string to a NumPy array?
Answers:
list() may help you do that:
import numpy as np
mystr = "100110"
print(np.array(list(mystr)))
# ['1' '0' '0' '1' '1' '0']
If you want numbers instead of strings:
print(np.array(list(mystr), dtype=int))
# [1 0 0 1 1 0]
You could read them as ASCII characters and then subtract 48 (the ASCII value of '0'). This should be the fastest way for large strings.
>>> np.fromstring("100110", np.int8) - 48
array([1, 0, 0, 1, 1, 0], dtype=int8)
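The subtraction works because the characters '0' and '1' have code points 48 and 49. A minimal pure-Python illustration of the same arithmetic, assuming the string holds only the ASCII digits 0-9:

```python
mystr = "100110"

# ord() gives each character's code point; ord("0") == 48 and ord("1") == 49,
# so subtracting ord("0") maps each digit character to its integer value.
digits = [ord(c) - ord("0") for c in mystr]
print(digits)  # [1, 0, 0, 1, 1, 0]
```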
Alternatively, you could convert the string to a list of integers first:
>>> np.array(list(map(int, "100110")))
array([1, 0, 0, 1, 1, 0])
Edit: I did some quick timing and the first method is over 100x faster than converting it to a list first.
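To check a speed claim like this on your own machine, a rough sketch with the standard timeit module could look like the following. The 100,000-character random input, the number of runs, and the function names via_ascii and via_list are arbitrary choices for illustration; the sketch uses frombuffer rather than the deprecated binary mode of fromstring, and the numbers it prints will depend on your NumPy version and hardware.

```python
import random
import timeit

import numpy as np

# Hypothetical test input: a long random string of '0'/'1' characters.
mystr = "".join(random.choice("01") for _ in range(100_000))

def via_ascii(s):
    # View the ASCII bytes as uint8 and subtract ord("0") (48).
    return np.frombuffer(s.encode("ascii"), dtype=np.uint8) - ord("0")

def via_list(s):
    # Convert each character with int() first, then build the array.
    return np.array(list(map(int, s)))

# Both approaches should produce the same values before comparing speed.
assert np.array_equal(via_ascii(mystr), via_list(mystr))

runs = 10
for func in (via_ascii, via_list):
    seconds = timeit.timeit(lambda: func(mystr), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```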
Adding to the above answers: NumPy now gives a deprecation warning when you use fromstring in binary mode:
DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead.
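Following the warning's suggestion, the ASCII-offset trick can be sketched with np.frombuffer instead. This assumes the string contains only ASCII digits; encoding it to bytes explicitly sidesteps the unicode ambiguity the warning mentions.

```python
import numpy as np

mystr = "100110"

# Encode to ASCII bytes; frombuffer views each byte as one uint8 value.
raw = np.frombuffer(mystr.encode("ascii"), dtype=np.uint8)
# Subtracting ord("0") (48) turns the codes for '0'/'1' into 0/1. The result is
# a new, writable array (frombuffer's view of the bytes object is read-only).
mynumpy = raw - ord("0")
print(mynumpy)  # [1 0 0 1 1 0]
```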
A better option is to use fromiter. In my test it ran about twice as fast. This is what I got in a Jupyter notebook:
import numpy as np
mystr = "100110"
np.fromiter(mystr, dtype=int)
>> array([1, 0, 0, 1, 1, 0])
# Time comparison
%timeit np.array(list(mystr), dtype=int)
>> 3.5 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.fromstring(mystr, np.int8) - 48
>> 3.52 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.fromiter(mystr, dtype=int)
>> 1.75 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
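If the length is known up front, fromiter also accepts a count argument, which lets NumPy allocate the output once. Below is a small sketch wrapping that in a hypothetical helper, digits_to_array, that also rejects anything other than the ASCII digits 0-9 before converting (that validation rule is an assumption, not part of the answer above):

```python
import numpy as np

def digits_to_array(s):
    """Hypothetical helper: convert a string of ASCII digits to an int array."""
    # isascii() excludes non-ASCII "digits" such as superscripts that int() rejects.
    if not (s.isascii() and s.isdigit()):
        raise ValueError(f"expected only the digits 0-9, got {s!r}")
    # count=len(s) lets NumPy allocate the result once instead of growing it.
    return np.fromiter(s, dtype=int, count=len(s))

print(digits_to_array("100110"))  # [1 0 0 1 1 0]
```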