Concatenating two string columns of a NumPy array into a single column in Python
Question:
I have a NumPy array as follows:
2016-07-02 10:55:01
2016-07-02 10:55:01
2016-07-02 10:55:01
2016-07-02 17:01:34
2016-07-02 17:01:34
2016-07-02 16:59:52
2016-07-02 17:01:34
2016-07-02 16:59:52
2016-07-02 16:59:52
2016-07-02 10:40:00
2016-07-02 12:01:14
These are the two columns of the array: date and time. I want to combine them into a single column, with the two values joined by a tab ('\t'). Both values are strings.
I did it with a loop as follows, but that is a bad idea and takes a long time:
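import numpy as np
Data2 = np.array([], dtype=str)  # empty string array to grow
for D in Data:
    Data2 = np.append(Data2, np.array(D[0] + "\t" + D[1]))  # np.append copies the whole array on each call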
Please suggest an efficient solution.
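For reference, np.append allocates and copies a new array on every call, so the loop above does quadratic work overall. A minimal sketch of the same loop, assuming the data is an (n, 2) string array like the one shown (three sample rows here; the names data, parts and joined are illustrative), that collects the strings in a Python list and converts to an array only once:

```python
import numpy as np

# A few rows from the question: column 0 is the date, column 1 the time.
data = np.array([
    ["2016-07-02", "10:55:01"],
    ["2016-07-02", "17:01:34"],
    ["2016-07-02", "16:59:52"],
])

# Collect results in a plain Python list; list.append is amortized O(1),
# whereas np.append builds a brand-new array on every call.
parts = []
for d in data:
    parts.append(d[0] + "\t" + d[1])

# Convert to a NumPy array once, at the end.
joined = np.array(parts)
print(joined)
```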
Answers:
Insert the tabs ('\t') into your array using numpy.insert, and then do a numpy.reshape from n by 3 to n*3 by 1.
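A minimal sketch of those two steps on two sample rows (the variable names are illustrative). After the reshape, the date, tab and time are still separate elements, so the last line, which is not part of the answer, joins each row of the n by 3 array into one string:

```python
import numpy as np

data = np.array([
    ["2016-07-02", "10:55:01"],
    ["2016-07-02", "17:01:34"],
])
n = data.shape[0]

# Insert a tab column before column index 1: shape (n, 2) -> (n, 3).
with_tabs = np.insert(data, 1, "\t", axis=1)

# Reshape from n by 3 to n*3 by 1: every piece becomes its own row.
pieces = with_tabs.reshape(n * 3, 1)

# The pieces are still separate strings; join each original row of three.
joined = np.array(["".join(row) for row in with_tabs])
print(joined)
```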
Neat, but not more efficient than a simple loop (as Praveen pointed out in a comment):
import numpy as np
# Apply the lambda to each row (axis=1); the output dtype comes from the first row's result
np.apply_along_axis(lambda d: d[0] + '\t' + d[1], 1, arr)
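A runnable sketch of this one-liner on three sample rows (assumed data). apply_along_axis sizes its output from the first row's result, so the second, hypothetical ragged array shows a longer later string being silently truncated:

```python
import numpy as np

arr = np.array([
    ["2016-07-02", "10:55:01"],
    ["2016-07-02", "17:01:34"],
    ["2016-07-02", "16:59:52"],
])

# axis=1: the lambda receives one row (a 1-D array of two strings) at a time.
joined = np.apply_along_axis(lambda d: d[0] + "\t" + d[1], 1, arr)
print(joined.dtype)  # <U19, taken from the first row's 19-character result

# Hypothetical ragged input: the first result "a\tb" fixes the dtype at <U3,
# so the longer second result is cut down to its first 3 characters.
ragged = np.array([["a", "b"], ["ccc", "ddd"]])
short = np.apply_along_axis(lambda d: d[0] + "\t" + d[1], 1, ragged)
print(short)
```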
import numpy as np
a = [[1], [2], [3]]
b = [[4], [5], [6]]
np.concatenate((a, b), axis=1)  # array([[1, 4], [2, 5], [3, 6]]): columns placed side by side
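This stacks the inputs as columns of a 2-D array rather than joining strings. A minimal sketch with string columns (sample values assumed), contrasted with np.char.add, a different technique that is not part of this answer, which concatenates element-wise:

```python
import numpy as np

dates = np.array([["2016-07-02"], ["2016-07-02"]])  # shape (2, 1)
times = np.array([["10:55:01"], ["17:01:34"]])       # shape (2, 1)

# Side by side: shape (2, 2); each cell still holds a separate string.
stacked = np.concatenate((dates, times), axis=1)

# Element-wise string concatenation: one joined string per row, shape (2, 1).
joined = np.char.add(np.char.add(dates, "\t"), times)
print(stacked)
print(joined)
```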
The method below works for any number of columns (two or more). It is convenient if you want to concatenate several columns at once, or even the whole row, because you don't have to write out d[0] + '\t' + d[1] + … explicitly.
On my computer it runs 50-60% faster than the apply_along_axis() approach given above.
To concatenate the whole row, delimited by '\t':
result = ['\t'.join(row) for row in data]
Or, if the row has more columns and you only want to concatenate the first two:
result = ['\t'.join(row[:2]) for row in data]
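A runnable sketch of both list comprehensions, assuming a hypothetical three-column array (date, time and an extra label column):

```python
import numpy as np

# Hypothetical wider data: date, time, and an extra label column.
data = np.array([
    ["2016-07-02", "10:55:01", "A"],
    ["2016-07-02", "17:01:34", "B"],
])

# Whole row, tab-delimited (NumPy's str_ elements are Python str subclasses).
whole = ["\t".join(row) for row in data]

# Only the first two columns.
first_two = ["\t".join(row[:2]) for row in data]

# Turn the result back into a single NumPy column if needed.
first_two_col = np.array(first_two).reshape(-1, 1)
print(whole)
print(first_two_col)
```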
Performance comparison of both methods over 10,000 iterations on a very small dataset (< 100 rows):
Method | Time (ms) |
---|---|
Above method ('\t'.join) | 350 |
apply_along_axis() | 870 |
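A minimal timing harness for reproducing this kind of comparison on your own machine, assuming 100 identical sample rows and 1,000 calls per method (the answer used 10,000 iterations). The third function, based on np.char.add, is a vectorized alternative added here for comparison and is not one of the methods in the table; no results are shown because timings depend on the machine and NumPy version:

```python
import timeit

import numpy as np

NUMBER = 1000  # calls per method (assumption; the answer used 10,000)

# Hypothetical test data: 100 identical (date, time) rows.
arr = np.array([["2016-07-02", "10:55:01"]] * 100)


def with_join():
    # List comprehension with str.join, as in the answer above.
    return ["\t".join(row) for row in arr]


def with_apply_along_axis():
    # The apply_along_axis one-liner from the earlier answer.
    return np.apply_along_axis(lambda d: d[0] + "\t" + d[1], 1, arr)


def with_char_add():
    # Vectorized alternative, not one of the two methods in the table.
    return np.char.add(np.char.add(arr[:, 0], "\t"), arr[:, 1])


if __name__ == "__main__":
    for fn in (with_join, with_apply_along_axis, with_char_add):
        seconds = timeit.timeit(fn, number=NUMBER)
        print(f"{fn.__name__}: {seconds * 1000:.1f} ms for {NUMBER} calls")
```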