Concatenating two string columns of a NumPy array into a single column in Python
Question:
I have a NumPy array as follows:
20160702 10:55:01
20160702 10:55:01
20160702 10:55:01
20160702 17:01:34
20160702 17:01:34
20160702 16:59:52
20160702 17:01:34
20160702 16:59:52
20160702 16:59:52
20160702 10:40:00
20160702 12:01:14
These are two columns of the array: date and time. I want both in a single column, joined by '\t'. Both values are strings.
I did it with a loop as follows, but that is a bad idea and takes a long time:
for D in Data:
    Data2 = np.append(Data2, np.array(D[0] + "\t" + D[1]))
Please suggest an efficient solution.
Answers:
Insert the tabs (\t) into your array using numpy.insert, and then do a numpy.reshape from n by 3 to n*3 by 1.
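A minimal sketch of this suggestion, assuming the data is an n-by-2 array of strings (the name arr and the two sample rows are taken from the question for illustration only). Note that the result holds the date, the tab, and the time as separate elements of one column, rather than one joined string per row.

```python
import numpy as np

# Two sample rows from the question; `arr` is an assumed variable name.
arr = np.array([["20160702", "10:55:01"],
                ["20160702", "17:01:34"]])

# Insert a tab column between the date and time columns: n by 2 -> n by 3.
with_tabs = np.insert(arr, 1, "\t", axis=1)

# Flatten into a single column: n by 3 -> n*3 by 1.
single_column = with_tabs.reshape(-1, 1)
print(single_column.shape)
```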
Neat, but not more efficient than a simple loop (as Praveen pointed out in a comment):
import numpy as np
np.apply_along_axis(lambda d: d[0] + '\t' + d[1], 1, arr)
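For a runnable illustration, here is the same call on two sample rows from the question, assuming the separator is a tab and the array is named arr (an assumed name). One caveat worth checking on your own data: apply_along_axis sizes its output from the first row's result, so if later rows produce longer strings they may be truncated. The rows here all have the same length, so that does not arise.

```python
import numpy as np

# Two sample rows from the question; `arr` is an assumed variable name.
arr = np.array([["20160702", "10:55:01"],
                ["20160702", "17:01:34"]])

# Apply the lambda to each row (axis 1); each call returns one joined string.
joined = np.apply_along_axis(lambda d: d[0] + "\t" + d[1], 1, arr)
print(joined.shape)
```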
import numpy as np
a = [[1], [2], [3]]
b = [[4], [5], [6]]
np.concatenate((a, b), axis=1)

The method below works for any number of columns, two or more. It is very convenient if you want to concatenate multiple columns at a time, or even the whole row, because you don't have to explicitly write d[0] + '\t' + d[1] + ...

On my computer it performs 50~60% faster than the apply_along_axis() approach given above.
To concatenate the whole row delimited by '\t':
result = ['\t'.join(row) for row in data]
Or, if the actual row is larger and you only want to concatenate the first two columns:
result = ['\t'.join(row[0:2]) for row in data]
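As a small worked example of both variants, assuming the separator is a tab and data is a 2-D array of strings. The third column ("extra") is made up here purely to show the slicing case; it is not part of the question's data. The final conversion back to a NumPy array is optional.

```python
import numpy as np

# Sample rows; the third "extra" column is hypothetical, for illustration.
data = np.array([["20160702", "10:55:01", "extra"],
                 ["20160702", "17:01:34", "extra"]])

# Join every column of each row with a tab.
whole_rows = ["\t".join(row) for row in data]

# Join only the first two columns of each row.
first_two = ["\t".join(row[0:2]) for row in data]

# Optional: turn the list back into a 1-D NumPy array of strings.
result = np.array(first_two)
print(result.shape)
```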
Performance comparison of both methods for 10,000 iterations with a very tiny dataset (< 100 rows):

Method               Time (ms)
Above method         350
apply_along_axis()   870
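To repeat a comparison like this on your own machine, a rough timing sketch with timeit could look like the following. The dataset (50 identical rows), the function names, and the run count are all assumptions chosen for a quick run; it uses fewer iterations than the 10,000 quoted above, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

# Assumed tiny dataset (< 100 rows), built from one sample row of the question.
data = np.array([["20160702", "10:55:01"]] * 50)

def join_rows():
    # List-comprehension approach: join each row with a tab.
    return ["\t".join(row) for row in data]

def apply_rows():
    # apply_along_axis approach from the earlier answer.
    return np.apply_along_axis(lambda d: d[0] + "\t" + d[1], 1, data)

n_runs = 200  # fewer than the 10,000 iterations quoted, to keep this quick
t_join = timeit.timeit(join_rows, number=n_runs)
t_apply = timeit.timeit(apply_rows, number=n_runs)

print(f"join:              {t_join * 1000:.1f} ms")
print(f"apply_along_axis:  {t_apply * 1000:.1f} ms")
```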