Performance-warning when operating on dataframe
Question:
This code results in a performance warning, but I have a hard time optimizing it:
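# one new column per vector position; assumes every vector is as long as the first
for i in range(len(data['Vektoren'][0])):
    tmp_lst = []
    for v in data['Vektoren']:
        tmp_lst.append(v[i])
    data[i] = tmp_lst  # each assignment inserts one more column into the frame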
DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
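Following the warning's own advice while keeping the logic of the original loop, one option is to collect the new columns in a plain dict and attach them with a single pd.concat(axis=1). This is a minimal sketch: the sample data and the name new_cols are illustrative, and like the original loop it assumes every vector is as long as the first one.

```python
import pandas as pd

# Illustrative input: equal-length vectors, as the original loop assumes
data = pd.DataFrame({'Vektoren': [[1, 2, 3], [4, 5, 6]]})

n = len(data['Vektoren'][0])  # number of new columns, taken from the first vector

# Build each column as a list, but keep them in a dict instead of inserting into the frame
new_cols = {i: [v[i] for v in data['Vektoren']] for i in range(n)}

# Attach all new columns in one operation, so the frame is not fragmented
data = pd.concat([data, pd.DataFrame(new_cols, index=data.index)], axis=1)
print(data)
```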
Answers:
You seem to want to convert your Series of lists/arrays into several columns.
Instead, build all the new columns in one DataFrame and attach them in a single operation:
data = data.join(pd.DataFrame(data['Vektoren'].tolist(), index=data.index))
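The index=data.index argument is what makes the join line up with the original rows. A DataFrame built from a plain list gets a fresh 0..n-1 index, and join matches rows by index label, so on a frame with a non-default index the new columns would come out as all NaN without it. A small sketch, assuming an illustrative string index:

```python
import pandas as pd

# Same kind of vectors, but with a non-default (string) index
data = pd.DataFrame({'Vektoren': [[1, 2], [3, 4], [5, 6]]}, index=['a', 'b', 'c'])

# Without index=data.index the new frame is indexed 0, 1, 2, which matches no row label
misaligned = data.join(pd.DataFrame(data['Vektoren'].tolist()))

# With index=data.index the new columns line up with their source rows
aligned = data.join(pd.DataFrame(data['Vektoren'].tolist(), index=data.index))

print(misaligned)  # new columns are all NaN
print(aligned)
```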
Or, equivalently, with pd.concat:
data = pd.concat([data, pd.DataFrame(data['Vektoren'].tolist(), index=data.index)],
axis=1)
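The two variants should give the same frame here: join does a left join on the index, pd.concat(axis=1) an outer join, and both frames share data.index, so no rows are added or lost. A quick check on the answer's input (the names expanded, via_join and via_concat are just illustrative):

```python
import pandas as pd

data = pd.DataFrame({'Vektoren': [[1, 2, 3, 4], [5, 6], []]})

# Expand the lists once and reuse the result for both variants
expanded = pd.DataFrame(data['Vektoren'].tolist(), index=data.index)

via_join = data.join(expanded)
via_concat = pd.concat([data, expanded], axis=1)

# Compare columns and values (NaN positions included)
print(via_join.equals(via_concat))
print(via_concat)
```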
Example output:
       Vektoren    0    1    2    3
0  [1, 2, 3, 4]  1.0  2.0  3.0  4.0
1        [5, 6]  5.0  6.0  NaN  NaN
2            []  NaN  NaN  NaN  NaN
Used input:
data = pd.DataFrame({'Vektoren': [[1,2,3,4],[5,6],[]]})
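On this input the question's loop would not even get as far as the warning: it takes the column count from the first vector and indexes every vector with it, so the empty list raises IndexError on the first pass. The join approach instead pads shorter lists with NaN, which is also why the new columns are float64 (NaN is a float). A sketch contrasting the two, with loop_failed as an illustrative flag:

```python
import pandas as pd

data = pd.DataFrame({'Vektoren': [[1, 2, 3, 4], [5, 6], []]})

# The question's loop (written as a comprehension): column count comes from the first vector only
loop_failed = False
try:
    for i in range(len(data['Vektoren'][0])):
        data[i] = [v[i] for v in data['Vektoren']]
except IndexError:
    loop_failed = True  # [] has no element 0, so this fails on the first pass

# The answer's approach: one join, shorter lists padded with NaN
data = pd.DataFrame({'Vektoren': [[1, 2, 3, 4], [5, 6], []]})
data = data.join(pd.DataFrame(data['Vektoren'].tolist(), index=data.index))

print('loop failed:', loop_failed)
print(data)
print(data.dtypes)
```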