Split dataframe columns value prefix and make headers in python dataframe
Question:
I have a traffic data that looks like this. Here, each column have data in format meters:seconds
. Like in row 1 column 2, 57:9
represents 57
meters and 9
seconds.
0
1
2
3
4
5
6
7
8
9
0:0
57:9
166:34
178:37
203:44
328:63
344:65
436:77
737:108
None
0:0
166:34
178:37
203:43
328:61
436:74
596:51
737:106
None
None
0:0
57:6
166:30
178:33
203:40
328:62
344:64
436:74
596:91
None
0:0
203:43
328:61
None
None
None
None
None
None
None
0:0
57:7
166:20
178:43
203:10
328:61
None
None
None
None
I want to extract meters
values from the dataframe and store them in a list in ascending order. Then create a new dataframe in which the the column header will be the meters
value (present in the list). Then it will match the meter
value in the parent dataframe and add the corresponding second
value beneath. The missing meters:second
pair should be replaced by NaN
and the current pair at the position would move to next column within same row.
The desired outcome is:
list = [0,57,166,178,203,328,344,436,596,737]
dataframe:
0
57
166
178
203
328
344
436
596
737
0
9
34
37
44
63
65
77
NaN
108
0
NaN
34
37
43
61
NaN
74
51
106
0
6
30
33
40
62
64
74
91
None
0
NaN
NaN
NaN
43
61
None
None
None
None
0
7
20
43
10
61
None
None
None
None
I know I must use a loop to iterate over whole dataframe. I am new to python so I am unable to solve this. I tried using str.split()
but it work only on 1 column. I have 98 columns and 290 rows. This is just one month data. I will be having 12 month data. So, need suggestions and help.
Answers:
Try:
tmp = df1.apply(
lambda x: dict(
map(int, val.split(":"))
for val in x
if isinstance(val, str) and ":" in val
),
axis=1,
).to_list()
out = pd.DataFrame(tmp)
print(out[sorted(out.columns)])
Prints:
0 57 166 178 203 328 344 436 596 737
0 0 9.0 34.0 37.0 44 63 65.0 77.0 NaN 108.0
1 0 NaN 34.0 37.0 43 61 NaN 74.0 51.0 106.0
2 0 6.0 30.0 33.0 40 62 64.0 74.0 91.0 NaN
3 0 NaN NaN NaN 43 61 NaN NaN NaN NaN
4 0 7.0 20.0 43.0 10 61 NaN NaN NaN NaN
I have a traffic data that looks like this. Here, each column have data in format meters:seconds
. Like in row 1 column 2, 57:9
represents 57
meters and 9
seconds.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
0:0 | 57:9 | 166:34 | 178:37 | 203:44 | 328:63 | 344:65 | 436:77 | 737:108 | None |
0:0 | 166:34 | 178:37 | 203:43 | 328:61 | 436:74 | 596:51 | 737:106 | None | None |
0:0 | 57:6 | 166:30 | 178:33 | 203:40 | 328:62 | 344:64 | 436:74 | 596:91 | None |
0:0 | 203:43 | 328:61 | None | None | None | None | None | None | None |
0:0 | 57:7 | 166:20 | 178:43 | 203:10 | 328:61 | None | None | None | None |
I want to extract meters
values from the dataframe and store them in a list in ascending order. Then create a new dataframe in which the the column header will be the meters
value (present in the list). Then it will match the meter
value in the parent dataframe and add the corresponding second
value beneath. The missing meters:second
pair should be replaced by NaN
and the current pair at the position would move to next column within same row.
The desired outcome is:
list = [0,57,166,178,203,328,344,436,596,737]
dataframe:
0 | 57 | 166 | 178 | 203 | 328 | 344 | 436 | 596 | 737 |
---|---|---|---|---|---|---|---|---|---|
0 | 9 | 34 | 37 | 44 | 63 | 65 | 77 | NaN | 108 |
0 | NaN | 34 | 37 | 43 | 61 | NaN | 74 | 51 | 106 |
0 | 6 | 30 | 33 | 40 | 62 | 64 | 74 | 91 | None |
0 | NaN | NaN | NaN | 43 | 61 | None | None | None | None |
0 | 7 | 20 | 43 | 10 | 61 | None | None | None | None |
I know I must use a loop to iterate over whole dataframe. I am new to python so I am unable to solve this. I tried using str.split()
but it work only on 1 column. I have 98 columns and 290 rows. This is just one month data. I will be having 12 month data. So, need suggestions and help.
Try:
tmp = df1.apply(
lambda x: dict(
map(int, val.split(":"))
for val in x
if isinstance(val, str) and ":" in val
),
axis=1,
).to_list()
out = pd.DataFrame(tmp)
print(out[sorted(out.columns)])
Prints:
0 57 166 178 203 328 344 436 596 737
0 0 9.0 34.0 37.0 44 63 65.0 77.0 NaN 108.0
1 0 NaN 34.0 37.0 43 61 NaN 74.0 51.0 106.0
2 0 6.0 30.0 33.0 40 62 64.0 74.0 91.0 NaN
3 0 NaN NaN NaN 43 61 NaN NaN NaN NaN
4 0 7.0 20.0 43.0 10 61 NaN NaN NaN NaN