Sorting Data from a Data Frame to a specific format using Pandas Python
Question:
I have a got below mentioned Dataframe extracted from an unstructured txt. file. I need it sorted as shown below mentioned example format in the exact order. Please also note that when capturing the date only the date next to(or following) TDY should be captured.
Please note this is a simplified version of the Data frame and the data continues for 3 months.
Dummy initial txt file
https://drive.google.com/drive/u/0/folders/1CRzCMRo_TV2B-6ZDPA0kb7S7DGuDg7IY
Dataframe
0
5
36
43
79
1
TDY
01JUN2022
1.2
TOM
1.2
2
SPT
05JUN2022
1.4
1WK
1.4
3
2WK
01JUN2022
1.5
1MO
1.6
4
2MO
06JUN2022
1.7
3MO
1.7
5
6MO
01JUN2022
1.4
9MO
1.8
6
1YR
01JUN2022
1.7
2YR
1.3
7
3YR
01JUN2022
1.2
4YR
1.2
8
5YR
01JUN2022
1.4
6YR
1.4
9
TDY
02JUN2022
1.2
TOM
1.2
10
SPT
02JUN2022
1.4
1WK
1.4
11
2WK
07JUN2022
1.5
1MO
1.6
12
2MO
02JUN2022
1.7
3MO
1.7
13
6MO
02JUN2022
1.4
9MO
1.8
14
1YR
02JUN2022
1.7
2YR
1.3
15
3YR
02JUN2022
1.2
4YR
1.2
16
5YR
07JUN2022
1.4
6YR
1.4
17
TDY
03JUN2022
1.2
TOM
1.2
18
SPT
03JUN2022
1.4
1WK
1.4
19
2WK
03JUN2022
1.5
1MO
1.6
20
2MO
09JUN2022
1.7
3MO
1.7
21
6MO
03JUN2022
1.4
9MO
1.8
22
1YR
02JUN2022
1.7
2YR
1.3
23
3YR
03JUN2022
1.2
4YR
1.2
24
5YR
03JUN2022
1.4
6YR
1.4
Output required
Date
TDY
TOM
SPT
1WK
2WK
1MO
2MO
3MO
6MO
9MO
1YR
2YR
3YR
4YR
5YR
6YR
01JUN2022
1.2
1.2
1.4
1.4
1.5
1.6
1.7
1.7
1.4
1.8
1.7
1.3
1.2
1.2
1.4
1.4
02JUN2022
1.2
1.2
1.4
1.4
1.5
1.6
1.7
1.7
1.4
1.8
1.7
1.3
1.2
1.2
1.4
1.4
03JUN2022
1.2
1.2
1.4
1.4
1.5
1.6
1.7
1.7
1.4
1.8
1.7
1.3
1.2
1.2
1.4
1.4
Answers:
Not sure if you wanted to keep that date format or not?
dates_df = pd.to_datetime(df.loc[df["0"] == "TDY"]["5"]).reset_index(drop=True)
column_order = ["TDY", "TOM", "SPT", "1WK", "2WK", "1MO", "2MO", "3MO",
"6MO", "9MO", "1YR", "2YR", "3YR", "4YR", "5YR", "6YR"]
header_df = df[["0", "43"]].melt(value_name="headers")
value_df = df[["36", "79"]].melt(value_name="values")
melted_df = pd.concat([header_df, value_df], axis=1)[["headers", "values"]]
grouped_mapping = melted_df.groupby("headers")["values"].apply(list).to_dict()
grouped_df = pd.DataFrame(grouped_mapping).reindex(columns=column_order)
final_df = pd.merge(dates_df, grouped_df, left_index=True, right_index=True).rename(columns={"5": "Dates"})
print(final_df)
Dates TDY TOM SPT 1WK 2WK 1MO ... 9MO 1YR 2YR 3YR 4YR 5YR 6YR
0 2022-06-01 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
1 2022-06-02 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
2 2022-06-03 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
I have a got below mentioned Dataframe extracted from an unstructured txt. file. I need it sorted as shown below mentioned example format in the exact order. Please also note that when capturing the date only the date next to(or following) TDY should be captured.
Please note this is a simplified version of the Data frame and the data continues for 3 months.
Dummy initial txt file
https://drive.google.com/drive/u/0/folders/1CRzCMRo_TV2B-6ZDPA0kb7S7DGuDg7IY
Dataframe
0 | 5 | 36 | 43 | 79 | |
---|---|---|---|---|---|
1 | TDY | 01JUN2022 | 1.2 | TOM | 1.2 |
2 | SPT | 05JUN2022 | 1.4 | 1WK | 1.4 |
3 | 2WK | 01JUN2022 | 1.5 | 1MO | 1.6 |
4 | 2MO | 06JUN2022 | 1.7 | 3MO | 1.7 |
5 | 6MO | 01JUN2022 | 1.4 | 9MO | 1.8 |
6 | 1YR | 01JUN2022 | 1.7 | 2YR | 1.3 |
7 | 3YR | 01JUN2022 | 1.2 | 4YR | 1.2 |
8 | 5YR | 01JUN2022 | 1.4 | 6YR | 1.4 |
9 | TDY | 02JUN2022 | 1.2 | TOM | 1.2 |
10 | SPT | 02JUN2022 | 1.4 | 1WK | 1.4 |
11 | 2WK | 07JUN2022 | 1.5 | 1MO | 1.6 |
12 | 2MO | 02JUN2022 | 1.7 | 3MO | 1.7 |
13 | 6MO | 02JUN2022 | 1.4 | 9MO | 1.8 |
14 | 1YR | 02JUN2022 | 1.7 | 2YR | 1.3 |
15 | 3YR | 02JUN2022 | 1.2 | 4YR | 1.2 |
16 | 5YR | 07JUN2022 | 1.4 | 6YR | 1.4 |
17 | TDY | 03JUN2022 | 1.2 | TOM | 1.2 |
18 | SPT | 03JUN2022 | 1.4 | 1WK | 1.4 |
19 | 2WK | 03JUN2022 | 1.5 | 1MO | 1.6 |
20 | 2MO | 09JUN2022 | 1.7 | 3MO | 1.7 |
21 | 6MO | 03JUN2022 | 1.4 | 9MO | 1.8 |
22 | 1YR | 02JUN2022 | 1.7 | 2YR | 1.3 |
23 | 3YR | 03JUN2022 | 1.2 | 4YR | 1.2 |
24 | 5YR | 03JUN2022 | 1.4 | 6YR | 1.4 |
Output required
Date | TDY | TOM | SPT | 1WK | 2WK | 1MO | 2MO | 3MO | 6MO | 9MO | 1YR | 2YR | 3YR | 4YR | 5YR | 6YR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01JUN2022 | 1.2 | 1.2 | 1.4 | 1.4 | 1.5 | 1.6 | 1.7 | 1.7 | 1.4 | 1.8 | 1.7 | 1.3 | 1.2 | 1.2 | 1.4 | 1.4 |
02JUN2022 | 1.2 | 1.2 | 1.4 | 1.4 | 1.5 | 1.6 | 1.7 | 1.7 | 1.4 | 1.8 | 1.7 | 1.3 | 1.2 | 1.2 | 1.4 | 1.4 |
03JUN2022 | 1.2 | 1.2 | 1.4 | 1.4 | 1.5 | 1.6 | 1.7 | 1.7 | 1.4 | 1.8 | 1.7 | 1.3 | 1.2 | 1.2 | 1.4 | 1.4 |
Not sure if you wanted to keep that date format or not?
dates_df = pd.to_datetime(df.loc[df["0"] == "TDY"]["5"]).reset_index(drop=True)
column_order = ["TDY", "TOM", "SPT", "1WK", "2WK", "1MO", "2MO", "3MO",
"6MO", "9MO", "1YR", "2YR", "3YR", "4YR", "5YR", "6YR"]
header_df = df[["0", "43"]].melt(value_name="headers")
value_df = df[["36", "79"]].melt(value_name="values")
melted_df = pd.concat([header_df, value_df], axis=1)[["headers", "values"]]
grouped_mapping = melted_df.groupby("headers")["values"].apply(list).to_dict()
grouped_df = pd.DataFrame(grouped_mapping).reindex(columns=column_order)
final_df = pd.merge(dates_df, grouped_df, left_index=True, right_index=True).rename(columns={"5": "Dates"})
print(final_df)
Dates TDY TOM SPT 1WK 2WK 1MO ... 9MO 1YR 2YR 3YR 4YR 5YR 6YR
0 2022-06-01 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
1 2022-06-02 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
2 2022-06-03 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4