How to convert columns to rows date-wise in Python
Question:
I have data in the following format:
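Date | AA_ZZ_CR | AA_ZZ_BT | AA_XX_CR | AA_XX_BT | BB_ZZ_CR | BB_ZZ_BT | BB_XX_CR | BB_XX_BT |
---|---|---|---|---|---|---|---|---|
20230202 | 20 | 56 | 34 | 556 | 29 | 59 | 32 | 559 |
20230203 | 21 | 45 | 54 | 423 | 28 | 48 | 53 | 426 |
20230204 | 22 | 78 | 23 | 790 | 27 | 76 | 29 | 794 |
20230205 | 23 | 78 | 56 | 778 | 26 | 72 | 51 | 771 |
20230206 | 24 | 89 | 78 | 855 | 25 | 81 | 79 | 850 |
20230207 | 25 | 56 | 89 | 545 | 24 | 54 | 86 | 543 |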
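To try the answers below, the sample table can be loaded into a DataFrame named df. This is a minimal sketch that assumes pandas is installed; the values are copied from the table above.

```python
import pandas as pd

# Column headers and rows copied from the question's input table
columns = ['Date', 'AA_ZZ_CR', 'AA_ZZ_BT', 'AA_XX_CR', 'AA_XX_BT',
           'BB_ZZ_CR', 'BB_ZZ_BT', 'BB_XX_CR', 'BB_XX_BT']
rows = [
    [20230202, 20, 56, 34, 556, 29, 59, 32, 559],
    [20230203, 21, 45, 54, 423, 28, 48, 53, 426],
    [20230204, 22, 78, 23, 790, 27, 76, 29, 794],
    [20230205, 23, 78, 56, 778, 26, 72, 51, 771],
    [20230206, 24, 89, 78, 855, 25, 81, 79, 850],
    [20230207, 25, 56, 89, 545, 24, 54, 86, 543],
]
df = pd.DataFrame(rows, columns=columns)
print(df)
```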
I want it converted into a date-wise format, with one row per date and prefix (first two dates shown):
Date | Data | ZZ_CR | ZZ_BT | XX_CR | XX_BT |
---|---|---|---|---|---|
20230202 | AA | 20 | 56 | 34 | 556 |
20230202 | BB | 29 | 59 | 32 | 559 |
20230203 | AA | 21 | 45 | 54 | 423 |
20230203 | BB | 28 | 48 | 53 | 426 |
Is there any way of doing that?
Answers:
If there were only one category of columns you could get the desired output in one line, but here there are several, so I used a loop that reshapes one group of columns at a time and then merges each new column alongside Date.
With Pandas:
import pandas as pd

# Get the list of columns except Date, e.g. AA_ZZ_BT
columns = df.columns[1:]
# Get the new column headers, e.g. AA_ZZ_BT -> ZZ_BT
# (dict.fromkeys keeps the input order; a set's order is arbitrary)
uni = list(dict.fromkeys(i.split('_', 1)[1] for i in columns))
# Loop once per new column
for idx, u in enumerate(uni):
    # Step 1: find the columns for this measure (AA_u, BB_u, ...)
    find_col = columns[[i.endswith('_' + u) for i in columns]]
    # Step 2: create a new df with Date plus those columns
    d = df[['Date'] + list(find_col)]
    # Step 3: use melt to turn those columns into rows
    d = d.melt(id_vars=['Date'], value_vars=list(find_col), var_name='Data', value_name=u)
    # Step 4: clean the Data column, keeping only AA from AA_ZZ_BT
    d['Data'] = d['Data'].str.split('_').str[0]
    # Step 5: on the first pass take the whole d as new_df, afterwards append only the new column
    new_df = d if idx == 0 else pd.concat([new_df, d[u]], axis=1)
# A stable sort by Date puts each date's AA and BB rows together
new_df = new_df.sort_values('Date', kind='stable').reset_index(drop=True)
new_df  # output
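A loop-free variant is also possible. The sketch below is not part of the original answer and assumes a pandas version that provides Index.str.split(..., expand=True) and DataFrame.stack(level=...): each header is split once at the first underscore into a two-level header such as ('AA', 'ZZ_CR'), and the prefix level is then stacked into the rows.

```python
import pandas as pd

# Sample data from the question
columns = ['Date', 'AA_ZZ_CR', 'AA_ZZ_BT', 'AA_XX_CR', 'AA_XX_BT',
           'BB_ZZ_CR', 'BB_ZZ_BT', 'BB_XX_CR', 'BB_XX_BT']
rows = [
    [20230202, 20, 56, 34, 556, 29, 59, 32, 559],
    [20230203, 21, 45, 54, 423, 28, 48, 53, 426],
    [20230204, 22, 78, 23, 790, 27, 76, 29, 794],
    [20230205, 23, 78, 56, 778, 26, 72, 51, 771],
    [20230206, 24, 89, 78, 855, 25, 81, 79, 850],
    [20230207, 25, 56, 89, 545, 24, 54, 86, 543],
]
df = pd.DataFrame(rows, columns=columns)

wide = df.set_index('Date')
# Split each header once at the first '_': 'AA_ZZ_CR' -> ('AA', 'ZZ_CR')
wide.columns = wide.columns.str.split('_', n=1, expand=True)
# Measure names in input order: ZZ_CR, ZZ_BT, XX_CR, XX_BT
measures = list(dict.fromkeys(wide.columns.get_level_values(1)))

# Move the prefix level (AA/BB) from the columns into the rows;
# sort_index orders rows by Date, then AA before BB
long = wide.stack(level=0).sort_index()
long.index.names = ['Date', 'Data']
# Fix the column order explicitly and turn Date/Data back into columns
result = long[measures].reset_index()
print(result.head(4))
```

Selecting long[measures] explicitly keeps the column order stable regardless of how a given pandas version orders the stacked columns. Recent pandas releases may emit a FutureWarning about the stack implementation; it does not change this result.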
If you take the approach of looping through the columns, the problem mostly comes down to selecting and renaming columns. I am assuming your data is in a DataFrame called df.
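import pandas as pd

# Unique prefixes (AA, BB): the part of each column name before the first '_'
data_entries = sorted(set(col.split('_', 1)[0] for col in df.columns[1:]))
dfs_split = []
for entry in data_entries:
    # Get Date plus the columns that start with this prefix
    cols = ['Date'] + [col for col in df.columns if col.startswith(f'{entry}_')]
    # .copy() so insert() works on an independent frame, not a slice of df
    df_data = df[cols].copy()
    # Add the Data column holding the prefix
    df_data.insert(1, 'Data', entry)
    # Take out the prefix on the column names: AA_ZZ_CR -> ZZ_CR
    df_data = df_data.rename(lambda x: x.replace(f'{entry}_', ''), axis=1)
    dfs_split.append(df_data)
df = pd.concat(dfs_split, axis=0)
# A stable sort keeps AA before BB within each date
df = df.sort_values(by='Date', kind='stable').reset_index(drop=True)
df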
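Instead of looping through the columns, the same reshaping can be sketched with melt and pivot: melt everything into long form, split each original column name into its prefix and measure, then pivot the measures back into columns. This is an alternative sketch, not either answerer's code, and it assumes a pandas version whose DataFrame.pivot accepts a list for index.

```python
import pandas as pd

# Sample data from the question
columns = ['Date', 'AA_ZZ_CR', 'AA_ZZ_BT', 'AA_XX_CR', 'AA_XX_BT',
           'BB_ZZ_CR', 'BB_ZZ_BT', 'BB_XX_CR', 'BB_XX_BT']
rows = [
    [20230202, 20, 56, 34, 556, 29, 59, 32, 559],
    [20230203, 21, 45, 54, 423, 28, 48, 53, 426],
    [20230204, 22, 78, 23, 790, 27, 76, 29, 794],
    [20230205, 23, 78, 56, 778, 26, 72, 51, 771],
    [20230206, 24, 89, 78, 855, 25, 81, 79, 850],
    [20230207, 25, 56, 89, 545, 24, 54, 86, 543],
]
df = pd.DataFrame(rows, columns=columns)

# Long form: one row per (Date, original column name, value)
long = df.melt(id_vars='Date', var_name='col', value_name='value')
# Split 'AA_ZZ_CR' once at the first '_' into prefix 'AA' and measure 'ZZ_CR'
long[['Data', 'measure']] = long['col'].str.split('_', n=1, expand=True)
# Spread the measures back into columns, one row per (Date, Data)
result = (
    long.pivot(index=['Date', 'Data'], columns='measure', values='value')
    .reset_index()
    .rename_axis(columns=None)
)
# pivot may reorder the measure columns, so restore the input order,
# then sort rows by Date with AA before BB
measures = list(dict.fromkeys(long['measure']))
result = result[['Date', 'Data'] + measures].sort_values(['Date', 'Data'], ignore_index=True)
print(result.head(4))
```

Splitting with n=1 only cuts at the first underscore, so this does not depend on the prefix being exactly two characters long.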