How to convert columns to rows Datewise in Python

Question:

I have data in the following format:

Date AA_ZZ_CR AA_ZZ_BT AA_XX_CR AA_XX_BT BB_ZZ_CR BB_ZZ_BT BB_XX_CR BB_XX_BT
20230202 20 56 34 556 29 59 32 559
20230203 21 45 54 423 28 48 53 426
20230204 22 78 23 790 27 76 29 794
20230205 23 78 56 778 26 72 51 771
20230206 24 89 78 855 25 81 79 850
20230207 25 56 89 545 24 54 86 543

I want it converted into the following date-wise format:

Date Data ZZ_CR ZZ_BT XX_CR XX_BT
20230202 AA 20 56 34 556
20230202 BB 29 59 32 559
20230203 AA 21 45 54 423
20230203 BB 28 48 53 426

Is there any way of doing that?

Answers:

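If all the columns shared a single category, you could get the desired output in one line. Here there are several categories (ZZ_CR, ZZ_BT, XX_CR, XX_BT), so I loop over them, melt each group of columns, and attach the resulting value column next to the Date and Data columns.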

With Pandas:

import pandas as pd

# Every column except Date
columns = df.columns[1:]

# New column headers: the part after the first underscore, e.g. AA_ZZ_BT -> ZZ_BT.
# dict.fromkeys drops duplicates while keeping the original order (a set would not).
uni = list(dict.fromkeys(c.split('_', 1)[1] for c in columns))

# Build one output column per new header
for idx, u in enumerate(uni):

    # Step 1: find the source columns that end with this header
    find_col = [c for c in columns if c.split('_', 1)[1] == u]

    # Step 2: melt them into long form, keeping Date as the identifier
    d = df.melt(id_vars=['Date'], value_vars=find_col, var_name='Data', value_name=u)

    # Step 3: keep only the prefix in the Data column, e.g. AA_ZZ_BT -> AA
    d['Data'] = d['Data'].str.split('_').str[0]

    # Step 4: the first pass becomes new_df; later passes only add their value column.
    # This relies on every group listing its prefixes in the same order (AA, then BB).
    new_df = d if idx == 0 else pd.concat([new_df, d[u]], axis=1)

# Sort so the rows of each date sit together, as in the desired output
new_df = new_df.sort_values(['Date', 'Data'], ignore_index=True)
new_df  # output
Answered By: R. Baraiya

If you take the approach of looping over the columns, the problem mostly comes down to selecting and renaming columns. I am assuming your data is in a dataframe called df.
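
For a reproducible run, one way to build such a dataframe from the question's sample is sketched below. The question does not show how the data is loaded, so reading it from a whitespace-separated string (the variable name text is illustrative) is only an assumption:

```python
import io

import pandas as pd

# Sample data copied from the question; whitespace-separated text is an assumed input format
text = """Date AA_ZZ_CR AA_ZZ_BT AA_XX_CR AA_XX_BT BB_ZZ_CR BB_ZZ_BT BB_XX_CR BB_XX_BT
20230202 20 56 34 556 29 59 32 559
20230203 21 45 54 423 28 48 53 426
20230204 22 78 23 790 27 76 29 794
20230205 23 78 56 778 26 72 51 771
20230206 24 89 78 855 25 81 79 850
20230207 25 56 89 545 24 54 86 543
"""

# sep=r"\s+" splits on any run of whitespace
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
```
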

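import pandas as pd

# Unique two-character prefixes (AA, BB), sorted
data_entries = sorted(set(col[:2] for col in df.columns[1:]))

dfs_split = []
for entry in data_entries:
    # Get the Date column and the columns belonging to this entry
    cols = ['Date'] + [col for col in df.columns if col.startswith(f'{entry}_')]
    # Copy so that insert() below does not act on a slice of df
    df_data = df[cols].copy()
    # Add the Data column
    df_data.insert(1, 'Data', entry)
    # Take out the Data prefix on the columns
    df_data = df_data.rename(lambda x: x.replace(f'{entry}_', '', 1), axis=1)
    dfs_split.append(df_data)

df = pd.concat(dfs_split, axis=0)
# A stable sort keeps AA before BB within each date
df = df.sort_values(by='Date', kind='stable', ignore_index=True)
df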
Answered By: Brener Ramos
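
As an alternative to looping, the same reshaping can be sketched with a single melt followed by a pivot. This is not taken from either answer above. The names long, wide, Measure and order are illustrative, the sample is limited to the first two dates of the question, and passing a list as the index argument of pivot assumes pandas 1.1 or later.

```python
import io

import pandas as pd

# First two rows of the question's sample (assumed to be whitespace-separated text)
text = """Date AA_ZZ_CR AA_ZZ_BT AA_XX_CR AA_XX_BT BB_ZZ_CR BB_ZZ_BT BB_XX_CR BB_XX_BT
20230202 20 56 34 556 29 59 32 559
20230203 21 45 54 423 28 48 53 426
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Long form: one row per (Date, original column name)
long = df.melt(id_vars="Date", var_name="col", value_name="value")

# Split AA_ZZ_CR into the prefix (AA) and the remaining name (ZZ_CR)
long[["Data", "Measure"]] = long["col"].str.split("_", n=1, expand=True)

# Remember the order in which the measures first appear, since pivot sorts them
order = list(dict.fromkeys(long["Measure"]))

# Back to wide form: one row per (Date, Data), one column per measure
wide = long.pivot(index=["Date", "Data"], columns="Measure", values="value").reset_index()
wide = wide[["Date", "Data"] + order].rename_axis(columns=None)

print(wide)
```

The pivot step sorts the rows by Date and then by Data, so no separate sort is needed. Unlike the loop-based versions, this one does not depend on the prefixes being exactly two characters long, only on the first underscore separating the prefix from the rest of the name.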