Python – How to remove duplicate columns and add as rows in csv

Question:

I have following csv

Name     Date     Qty   Date     Qty   Date     Qty
---------------------------------------------------
ABC       Jan 2023   10    Feb 2023    11    Mar 2023    12
XYZ       Jan 2023   20    Feb 2023    21    Mar 2023    22

I want output as follows in csv/dataframe

Name     Date     Qty
---------------------
ABC       Jan 2023   10
ABC       Feb 2023   11
ABC       Mar 2023   12
XYZ       Jan 2023   20
XYZ       Feb 2023   21
XYZ       Mar 2023   22

How I achieve this result?

Asked By: newbi

||

Answers:

A bit complicated but does the job. You can execute step by step to view the transformation:

>>> (df.melt('Name').assign(row=lambda x: x.groupby('variable').cumcount())
       .pivot(['row', 'Name'], 'variable', 'value')
       .reset_index('Name').rename_axis(index=None, columns=None))

  Name      Date Qty
0  ABC  Jan 2023  10
1  XYZ  Jan 2023  20
2  ABC  Feb 2023  11
3  XYZ  Feb 2023  21
4  ABC  Mar 2023  12
5  XYZ  Mar 2023  22
Answered By: Corralien

Less streamlined solution compared to @Corralien’s. Also uses melt and pivot.

import pandas as pd
import io

#-----------------------------------------------#
#Recreate OP's table with duplicate column names#
#-----------------------------------------------#
df = pd.read_csv(io.StringIO("""
ABC       Jan-2023   10    Feb-2023    11    Mar-2023    12
XYZ       Jan-2023   20    Feb-2023    21    Mar-2023    22
"""),header=None,delim_whitespace=True)

df.columns = ['Name','Date','Qty','Date','Qty','Date','Qty']


#-----------------#
#Start of solution#
#-----------------#
#melt from wide to long (maintains order)
melted_df = df.melt(
    id_vars='Name',
    var_name='col',
    value_name='val',
)

#add a number for Date1/Date2/Date3 to keep track of Qty1/Qty2/Qty3 etc
melted_df['col_number'] = melted_df.groupby(['Name','col']).cumcount()

#pivot back to wide form
wide_df = melted_df.pivot(
    index=['Name','col_number'],
    columns='col',
    values='val',
).reset_index().drop(columns=['col_number'])

wide_df.columns.name = None #remove column index name

#Final output
print(wide_df)

Output

  Name      Date Qty
0  ABC  Jan-2023  10
1  ABC  Feb-2023  11
2  ABC  Mar-2023  12
3  XYZ  Jan-2023  20
4  XYZ  Feb-2023  21
5  XYZ  Mar-2023  22
Answered By: mitoRibo
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.