Removing duplicate row values but creating new columns

Question:

I have a table that looks similar to the following:

Miles  Side  Direction  Param1   Width  Height  Date
0.5    Left  Right      5        0.6    0.8     2023-01-04
0.5    Right Right      5        0.5    0.9     2023-01-04
1      Left  Left       4        0.3    0.3     2023-01-04
1      Right Left       4        0.5    0.5     2023-01-04

As can be seen from the table, there are duplicate values for Miles, Direction, Param1, and Date. However, Side, Width, and Height will vary. What I am trying to do is remove the duplicate values and make new columns for the values that vary. The table should look something like below:

Miles   Direction   Param1  Side1   Width1  Height1 Side2   Width2  Height2 Date
0.5     Right       5       Left    0.6     0.8     Right   0.5     0.9  2023-01-04
1       Left        4       Left    0.3     0.3     Right   0.5     0.5  2023-01-04

I have attempted to use the following:

  1. pivot function, but it doesn’t appear to work when there are multiple duplicate parameters
  2. pivot_table – it seems like this would work but I think I am missing something.

I tried something like:

df = pd.pivot_table(df, values=['Side','Width','Height'], index=['Miles, Direction','Param1','Date'], columns=None)

but I think something is missing here, as the data was displayed completely incorrect. Any help would be much appreciated – thank you!

Asked By: verynovice

||

Answers:

Try:

df['tmp'] = df.groupby(['Miles', 'Direction', 'Param1', 'Date']).cumcount() + 1

df = df.set_index(['Miles', 'Direction', 'Param1', 'Date', 'tmp'])
df = df.unstack('tmp')
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.reset_index()
print(df)

Prints:

   Miles Direction  Param1        Date Side1  Side2  Width1  Width2  Height1  Height2
0    0.5     Right       5  2023-01-04  Left  Right     0.6     0.5      0.8      0.9
1    1.0      Left       4  2023-01-04  Left  Right     0.3     0.5      0.3      0.5
Answered By: Andrej Kesely

A proposition using pandas.pivot_table :

dup_cols = ['Miles', 'Direction','Param1','Date']
var_cols = ['Side','Width','Height']
​
out = (
        pd.pivot_table(df.
                         assign(idx=df.groupby(dup_cols).cumcount()+1),
                                index=dup_cols,
                                values=var_cols,
                                columns='idx',
                                fill_value='',
                                aggfunc=lambda x: x)
            .pipe(lambda d: d.set_axis([f'{col}{num}' for col,num in d.columns], axis=1))
            .reset_index()
       )

# Output :

print(out)
​
   Miles Direction  Param1        Date  Height1  Height2 Side1  Side2  Width1  Width2
0    0.5     Right       5  2023-01-04      0.8      0.9  Left  Right     0.6     0.5
1    1.0      Left       4  2023-01-04      0.3      0.5  Left  Right     0.3     0.5
Answered By: abokey