Sorting and grouping pandas dataframe in a weird manner

Question:

List = [0, 3, 2, 1]

This is the pandas dataframe i have.

TIME DATA DMMA CODE
0 0 days 00:00:00 Layer1 B
1 0 days 00:00:00 Layer2 S
2 0 days 00:08:00 Layer3 B
3 0 days 00:16:00 Layer4 S
4 0 days 00:24:00 Layer5 B
5 0 days 00:53:00 Layer6 S
6 0 days 01:16:00 Layer7 B
7 0 days 01:29:00 Layer8 S

This is the end-result im trying to get.

TIME DATA
0 0 days 00:00:00 , 0 days 00:00:00 Layer1 , Layer2
1 0 days 00:08:00 , 0 days 01:29:00 Layer3 , Layer8
2 0 days 00:24:00 , 0 days 00:53:00 Layer5 , Layer6
3 0 days 01:16:00 , 0 days 00:16:00 Layer7 , Layer4

Basically index of the element inside "List" is the index of datas that have DMMA CODE of "B" and the element itself is the index of the datas that have DMMA CODE of "S".

First element of the list tells that "First CODE:B should be grouped with the first CODE:S"

Second element of the list tells that "Second CODE:B should be grouped with the fourth CODE:S" etc.

I am not familiar with pandas and couldn’t even find a way to approach this problem.

Asked By: Quante

||

Answers:

Your data is basically out of order, so let’s re-order it:

info = [0, 3, 2, 1]
mask = df['DMMA CODE'].eq('S')
df.loc[mask] = df[mask].iloc[info].values

# Then we can get what you want fairly easily:
groups = df.groupby('DMMA CODE').cumcount()
out = df.groupby(groups).agg(list)
print(out)

Output:

                                 TIME              DATA DMMA CODE
0  [0 days 00:00:00, 0 days 00:00:00]  [Layer1, Layer2]    [B, S]
1  [0 days 00:08:00, 0 days 01:29:00]  [Layer3, Layer8]    [B, S]
2  [0 days 00:24:00, 0 days 00:53:00]  [Layer5, Layer6]    [B, S]
3  [0 days 01:16:00, 0 days 00:16:00]  [Layer7, Layer4]    [B, S]
Answered By: BeRT2me

I have a suggestion to do the sorting using python dictionary, but there are some steps involving pandas as well. First I need to output a dictionary for each DMMA CODE "B" and "S", and each to have the natural order index 0,1,2,3. From the starting df

              TIME    DATA DMMA_CODE
0  0_days_00:00:00  Layer1         B
1  0_days_00:00:00  Layer2         S
2  0_days_00:08:00  Layer3         B
3  0_days_00:16:00  Layer4         S
4  0_days_00:24:00  Layer5         B
5  0_days_00:53:00  Layer6         S
6  0_days_01:16:00  Layer7         B
7  0_days_01:29:00  Layer8         S

Next I append the items according to the desired order, ie, for DMMA CODE "B" with order 0,1,2,3 whereas for DMMA CODE "S" with order 0,3,2,1.

import pandas as pd

df = df.sort_values('DMMA_CODE')
b = df[df['DMMA_CODE']=='B'].reset_index(drop=True).to_dict()
s = df[df['DMMA_CODE']=='S'].reset_index(drop=True).to_dict()

d = {}
time, data = [], []
order_list = [0,3,2,1]
for i, order in enumerate(order_list):
    time.append(b['TIME'][i] + ' , ' + s['TIME'][order])
    data.append(b['DATA'][i] + ' , ' + s['DATA'][order])
d['TIME'] = time
d['DATA'] = data
print(pd.DataFrame(d))

Output:

                                TIME             DATA
0  0_days_00:00:00 , 0_days_00:00:00  Layer1 , Layer2
1  0_days_00:08:00 , 0_days_01:29:00  Layer3 , Layer8
2  0_days_00:24:00 , 0_days_00:53:00  Layer5 , Layer6
3  0_days_01:16:00 , 0_days_00:16:00  Layer7 , Layer4
Answered By: perpetualstudent

Here is an alternative way using sort_values()

ndf = (df.sort_values('DMMA CODE')
.sort_values('DMMA CODE',key = lambda x: [i for i in list(range(0,4)) + list(np.array(l))]))
ndf.groupby(ndf['DMMA CODE'].eq('B').cumsum()).agg(', '.join)

or

df.loc[[j for i in zip(b,[i[-1] for i in sorted(zip(l,s))]) for j in i]].reset_index(drop=True).groupby(lambda x: x//2).agg(list)

or

df.groupby(df.groupby('DMMA CODE').cumcount().mask(df['DMMA CODE'].eq('S'),lambda x: x.map(dict(enumerate(l))))).agg(list)

Output:

                                       TIME            DATA DMMA CODE
DMMA CODE                                                            
1          0 days 00:00:00, 0 days 00:00:00  Layer1, Layer2      B, S
2          0 days 00:08:00, 0 days 01:29:00  Layer3, Layer8      B, S
3          0 days 00:24:00, 0 days 00:53:00  Layer5, Layer6      B, S
4          0 days 01:16:00, 0 days 00:16:00  Layer7, Layer4      B, S
Answered By: rhug123
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.