Group dataframes using series as a reference

Question

There is 1 Dataframe and 1 series

dataframe(df1) is information about the model and parts.
A model consists of several parts.

df1 = pd.DataFrame({'model':['A','A','A','A','A','B','B','B','B','B'],
                'part':['part_1','part_1','part_2','part_2','part_3','part_4','part_4','part_5','part_5','part_5']})

Looking at s1, "A" is the model and "B" is the part.

s1 = pd.Series({'A':'model', 'B':'part'})

Series as a reference
"A" is grouped between the same models, and "B" is grouped together with the same parts.
should be displayed in the "batch" column of df1

The desired output is df1

Asked By: ghost_like

||

Source

Answer 1

You can do it this way by creating a grouping column:

grp = pd.concat([df1[df1['model'] == k][v] for k, v in s1.items()])
df1['batch'] = 'batch_' + (grp != grp.shift()).cumsum().astype(str)
df1

Output:

  model    part    batch
0     A  part_1  batch_1
1     A  part_1  batch_1
2     A  part_2  batch_1
3     A  part_2  batch_1
4     A  part_3  batch_1
5     B  part_4  batch_2
6     B  part_4  batch_2
7     B  part_5  batch_3
8     B  part_5  batch_3
9     B  part_5  batch_3

Using list comprehension, build a list of series from df1 using the keys and values from s1. pd.concat this list of series together to create your groupings, then use string manipulation to create batch strings for each row and assign to new dataframe column header ‘batch’.

Update for @mozway comment.

df1['batch'] = 'batch_' + (df1.groupby(grp).ngroup()+1).astype(str)

Answered By: Scott Boston

Answer 2

If you really want to group by an arbitrary column mapped by a dictionary, you should use an indexing lookup:

# map column names per model
idx, cols = pd.factorize(df1['model'].map(s1))

# get column value
grp = df1.reindex(cols, axis=1).to_numpy()[np.arange(len(df1)), idx]

# use this as grouper
df1['batch'] = (df1.groupby(['model', grp])
                .ngroup().add(1)
                .astype(str).radd('batch_')
                )

If you don’t have values in "Model" that can be found in other columns, you can simplify the last part:

df1['batch'] = 'batch_' + pd.Series(pd.factorize(grp)[0]+1, index=df1.index).astype(str)

Output:


  model    part    batch
0     A  part_1  batch_1
1     A  part_1  batch_1
2     A  part_2  batch_1
3     A  part_2  batch_1
4     A  part_3  batch_1
5     B  part_4  batch_2
6     B  part_4  batch_2
7     B  part_5  batch_3
8     B  part_5  batch_3
9     B  part_5  batch_3

Intermediate grp:

['A' 'A' 'A' 'A' 'A' 'part_4' 'part_4' 'part_5' 'part_5' 'part_5']

Answered By: mozway

Group dataframes using series as a reference

Question:

Answers: