How to split the values and mark them in order from the first row, without marking duplicates

Question


I have a DataFrame with three columns: model, order, and Item.

import pandas as pd

df = pd.DataFrame({'model':['A','A','A','A','A','B','B','B','B','C','C','C','C','C'],
           'order':['aa/ab','aa/ab','aa/ab','aa/ab','aa/ab','ba','ba','ba','ba','ca/cab/cc','ca/cb/cc','ca/cb/cc','ca/cb/cc','ca/cb/cc'],
           'Item':['tissue','paper','tea_spon','chopstick','dish','horse','dog','cat','cat','tv','radio','audio','handphone','recoder']})

The order column usually holds a single order, but several orders may be combined into one value, separated by "/".

For example, "aa/ab" means that the order "aa" and the order "ab" are combined.

If a model has only one order (like ba), I want to show it in the first row only and leave the remaining rows empty (NaN).

If there are 2 orders (like aa/ab), the first order should appear only in the first row and the second order only in the second row.

Three or more orders follow the same rule, and an order that has already been marked should not be marked again.
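The rules above can be written as a plain-Python loop, which may be handy as a reference when checking a vectorized answer. This is a minimal sketch, assuming each model's distinct orders are collected in order of first appearance across all of its rows and then handed out one per row; `mark_orders` is a hypothetical helper, and the sample uses 'ca/cb/cc' for every C row.

```python
def mark_orders(models, orders):
    """Hypothetical helper: return a list aligned with the input rows.
    Each model's distinct orders fill its first rows in order of first
    appearance; the remaining rows get None. If a model has more distinct
    orders than rows, the extra orders are dropped."""
    # Collect each model's distinct orders, keeping first-seen order.
    distinct = {}
    for model, order in zip(models, orders):
        seen = distinct.setdefault(model, {})
        for part in order.split('/'):
            seen.setdefault(part, None)  # dict keys keep insertion order

    # Hand out each model's orders one per row, then pad with None.
    position = {}
    result = []
    for model in models:
        i = position.get(model, 0)
        parts = list(distinct[model])
        result.append(parts[i] if i < len(parts) else None)
        position[model] = i + 1
    return result


models = ['A'] * 5 + ['B'] * 4 + ['C'] * 5
orders = ['aa/ab'] * 5 + ['ba'] * 4 + ['ca/cb/cc'] * 5
print(mark_orders(models, orders))
```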

Asked By: ghost_like



Answer 1

Use a custom lambda function in GroupBy.transform to split the values and remove duplicates:

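# index=x.index keeps the result aligned with the group's rows (without it,
# transform realigns the 0..n-1 index and later groups turn into NaN)
f = lambda x: (pd.Series(x.str.split('/').explode().to_numpy()[:len(x)],
                         index=x.index)
                 .mask(lambda s: s.duplicated()))
df['order'] = df.groupby('model')['order'].transform(f)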
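To see what f computes for a single group, the steps can be run by hand on model C's rows. This is a sketch that assumes every C row holds 'ca/cb/cc' (the value in most of the sample's C rows) and keeps the index 9 to 13 from df; it also shows why the result has to carry the group's own index.

```python
import pandas as pd

# Model C's rows as transform would pass them (index 9-13, as in df).
x = pd.Series(['ca/cb/cc'] * 5, index=range(9, 14))

# 1. Split every row on '/' and flatten: 15 values, 3 per row.
exploded = x.str.split('/').explode()
# 2. Keep one value per row of the group: ca, cb, cc, ca, cb.
first = exploded.to_numpy()[:len(x)]
# 3. Put the values on the group's index and blank out repeats with NaN.
result = pd.Series(first, index=x.index).mask(lambda s: s.duplicated())
print(result)  # ca, cb, cc, then NaN for the two repeats

# With the default 0..4 index, aligning to rows 9-13 finds no matching labels.
misaligned = pd.Series(first).reindex(x.index)
print(misaligned.isna().all())  # True
```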

Another idea:

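import numpy as np
# needs Python 3.8+ (:=); np.unique sorts the orders, which matches their
# order of appearance here only because the sample's orders are alphabetical
f = lambda x: (pd.Series(n:=np.unique([z for y in x for z in y.split('/')]),
                         index=x.index[:len(n)]).reindex(x.index))
df['order'] = df.groupby('model')['order'].transform(f)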

print(df)
   model order       Item
0      A    aa     tissue
1      A    ab      paper
2      A   NaN   tea_spon
3      A   NaN  chopstick
4      A   NaN       dish
5      B    ba      horse
6      B   NaN        dog
7      B   NaN        cat
8      B   NaN        cat
9      C    ca         tv
10     C    cb      radio
11     C    cc      audio
12     C   NaN  handphone
13     C   NaN    recoder
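Because np.unique returns its values sorted, the second idea reproduces the order of appearance only when a model's orders happen to be alphabetical, as they are in this sample. If that is not guaranteed, one option (a swapped-in technique, not part of either answer) is dict.fromkeys, which keeps first occurrences in order. A minimal sketch; the function name first_seen and the one-model frame with 'ab/aa' are made up for illustration:

```python
import pandas as pd

# Hypothetical data: the orders are deliberately not alphabetical.
df = pd.DataFrame({'model': ['A', 'A', 'A'],
                   'order': ['ab/aa', 'ab/aa', 'ab/aa']})

def first_seen(x):
    # dict.fromkeys keeps each order once, in order of first appearance
    # (np.unique would return them sorted as 'aa', 'ab').
    n = list(dict.fromkeys(z for y in x for z in y.split('/')))
    n = n[:len(x)]  # never more values than rows in the group
    # Fill the group's first rows; reindex pads the remaining rows with NaN.
    return pd.Series(n, index=x.index[:len(n)]).reindex(x.index)

df['order'] = df.groupby('model')['order'].transform(first_seen)
print(df)
```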

Answered By: jezrael

Answer 2

Use groupby.transform with a custom function that splits the first value of each group:

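def split(g):
    # Split the group's first value, e.g. 'aa/ab' -> ['aa', 'ab']
    l = g.iloc[0].split('/')
    # One order per row; transform fills the group's remaining rows with NaN
    return pd.Series(l, index=g.index[:len(l)])

df['order'] = df.groupby('model')['order'].transform(split)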

Output:

   model order       Item
0      A    aa     tissue
1      A    ab      paper
2      A   NaN   tea_spon
3      A   NaN  chopstick
4      A   NaN       dish
5      B    ba      horse
6      B   NaN        dog
7      B   NaN        cat
8      B   NaN        cat
9      C    ca         tv
10     C    cb      radio
11     C    cc      audio
12     C   NaN  handphone
13     C   NaN    recoder
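split() reads only g.iloc[0], so it assumes that every row of a model carries the same order string. The question's sample does not quite meet that: the first C row is 'ca/cab/cc' while the others are 'ca/cb/cc'. A small sketch for checking the assumption before relying on the first row (the four-row frame below is illustrative, mirroring the sample's A and C values):

```python
import pandas as pd

# Illustrative data mirroring the question's A rows and its C rows.
df = pd.DataFrame({'model': ['A', 'A', 'C', 'C'],
                   'order': ['aa/ab', 'aa/ab', 'ca/cab/cc', 'ca/cb/cc']})

# Number of distinct order strings per model; 1 means the first row
# represents the whole group.
distinct_per_model = df.groupby('model')['order'].nunique()
print(distinct_per_model)

# Models whose later rows differ from the first one.
mismatched = distinct_per_model[distinct_per_model > 1].index.tolist()
print(mismatched)  # ['C']
```

For any model listed here, the split() result depends on which row happens to come first in the group.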

Answered By: mozway
