How to split values and mark them in order from the first row, without marking duplicates

Question:

I have a DataFrame with three columns: model, order, and Item.

import pandas as pd

df = pd.DataFrame({'model':['A','A','A','A','A','B','B','B','B','C','C','C','C','C'],
           'order':['aa/ab','aa/ab','aa/ab','aa/ab','aa/ab','ba','ba','ba','ba','ca/cb/cc','ca/cb/cc','ca/cb/cc','ca/cb/cc','ca/cb/cc'],
           'Item':['tissue','paper','tea_spon','chopstick','dish','horse','dog','cat','cat','tv','radio','audio','handphone','recoder']})

The order column usually holds a single order, but it may combine two or more orders.
In that case, the orders are separated by "/".

For example, "aa/ab" means that the order "aa" and the order "ab" are combined.
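
To make the format concrete, here is a minimal plain-Python sketch of splitting one combined value on "/" while dropping repeats in first-seen order. The helper name split_orders and the third sample string are assumptions made up for illustration; they do not come from the data above.

```python
def split_orders(value, sep="/"):
    """Split a combined order string and drop repeats, keeping first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each piece, in order
    return list(dict.fromkeys(value.split(sep)))


print(split_orders("aa/ab"))     # ['aa', 'ab']
print(split_orders("ba"))        # ['ba']
print(split_orders("ca/cb/ca"))  # ['ca', 'cb']
```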

If a model has only one order (like ba), I want to display it only in the first row and leave the rest empty.

If a model has two orders (like aa/ab), the first order should appear only in the first row and the second order only in the second row, with the remaining rows left empty.

If there are three orders, the same rule applies.

Asked By: ghost_like

Answers:

Use a custom lambda function in GroupBy.transform to split the values and remove duplicates:

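# Split each group's values on "/", keep as many pieces as the group has rows,
# then mask repeats; index=x.index keeps the result aligned with the group's rows.
f = lambda x: (pd.Series(x.str.split('/').explode().to_numpy()[:len(x)], index=x.index)
                 .mask(lambda s: s.duplicated()))
df['order'] = df.groupby('model')['order'].transform(f)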

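Another idea, using np.unique (note that it returns the values sorted):

import numpy as np
# np.unique gives the distinct names, sorted; reindex pads the rest of the group with NaN
f = lambda x: (pd.Series(n:=np.unique([z for y in x for z in y.split('/')]),
                         index=x.index[:len(n)]).reindex(x.index))
df['order'] = df.groupby('model')['order'].transform(f)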

print(df)
   model order       Item
0      A    aa     tissue
1      A    ab      paper
2      A   NaN   tea_spon
3      A   NaN  chopstick
4      A   NaN       dish
5      B    ba      horse
6      B   NaN        dog
7      B   NaN        cat
8      B   NaN        cat
9      C    ca         tv
10     C    cb      radio
11     C    cc      audio
12     C   NaN  handphone
13     C   NaN    recoder
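
Because np.unique sorts its result, the second idea can reorder the names when they are not already alphabetical. The following is a minimal sketch of an order-preserving variant, not a definitive implementation. The helper name mark_in_order and the small sample frame are made up for illustration, and it assumes the order column contains no missing values.

```python
import pandas as pd

# Hypothetical sample where the order names are NOT in alphabetical order
df = pd.DataFrame({
    'model': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'order': ['ab/aa'] * 3 + ['ba'] * 2 + ['cc/ca/cb'] * 4,
})


def mark_in_order(s):
    # Collect every piece in the group, dropping repeats but keeping first-seen order
    uniq = list(dict.fromkeys(part for value in s for part in value.split('/')))
    # Never return more values than the group has rows
    uniq = uniq[:len(s)]
    # Pad with None so the result has one entry per row of the group
    padded = uniq + [None] * (len(s) - len(uniq))
    return pd.Series(padded, index=s.index, dtype=object)


out = df.copy()
out['order'] = df.groupby('model')['order'].transform(mark_in_order)
print(out)
```

Here model A keeps ab before aa, whereas a sorted result would list aa first.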
Answered By: jezrael

Using groupby.transform with a custom function:

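def split(g):
    # Split the first row's value; this assumes all rows of a model share the same order string
    parts = g.iloc[0].split('/')
    # Put the pieces on the first len(parts) rows; the other rows of the group end up as NaN
    return pd.Series(parts, index=g.index[:len(parts)])

df['order'] = df.groupby('model')['order'].transform(split)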

Output:

   model order       Item
0      A    aa     tissue
1      A    ab      paper
2      A   NaN   tea_spon
3      A   NaN  chopstick
4      A   NaN       dish
5      B    ba      horse
6      B   NaN        dog
7      B   NaN        cat
8      B   NaN        cat
9      C    ca         tv
10     C    cb      radio
11     C    cc      audio
12     C   NaN  handphone
13     C   NaN    recoder
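
The same result can also be sketched without a per-group function, by pairing each row's split list with its position inside the model via groupby.cumcount. This is an illustrative alternative rather than the approach above: unlike split, it reads every row's own order string instead of only the first row's. The names parts and pos are made up here.

```python
import pandas as pd

df = pd.DataFrame({
    'model': ['A'] * 5 + ['B'] * 4 + ['C'] * 5,
    'order': ['aa/ab'] * 5 + ['ba'] * 4 + ['ca/cb/cc'] * 5,
})

parts = df['order'].str.split('/')    # list of order names for each row
pos = df.groupby('model').cumcount()  # 0, 1, 2, ... within each model

# Row i of a model gets the i-th order name, or None once the names run out
df['order'] = [p[i] if i < len(p) else None for p, i in zip(parts, pos)]

print(df)
```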
Answered By: mozway