How to split combined values and place them one per row starting from the first row of each group, without repeating duplicates
Question:
I have a DataFrame with three columns: model, order, and Item.
import pandas as pd

df = pd.DataFrame({'model':['A','A','A','A','A','B','B','B','B','C','C','C','C','C'],
                   'order':['aa/ab','aa/ab','aa/ab','aa/ab','aa/ab','ba','ba','ba','ba','ca/cb/cc','ca/cb/cc','ca/cb/cc','ca/cb/cc','ca/cb/cc'],
                   'Item':['tissue','paper','tea_spon','chopstick','dish','horse','dog','cat','cat','tv','radio','audio','handphone','recoder']})
The order column usually holds a single order, but two or more orders may be combined in one value, separated by "/".
For example, "aa/ab" means that the orders "aa" and "ab" are combined.
If a model has one order (like ba), I want to show it only in the first row of that model and blank out the rest.
If a model has two orders (like aa/ab), the first order should appear only in the first row and the second order only in the second row; the remaining rows should be empty.
With three orders, the same rule applies.
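To make the rule concrete, here is a minimal sketch in plain Python of what should happen within one model. The helper name `spread_orders` is hypothetical, and it assumes every row of a model carries the same combined string, so only the first row is inspected.

```python
def spread_orders(orders):
    """Place each part of the first row's combined order on its own row.

    `orders` is the list of 'order' values for one model. Rows beyond the
    number of parts get None.
    """
    parts = orders[0].split('/')
    return [parts[i] if i < len(parts) else None for i in range(len(orders))]


print(spread_orders(['aa/ab'] * 5))  # ['aa', 'ab', None, None, None]
print(spread_orders(['ba'] * 4))     # ['ba', None, None, None]
```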
Answers:
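Use a custom lambda function in GroupBy.transform to split the values and remove duplicates:
# explode the split parts, keep as many values as the group has rows,
# then blank out repeats; index=x.index keeps the group's own row labels
f = lambda x: (pd.Series(x.str.split('/').explode().to_numpy()[:len(x)], index=x.index)
                 .mask(lambda s: s.duplicated()))
df['order'] = df.groupby('model')['order'].transform(f)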
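The one-liner above is dense, so here is a rough step-by-step sketch of the same idea applied to a single group. The Series `x` is a hypothetical stand-in for what the lambda would receive for model A; the intermediate variable names are only for illustration.

```python
import pandas as pd

# One group's 'order' values (model A), with the group's original row labels.
x = pd.Series(['aa/ab'] * 5, index=[0, 1, 2, 3, 4])

exploded = x.str.split('/').explode()      # 10 values: aa, ab, aa, ab, ...
trimmed = exploded.to_numpy()[:len(x)]     # keep as many values as there are rows
s = pd.Series(trimmed, index=x.index)      # reattach the group's row labels
result = s.mask(s.duplicated())            # repeated values become NaN

print(result.tolist())  # ['aa', 'ab', nan, nan, nan]
```

Note that each value stays where it first appears after exploding, so this relies on the first row already containing every order of the model.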
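Another idea, starting again from the original df:
import numpy as np

# np.unique returns the distinct parts in sorted order; they fill the first rows
# of the group, and reindex pads the remaining rows with NaN
f = lambda x: (pd.Series(n := np.unique([z for y in x for z in y.split('/')]),
                         index=x.index[:len(n)]).reindex(x.index))
df['order'] = df.groupby('model')['order'].transform(f)
print(df)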
   model order       Item
0      A    aa     tissue
1      A    ab      paper
2      A   NaN   tea_spon
3      A   NaN  chopstick
4      A   NaN       dish
5      B    ba      horse
6      B   NaN        dog
7      B   NaN        cat
8      B   NaN        cat
9      C    ca         tv
10     C    cb      radio
11     C    cc      audio
12     C   NaN  handphone
13     C   NaN    recoder
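One caveat with `np.unique` is that it sorts its result, so the orders come out alphabetically rather than in the order they were written. If the written order matters, a possible variant is sketched below. The function name `unique_in_order` and the `demo` frame are hypothetical, and the slice to `len(x)` is an added guard for groups with more distinct orders than rows.

```python
import pandas as pd

def unique_in_order(x):
    # dict.fromkeys keeps the first occurrence of each part, in input order
    parts = list(dict.fromkeys(z for y in x for z in y.split('/')))
    parts = parts[:len(x)]  # guard: never more values than rows
    # fill the first rows of the group, pad the rest with NaN
    return pd.Series(parts, index=x.index[:len(parts)], dtype=object).reindex(x.index)

# Orders written in non-alphabetical order
demo = pd.DataFrame({'model': ['Z'] * 3, 'order': ['zb/za'] * 3})
out = demo.groupby('model')['order'].transform(unique_in_order)
print(out.tolist())  # ['zb', 'za', nan]
```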
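Using groupby.transform with a custom function:
def split(g):
    # split the first value of the group into its individual orders
    l = g.iloc[0].split('/')
    # place one order per row from the top; the remaining rows become NaN
    return pd.Series(l, index=g.index[:len(l)]).reindex(g.index)
df['order'] = df.groupby('model')['order'].transform(split)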
Output:
   model order       Item
0      A    aa     tissue
1      A    ab      paper
2      A   NaN   tea_spon
3      A   NaN  chopstick
4      A   NaN       dish
5      B    ba      horse
6      B   NaN        dog
7      B   NaN        cat
8      B   NaN        cat
9      C    ca         tv
10     C    cb      radio
11     C    cc      audio
12     C   NaN  handphone
13     C   NaN    recoder
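For comparison, the same result can probably be reached without returning a Series from a custom function, by combining the row position inside each model with the split parts of the model's first value. This is only a sketch on a shortened, hypothetical frame, and like the function above it assumes the first row of each model lists all of its orders.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'model': ['A', 'A', 'A', 'B', 'B'],
    'order': ['aa/ab', 'aa/ab', 'aa/ab', 'ba', 'ba'],
})

# Position of each row within its model: 0, 1, 2, ...
pos = df.groupby('model').cumcount()
# Parts of the first combined order of each model, broadcast to every row
parts = df.groupby('model')['order'].transform('first').str.split('/')
# Pick the part matching the row position, or NaN when the parts run out
df['order'] = [p[i] if i < len(p) else np.nan for p, i in zip(parts, pos)]

print(df['order'].tolist())  # ['aa', 'ab', nan, 'ba', nan]
```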