Populate Adjacent Value in Pandas Column When String is Duplicated

Question:

I’m attempting to overwrite the value in a column called ‘Group’ when the adjacent value in a column called ‘Keyword’ is a duplicate.

For example, because the string ‘commercial office cleaning services’ is duplicated, I’d like to overwrite the adjacent ‘Group’ value with ‘commercial cleaning services’.

Example Data

[image of the input data, not reproduced here: https://i.stack.imgur.com/fPWPa.png]

Desired Output

[image of the desired output, not reproduced here]

Minimal Reproducible Example

import pandas as pd

data = [
    ["commercial cleaning services", "commercial cleaning services"],
    ["commercial office cleaning services", "commercial cleaning services"],
    ["janitorial cleaning services", "commercial cleaning services"],
    ["commercial office services", "commercial cleaning"],
]
df = pd.DataFrame(data, columns=["Keyword", "Group"])
print(df)

I’m fairly new to pandas and not sure where to start. I’ve reached a dead end Googling and searching Stack Overflow.

Asked By: Lee Roy

Answers:

IIUC, use duplicated to flag the repeated keywords, mask to blank out their Group, and ffill to fill it back in from the row above:

# is the keyword duplicated?
m = df['Keyword'].duplicated()

df['Group'] = df['Group'].mask(m).ffill()

# Output:

print(df)

                               Keyword                         Group
0         commercial cleaning services  commercial cleaning services
1  commercial office cleaning services  commercial cleaning services
2         janitorial cleaning services  commercial cleaning services
3  commercial office cleaning services  commercial cleaning services
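
To see why this works, it can help to print the intermediate steps. The sketch below is self-contained and uses an assumed input in which the last Keyword repeats row 1, as in the output above. The Group value "commercial cleaning" on that row is only a guess at what the image contains.

```python
import pandas as pd

# Assumed input: the last Keyword repeats row 1, as in the output above.
# The Group value on that row ("commercial cleaning") is a guess.
data = [
    ["commercial cleaning services", "commercial cleaning services"],
    ["commercial office cleaning services", "commercial cleaning services"],
    ["janitorial cleaning services", "commercial cleaning services"],
    ["commercial office cleaning services", "commercial cleaning"],
]
df = pd.DataFrame(data, columns=["Keyword", "Group"])

# True for the second and later occurrences of a Keyword
m = df["Keyword"].duplicated()
print(m.tolist())  # [False, False, False, True]

# mask() replaces Group with NaN wherever m is True
masked = df["Group"].mask(m)
print(masked.isna().tolist())  # [False, False, False, True]

# ffill() fills each NaN with the nearest non-missing value above it
df["Group"] = masked.ffill()
print(df["Group"].tolist())
```

With this input, every row ends up with the Group "commercial cleaning services".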

NB: The reproducible example does not match the image of the input (https://i.stack.imgur.com/fPWPa.png)
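
One caveat: ffill copies the Group from the row directly above, which is not necessarily the row where the keyword first appeared. If the intent is to reuse the Group of the first occurrence of each Keyword (an assumption, since the question does not say), groupby with transform("first") is a possible alternative. A minimal sketch on hypothetical data, where the repeated keyword "b" is not directly below its first occurrence:

```python
import pandas as pd

# Hypothetical data: the repeated Keyword "b" sits two rows below its first occurrence
df = pd.DataFrame({
    "Keyword": ["a", "b", "c", "b"],
    "Group": ["g1", "g2", "g3", "g4"],
})

m = df["Keyword"].duplicated()

# mask + ffill: the duplicate takes the Group of the row above it (g3)
via_ffill = df["Group"].mask(m).ffill()

# groupby + transform("first"): the duplicate takes the Group
# of the first row with the same Keyword (g2)
via_first = df.groupby("Keyword")["Group"].transform("first")

print(via_ffill.tolist())  # ['g1', 'g2', 'g3', 'g3']
print(via_first.tolist())  # ['g1', 'g2', 'g3', 'g2']
```

Both approaches give the same result whenever the row above a duplicate already carries the Group of the keyword's first occurrence.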

Answered By: abokey