Populate Adjacent Value in Pandas Column When String is Duplicated

Question

I’m attempting to overwrite the value named in a column called ‘Group’ when the value in a column called ‘Keyword’ is a duplicate with the adjacent value.

For example, because the string ‘commercial office cleaning services’ is duplicated, I’d like to overwrite the adjacent column to ‘commercial cleaning services’.

Example Data

Desired Output

Minimum Reproducible Example

import pandas as pd

data = [
    ["commercial cleaning services", "commercial cleaning services"],
    ["commercial office cleaning services", "commercial cleaning services"],
    ["janitorial cleaning services", "commercial cleaning services"],
    ["commercial office services", "commercial cleaning"],
]
df = pd.DataFrame(data, columns=["Keyword", "Group"])
print(df)

I’m fairly new to pandas and not sure where to start, I’ve reached a dead end Googling and searching stackoverflow.

Asked By: Lee Roy

||

Source

Answer 1

IIUC, use duplicated with mask and ffill :

#is the keyword duplicated ?
m = df['Keyword'].duplicated()

df['Group'] = df['Group'].mask(m).ffill()

# Output:

print(df)

                               Keyword                         Group
0         commercial cleaning services  commercial cleaning services
1  commercial office cleaning services  commercial cleaning services
2         janitorial cleaning services  commercial cleaning services
3  commercial office cleaning services  commercial cleaning services

NB: The reproducible example does not match the image of the input (https://i.stack.imgur.com/fPWPa.png)

Answered By: abokey

Populate Adjacent Value in Pandas Column When String is Duplicated

Question:

Answers:

# Output: