Most efficient way to find and replace 'None' cells with value from a similar cell pair in dataframe?

Question:

I’m trying to find the most efficient way to fill in cells with a value of ‘None’ in a dataframe based on a rule.

If column ‘A’ has a specific value, such as ‘Apple’, and column ‘B’ is empty, I want to find all other instances of column ‘A’ with value ‘Apple’ and fill it with whatever is in column ‘B’s if it is not empty.

For example, if the input is this:

Column A Column B
Apple None
Apple None
Orange Soda
Banana None
Apple Pie
Banana Bread
Orange None

Then it should output this:

Column A Column B
Apple Pie
Apple Pie
Orange Soda
Banana Bread
Apple Pie
Banana Bread
Orange Soda

You can assume that for a particular Column A & B pair, it will always be the same (e.g. for every ‘Apple’ in Column A, there will either be ‘None’ or ‘Pie’ in Column B).

I tried the below, and it seems to work on this small test dataset, but I’m wondering if someone could suggest a more efficient method that I could use on my actual dataset (~100K rows).

for ind in data.index:
    if data['Column B'][ind]=='None':
        temp_A = data['Column A'][ind]
        for ind2 in data.index:
            if (data['Column A'][ind2]==temp_A) & (data['Column B'][ind2]!='None'):
                data['Column B'][ind] = data['Column B'][ind2]
                break 
Asked By: brainssss

||

Answers:

Filter rows with no None values, remove duplicates and convert to Series, last mapping None values by Series.map:

m = df['Column B'].ne('None')
s = df[m].drop_duplicates('Column A').set_index('Column A')['Column B']
df.loc[~m, 'Column B'] = df.loc[~m, 'Column A'].map(s)

print (df)
  Column A Column B
0    Apple      Pie
1    Apple      Pie
2   Orange     Soda
3   Banana    Bread
4    Apple      Pie
5   Banana    Bread
6   Orange     Soda
Answered By: jezrael
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.