Most efficient way to find and replace 'None' cells with value from a similar cell pair in dataframe?

Question

I’m trying to find the most efficient way to fill in cells with a value of ‘None’ in a dataframe based on a rule.

If column ‘A’ has a specific value, such as ‘Apple’, and column ‘B’ is empty, I want to find all other instances of column ‘A’ with value ‘Apple’ and fill it with whatever is in column ‘B’s if it is not empty.

For example, if the input is this:

Column A	Column B
Apple	None
Apple	None
Orange	Soda
Banana	None
Apple	Pie
Banana	Bread
Orange	None

Then it should output this:

Column A	Column B
Apple	Pie
Apple	Pie
Orange	Soda
Banana	Bread
Apple	Pie
Banana	Bread
Orange	Soda

You can assume that for a particular Column A & B pair, it will always be the same (e.g. for every ‘Apple’ in Column A, there will either be ‘None’ or ‘Pie’ in Column B).

I tried the below, and it seems to work on this small test dataset, but I’m wondering if someone could suggest a more efficient method that I could use on my actual dataset (~100K rows).

for ind in data.index:
    if data['Column B'][ind]=='None':
        temp_A = data['Column A'][ind]
        for ind2 in data.index:
            if (data['Column A'][ind2]==temp_A) & (data['Column B'][ind2]!='None'):
                data['Column B'][ind] = data['Column B'][ind2]
                break

Asked By: brainssss

||

Source

Answer 1

Filter rows with no None values, remove duplicates and convert to Series, last mapping None values by Series.map:

m = df['Column B'].ne('None')
s = df[m].drop_duplicates('Column A').set_index('Column A')['Column B']
df.loc[~m, 'Column B'] = df.loc[~m, 'Column A'].map(s)

print (df)
  Column A Column B
0    Apple      Pie
1    Apple      Pie
2   Orange     Soda
3   Banana    Bread
4    Apple      Pie
5   Banana    Bread
6   Orange     Soda

Answered By: jezrael

Most efficient way to find and replace 'None' cells with value from a similar cell pair in dataframe?

Question:

Answers: