If a column has 'x' value, find max value from other columns in row, and change x to the value of max column name

Question:

I have a dataframe similar to the following:

data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'],
        'topic': [-1, 0, 1, 2, 0, -1, 2],
        '-1': [1, 0, 0, 0.1, 0, 0.99, 0],
        '0': [0, 1, 0, 0, 1, 0, 0.1],
        '1': [0, 0, 0.99, 0, 0, 0, 0],
        '2': [0.4, 0, 0, 0.7, 0, 0, 1],
        }

df = pd.DataFrame(data)

If the column topic value is -1, then I want to look in the same row of columns 0 to 2, and change the value in topic, to the header of the max value.

enter image description here

As an example, in the first row in the table above, the column topic has a value of -1, so I want to change that value to whatever is the header name of the max value in the row. So that would be column 2, which has a value of 0.4.

So now the table looks like this, where the topic value of ref 0001 has changed from -1 to 2

enter image description here

The other point is, that if like ref 0006, there is no value > 0 in the other columns despite having a topic value of -1, then it should be left alone.

I hope this makes sense. Struggling hard to get this done.

Thank you!

Asked By: Nicholas

||

Answers:

def correct_topic(row):
    max_value = row[['0','1','2']].max()
    if max_value > 0:
        return row[row == max_value].index[0]
    else:
        return row.topic
    
df['topic'] = df.apply(correct_topic, axis=1)

I think this is what you need 🙂

Setting up example:

import pandas as pd
data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'],
        'topic': [-1, 0, 1, 2, 0, -1, 2],
        '-1': [1, 0, 0, 0.1, 0, 0.99, 0],
        '0': [0, 1, 0, 0, 1, 0, 0.1],
        '1': [0, 0, 0.99, 0, 0, 0, 0],
        '2': [0.4, 0, 0, 0.7, 0, 0, 1],
        }

df = pd.DataFrame(data)

Making a boolean mask to set the rows:

cols = [str(x) for x in range(0, 3)]
condition = (df.topic == -1) & ((df[cols] > 0).any(axis=1))

0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool

using idxmax gets the names of the columns at the maximum values

df[cols].idxmax(axis=1)

0    2
1    0
2    1
3    2
4    0
5    0
6    2
dtype: object

Now you can use df.where and the boolean condition to set the topic values.

df["topic"] = df["topic"].where(~condition, df[cols].idxmax(axis=1)[condition])

Output:

    ref topic    -1    0     1    2
0  0001     2  1.00  0.0  0.00  0.4
1  0002     0  0.00  1.0  0.00  0.0
2  0003     1  0.00  0.0  0.99  0.0
3  0004     2  0.10  0.0  0.00  0.7
4  0005     0  0.00  1.0  0.00  0.0
5  0006    -1  0.99  0.0  0.00  0.0
6  0007     2  0.00  0.1  0.00  1.0
Answered By: Tom McLean
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.