Comparing two Dataframes and creating a third one where certain contions are met

Question:

I am trying to compare two different dataframe that have the same column names and indexes(not numerical) and I need to obtain a third df with the biggest value for the row with the same column name.

Example

df1=

|  | col_1 | col2 | col-3 |

| rft_12312 | 4 | 7 | 4 |

| rft_321321 | 3 | 4 | 1 |
df2=

|  | col_1 | col2 | col-3 |

| rft_12312 | 7 | 3 | 4 |

| rft_321321 | 3 | 7 | 6 |

Required result

|  | col_1 | col2 | col-3 |

| rft_12312 | 7 (because df2.value in this [row :column] >df1.value) | 7 | 4 |


| rft_321321 | 3(when they are equal doesn't matter from which column is the value) | 7 | 6 |

I’ve already tried pd.update with filter_func defined as:

def filtration_function(val1,val2):
if val1 >= val2:
return val1
else:
return val2

but is not working. I need the check for each column with same name.

also pd.compare but does not allow me to pick the right values.

Thank you in advance 🙂

Asked By: Iksy97

||

Answers:

I think one possibility would be to use "combine". This method generates an element-wise comparsion between the two dataframes and returns the maximum value of each element.

Example:

import pandas as pd

def filtration_function(val1, val2):
    return max(val1, val2)

result = df1.combine(df2, filtration_function)

I think method "where" can work to:

import pandas as pd

result = df1.where(df1 >= df2, df2)
Answered By: rzz