How do I print the cell values that cause pandas.DataFrame.any to return True?

Question

The code below tells me whether any cell in the dataframe Df3 has the same value as a cell in one of the dataframes stored in the list dataframe_arrays. However, I want to print the matching cell value and the specific dataframe within dataframe_arrays that contains it. Here is what I have tried:

import pandas as pd
dataframe_arrays = []
Df1 = pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']})
dataframe_arrays.append(Df1)
Df2 = pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']})
dataframe_arrays.append(Df2)
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})
repeat = False
for i in dataframe_arrays:
  repeat = Df3.IDs.isin(i.IDs).any()
  if repeat:
    print("i = ", i)
    break

My objective is to compare my current dataframe column with columns belonging to another set of dataframes and identify which values are repeating.

Asked By: desert_ranger

Answer 1

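If your data is not very large, you can use a nested loop with .iterrows() to go through Df3 row by row and check each value against every dataframe in dataframe_arrays. You can also use globals() to look up the variable name of the dataframe that contains the duplicate. Note that this lookup only works when the dataframes are bound to module-level names.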

def get_var_name(variable):
    globals_dict = globals()

    return [var_name for var_name in globals_dict if globals_dict[var_name] is variable]

for index, row in Df3.iterrows():
    for i in range(len(dataframe_arrays)):
        if row['IDs'] in dataframe_arrays[i]['IDs'].values:
            print("{} is in {}".format(row['IDs'], get_var_name(dataframe_arrays[i])[0]))

output:

> Marc is in Df1
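
For larger data, a vectorized variant may be preferable to row-by-row iteration. The following is only a sketch under a few assumptions: it reuses the sample data from the question, and it stores the dataframes in a dictionary called named_frames (a hypothetical name, not part of the original code) so that each frame carries an explicit label instead of relying on globals(). The boolean mask returned by isin() is kept and used to select the matching values, rather than being reduced straight away with any().

```python
import pandas as pd

# Same sample data as in the question.
Df1 = pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']})
Df2 = pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']})
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})

# Hypothetical container: label each frame explicitly instead of
# searching globals() for its variable name.
named_frames = {'Df1': Df1, 'Df2': Df2}

matches = {}
for name, frame in named_frames.items():
    # Boolean Series: True where a Df3 value also appears in this frame.
    mask = Df3['IDs'].isin(frame['IDs'])
    if mask.any():
        # Keep the values themselves, not just the True/False answer.
        matches[name] = Df3.loc[mask, 'IDs'].tolist()

for name, values in matches.items():
    print("{} is in {}".format(values, name))
```

Because matches maps each label to a list, a frame that shares several values with Df3 reports all of them at once.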

Answered By: JayPeerachai
