How do I print the cell values that cause pandas.DataFrame.any to return True?
Question:
The code below tells whether a cell in the dataframe Df3 has the same value as a cell in another dataframe within a list, dataframe_arrays. However, I want to print the cell value and the specific dataframe within dataframe_arrays that shares that value with Df3. Here is what I have tried:
import pandas as pd
dataframe_arrays = []
Df1 = pd.DataFrame({'IDs': ['Marc', 'Jake', 'Sam', 'Brad']})
dataframe_arrays.append(Df1)
Df2 = pd.DataFrame({'IDs': ['TIm', 'Tom', 'harry', 'joe', 'bill']})
dataframe_arrays.append(Df2)
Df3 = pd.DataFrame({'IDs': ['kob', 'ham', 'konard', 'jupyter', 'Marc']})
repeat = False
for i in dataframe_arrays:
    repeat = Df3.IDs.isin(i.IDs).any()
    if repeat:
        print("i = ", i)
        break
My objective is to compare my current dataframe column with columns belonging to another set of dataframes and identify which values are repeating.
Answers:
If your data is not that large, you can simply use a nested loop with .iterrows() to go through the data row by row and dataframe by dataframe. You can also use globals() to get the variable name of the dataframe that contains the duplicate.
def get_var_name(variable):
    # Return every global name that is bound to this exact object
    globals_dict = globals()
    return [var_name for var_name in globals_dict if globals_dict[var_name] is variable]

# Check each value in Df3 against every dataframe in the list
for index, row in Df3.iterrows():
    for i in range(len(dataframe_arrays)):
        if row['IDs'] in dataframe_arrays[i]['IDs'].values:
            print("{} is in {}".format(row['IDs'], get_var_name(dataframe_arrays[i])[0]))
output:
> Marc is in Df1
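If the frames get larger, the same result can probably be obtained without .iterrows() by using the boolean mask from .isin() to select the matching values directly. The sketch below is only one way to do it and makes a few assumptions that are not in the question: the dataframes are kept in a dict called frames so that each one carries its own label (which avoids the globals() lookup), and find_repeats is a hypothetical helper name.

```python
import pandas as pd

# Assumption: a dict keeps each label next to its dataframe,
# so no globals() lookup is needed to recover the name.
frames = {
    "Df1": pd.DataFrame({"IDs": ["Marc", "Jake", "Sam", "Brad"]}),
    "Df2": pd.DataFrame({"IDs": ["TIm", "Tom", "harry", "joe", "bill"]}),
}
Df3 = pd.DataFrame({"IDs": ["kob", "ham", "konard", "jupyter", "Marc"]})


def find_repeats(target, named_frames, column="IDs"):
    """Hypothetical helper: return (value, frame_name) pairs for every
    value of target[column] that also appears in a frame's column."""
    matches = []
    for name, frame in named_frames.items():
        # True for each row of target whose value occurs in this frame
        mask = target[column].isin(frame[column])
        # Boolean indexing keeps only the matching values
        for value in target.loc[mask, column]:
            matches.append((value, name))
    return matches


repeats = find_repeats(Df3, frames)
for value, name in repeats:
    print("{} is in {}".format(value, name))
```

With the sample data from the question this prints "Marc is in Df1". Calling .any() on the mask only says that a match exists; indexing with the mask is what returns the values themselves.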