How to loop through and return any value if it is found inside any other column within a dataframe using pandas?

Question:

How can I loop through a DataFrame and collect, in a list, every value that also appears in at least one other column? It doesn’t matter how many times the value is found, only that it appears at least once more in a different column. A value that repeats only within the same column is not included in the list. In other words, each value must be compared with every value in the other columns, but not with values in its own column.

import pandas as pd

combined_insp = []
test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})

Expected output would be

combined_insp = ['John', 'Mary', 'Sarah', 'Mike']

Asked By: Anthony Stokes

Answers:

You can use DataFrame.apply(set) to remove duplicated elements within each column. Then you can use itertools.chain.from_iterable to flatten the resulting sets into a single iterable. Finally, you can use collections.Counter to count the elements and keep those with a count greater than 1. (Counter is a dict subclass, so you can iterate over it with .items().)

from itertools import chain
from collections import Counter
combined_insp = [k for k,v in Counter(chain.from_iterable(test_df.apply(set))).items() if v>1]
print(combined_insp)

Output (the order may vary between runs because sets are unordered):

['Sarah', 'Mike', 'Mary', 'John']
Answered By: I'mahdi

A solution with itertools and set algebra:

from itertools import combinations

combined_insp = set.union(*[set(test_df[c1]).intersection(test_df[c2]) 
                            for (c1, c2) in combinations(test_df.columns, 2)])

For each unique pair of columns, we take the intersection of their values. Then we take the union of all the results. Note that the result is a set; wrap it in list() if you need a list.
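
To make the intermediate results visible, the same idea can be sketched step by step. This assumes the test_df from the question; the name pairwise is made up for illustration, and sorted() is used only to get a stable order.

```python
from itertools import combinations

import pandas as pd

test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})

# Values shared by each unordered pair of columns (hypothetical helper dict)
pairwise = {
    (c1, c2): set(test_df[c1]) & set(test_df[c2])
    for c1, c2 in combinations(test_df.columns, 2)
}

for pair, shared in pairwise.items():
    print(pair, shared)

# Union of all pairwise intersections, sorted for a deterministic order
combined_insp = sorted(set().union(*pairwise.values()))
print(combined_insp)  # ['John', 'Mary', 'Mike', 'Sarah']
```

With four columns there are six pairs; only the pairs involving area_1 share any names here, and 'Bill' is excluded because it repeats only inside area_4.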

Answered By: Josh Friedlander

Here is one way to do it:

# pd.melt flattens the table; then group by name and keep the names that appear in more than one column

g = test_df.melt(value_name='area').drop_duplicates().groupby('area')
[key for key, group in g if (group.count() > 1).all()]
['John', 'Mary', 'Mike', 'Sarah']
Answered By: Naveed

counts = test_df.melt().drop_duplicates()['value'].value_counts()
answer = counts[counts > 1].index.to_list()
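
The two lines above are compact, so here is a sketch of what each step likely produces, assuming the test_df from the question. The intermediate names long_df and unique_pairs are made up for illustration, and sorted() is used only to get a stable order.

```python
import pandas as pd

test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})

# Stack all columns into two columns: 'variable' (source column) and 'value' (name)
long_df = test_df.melt()

# Keep one row per (column, name) pair, so repeats inside a column count once
unique_pairs = long_df.drop_duplicates()

# Number of distinct columns each name appears in
counts = unique_pairs['value'].value_counts()

# Names that appear in more than one column
answer = sorted(counts[counts > 1].index)
print(answer)  # ['John', 'Mary', 'Mike', 'Sarah']
```

Dropping duplicates before counting is what keeps 'Bill' (twice in area_4) and the second 'Mary' in area_2 from inflating the counts.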
Answered By: Steven Rumbalski