Check if there is any repeated element in a dataframe (excluding empty cells)

Question:

Hi, I am trying to get "True" as output if any element is repeated in a dataframe, and "False" if no element is repeated. Empty cells should not be taken into account.

Example 1:

import pandas as pd

data_df = {'col1': ['A', 'B', 'C', 'D'],
           'col2': ['E', 'F', '', 'G'],
           'col3': ['H', '', '  ', '  ']}

df = pd.DataFrame(data_df)

Should display "False"

Example 2:

Should display "True"

Example 3:

Should display "True"

Asked By: Miguel Gonzalez

Answers:

You can use:

out = (df.stack()  # stack to Series, removing NaNs
         # remove empty/space strings
         .loc[lambda s: s.str.strip().ne('')]
         # is there any duplicate?
         .duplicated().any()
      )

Outputs:

False
True
True
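
For reuse, the same idea can be wrapped in a small function. This is only a sketch and assumes every cell holds a string. The name has_repeated_values and the second frame df2 are illustrative and not from the question, since the data for Examples 2 and 3 was only shown as images.

```python
import pandas as pd


def has_repeated_values(df: pd.DataFrame) -> bool:
    """Return True if any non-blank cell value occurs more than once in df."""
    values = df.stack()  # flatten all columns into a single Series
    # Keep only cells that are not empty or whitespace-only
    # (assumes all cells are strings).
    values = values[values.str.strip().ne('')]
    return bool(values.duplicated().any())


# Example 1 from the question: no non-blank value is repeated.
df1 = pd.DataFrame({'col1': ['A', 'B', 'C', 'D'],
                    'col2': ['E', 'F', '', 'G'],
                    'col3': ['H', '', '  ', '  ']})

# Hypothetical frame (not from the question): 'A' appears in two columns.
df2 = pd.DataFrame({'col1': ['A', 'B', 'C', 'D'],
                    'col2': ['E', 'A', '', 'G'],
                    'col3': ['H', '', '  ', '  ']})

print(has_repeated_values(df1))  # False
print(has_repeated_values(df2))  # True
```

Note that the two '  ' cells in col3 are not counted as a repeat, because whitespace-only strings are filtered out before checking for duplicates.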
Answered By: mozway