Compare and find duplicate

Question:

From the table I need to find out the keys (different keys) that have duplicate longitude and latitude. If a key having any of sequence has duplicate longitude and latitude all the sequences of the same key should be shown duplicate. (comparison between same key sequences does not happen). The Matched_with column shows the index of the Key that were matched

Asked By: RobertB

||

Answers:

You can use a selective merge combined with groupby.agg:

cols = ['Key', 'Coordinate', 'Index']
df['Matched_with'] = df['Index'].map(df[cols]
 .merge(df[cols]
        .astype({'Index': 'str'})
        .rename(columns={'Index': 'Matched_with'}),
        on='Coordinate')
 .loc[lambda d: d['Key_x'].ne(d['Key_y'])]
 .groupby('Index')['Matched_with'].agg(','.join)
)

df['Duplicate'] = np.where(df['Matched_with'].notna(), 'Yes', 'No')

output:

    Index   Key  Sequence  Coordinate Matched_with Duplicate
0       1  1001         1  18.2, 14.2     13,14,15       Yes
1       2  1001         2  18.2, 14.2     13,14,15       Yes
2       3  1001         3  18.2, 14.2     13,14,15       Yes
3       4  2001         1  25.6, 22.8        7,8,9       Yes
4       5  2001         2  25.6, 22.8        7,8,9       Yes
5       6  2001         3  25.6, 22.8        7,8,9       Yes
6       7  5004         1  25.6, 22.8        4,5,6       Yes
7       8  5004         2  25.6, 22.8        4,5,6       Yes
8       9  5004         3  25.6, 22.8        4,5,6       Yes
9      10  6895         1  36.2, 17.4          NaN        No
10     11  6895         2  36.2, 17.4          NaN        No
11     12  6895         3  36.2, 17.4          NaN        No
12     13  6650         1  18.2, 14.2        1,2,3       Yes
13     14  6650         2  18.2, 14.2        1,2,3       Yes
14     15  6650         3  18.2, 14.2        1,2,3       Yes
intermediate merge:
    Key_x  Coordinate  Index  Key_y Matched_with
0    1001  18.2, 14.2      1   1001            1  # these will be removed
1    1001  18.2, 14.2      1   1001            2  #
2    1001  18.2, 14.2      1   1001            3  #
3    1001  18.2, 14.2      1   6650           13      # these will be kept
4    1001  18.2, 14.2      1   6650           14      #
..    ...         ...    ...    ...          ...
76   6895  36.2, 17.4     11   6895           11
77   6895  36.2, 17.4     11   6895           12
78   6895  36.2, 17.4     12   6895           10
79   6895  36.2, 17.4     12   6895           11
80   6895  36.2, 17.4     12   6895           12

[81 rows x 5 columns]
Answered By: mozway
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.