Check whether each value in a column is unique in a pandas DataFrame

Question:

I have the following DataFrame:


| id1      | result         |
| -------- | -------------- |
| 2        |         0.5    |
| 3        |         1.4    |
| 4        |         1.4    |
| 7        |         3.4    |
| 2        |         1.4    |

I want to check, for every row in the column ['id1'], whether the value is unique within that column.

The output should be:

False
True
True
True
False

The first and the last are False because id 2 exists twice.

I used this method:

is_unique = df["id1"].is_unique

but that only tells me whether the whole column is unique. I want to check each row.

Asked By: hubi3012


Answers:

df['id1'].map(~(df.groupby('id1').size() > 1))
Output:
0    False
1     True
2     True
3     True
4    False
Name: id1, dtype: bool
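
The one-liner above can be unpacked into separate steps to show what each part does. This is only a sketch: the DataFrame is rebuilt from the question's table, and the intermediate names (`counts`, `value_is_unique`, `row_is_unique`) are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"id1": [2, 3, 4, 7, 2],
                   "result": [0.5, 1.4, 1.4, 3.4, 1.4]})

# Number of rows per distinct id1 value (a Series indexed by id1)
counts = df.groupby("id1").size()

# True for values that occur exactly once
value_is_unique = ~(counts > 1)

# Look up each row's id1 in that Series to get a per-row flag
row_is_unique = df["id1"].map(value_is_unique)
print(row_is_unique.tolist())
```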
Answered By: DiMithras

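Since you tagged this question with pandas, I'm assuming you're using the pandas package.
You can build an Index from the id1 values and call its duplicated method with keep=False, which flags every occurrence of a repeated value, as in the following example. pd.Series.duplicated works the same way on a DataFrame column.
See the pandas documentation for duplicated for details.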

import pandas as pd
check_id1_duplicate = pd.Index([2, 3, 4, 7, 2])
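# keep=False marks every occurrence of a repeated value as a duplicate
check_id1_duplicate.duplicated(keep=False)
# Result: array([ True, False, False, False,  True])
# Negate with ~ to get True for the unique rows instead:
~check_id1_duplicate.duplicated(keep=False)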
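
The same idea can be applied directly to the DataFrame column from the question. A minimal sketch, assuming the data from the question's table; the column name `id1_is_unique` is hypothetical and only there to hold the result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"id1": [2, 3, 4, 7, 2],
                   "result": [0.5, 1.4, 1.4, 3.4, 1.4]})

# duplicated(keep=False) is True for every row whose id1 appears more than once;
# negating it gives True only for rows whose id1 is unique
df["id1_is_unique"] = ~df["id1"].duplicated(keep=False)
print(df)
```

Negating is needed here because the question wants True for unique rows, while duplicated reports the opposite.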
Answered By: ShiriNmi

To add to @ShiriNmi’s answer: the duplicated solution is more intuitive and, in the timings below, about 8 times faster (roughly 90 µs versus 697 µs per loop), while returning the same result.

%timeit -n 10_000 df['id1'].map(~(df.groupby('id1').size() > 1))
# 697 µs ± 60.3 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit ~df['id1'].duplicated(keep=False)
# 89.5 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
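
To check the "same result" claim outside of a timing session, the two expressions can be compared directly. This is a small sketch on the question's sample data; the names `via_groupby` and `via_duplicated` are invented for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"id1": [2, 3, 4, 7, 2],
                   "result": [0.5, 1.4, 1.4, 3.4, 1.4]})

# groupby/map approach from the first answer
via_groupby = df["id1"].map(~(df.groupby("id1").size() > 1))

# duplicated approach
via_duplicated = ~df["id1"].duplicated(keep=False)

# Element-wise comparison: True if both flag exactly the same rows
print((via_groupby == via_duplicated).all())
```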
Answered By: npetrov937