Pandas df return indices of duplicated elements as a list

Question:

I'd like to get the indexes of duplicated elements in a column as a list. So far, the only way I've found is:

import numpy as np
import pandas as pd

test = ['a', 'a', 'b', 'c', 'b']
testdf = pd.DataFrame(test, columns=['test'])
np.asarray(np.where(list(testdf['test'].duplicated()))).tolist()[0]
# [1, 4]

That seems ridiculously convoluted.
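If positions are what you want, the NumPy route itself can be much shorter. Below is a minimal sketch using `np.flatnonzero`, which returns the positions of the True entries in a boolean array directly. One caveat: it yields *positions*, not index labels. Those only coincide here because the DataFrame has the default RangeIndex.

```python
import numpy as np
import pandas as pd

test = ['a', 'a', 'b', 'c', 'b']
testdf = pd.DataFrame(test, columns=['test'])

# duplicated() marks every repeat after the first occurrence as True
mask = testdf['test'].duplicated()

# np.flatnonzero gives the positions of the True entries directly,
# with no list()/np.where/np.asarray round-trip
positions = np.flatnonzero(mask.to_numpy()).tolist()
print(positions)  # [1, 4]
```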

Any better way?

Asked By: gaut


Answers:

Try simply indexing the index with the boolean mask:

testdf.index[testdf['test'].duplicated()]

Then add to_list() to get a plain Python list:

testdf.index[testdf['test'].duplicated()].to_list()

Output:

[1, 4]
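Because this masks the index itself, it returns index *labels* rather than positions. That difference only shows up with a non-default index. The minimal sketch below uses a hypothetical index (10, 20, ...), chosen purely to illustrate the point:

```python
import pandas as pd

# Hypothetical index labels, used only to show that the result is labels, not positions
testdf = pd.DataFrame({'test': ['a', 'a', 'b', 'c', 'b']},
                      index=[10, 20, 30, 40, 50])

# Boolean-mask the index itself: keeps the labels where duplicated() is True
dup_labels = testdf.index[testdf['test'].duplicated()].to_list()
print(dup_labels)  # [20, 50]
```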
Answered By: Scott Boston

You can use .duplicated() with .tolist():

testdf.index[testdf.test.duplicated()].tolist()
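Which indexes get reported depends on the `keep` argument of `.duplicated()`. The default, `keep='first'`, flags every occurrence except the first. The sketch below compares the three documented settings on the same sample data:

```python
import pandas as pd

testdf = pd.DataFrame({'test': ['a', 'a', 'b', 'c', 'b']})

# keep='first' (the default) flags every occurrence except the first
first = testdf.index[testdf.test.duplicated(keep='first')].tolist()
# keep='last' flags every occurrence except the last
last = testdf.index[testdf.test.duplicated(keep='last')].tolist()
# keep=False flags every member of a duplicated group, including the first
all_dups = testdf.index[testdf.test.duplicated(keep=False)].tolist()

print(first)     # [1, 4]
print(last)      # [0, 2]
print(all_dups)  # [0, 1, 2, 4]
```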
Answered By: bitflip

%%time

test = ['a', 'a', 'b', 'c', 'b']
testdf = pd.DataFrame(test, columns=['test'])
testdf[testdf.test.duplicated()].index.to_list()

# Wall time: 2 ms
# [1, 4]
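`%%time` times a single run of the whole cell, including building the DataFrame, so it says little about the indexing step itself. To compare filtering the DataFrame with masking the index directly, `timeit` gives a better picture. The sketch below first checks that both approaches agree. `via_frame` and `via_index` are hypothetical helper names, and the actual timings will depend on your machine and pandas version:

```python
import timeit

import pandas as pd

testdf = pd.DataFrame({'test': ['a', 'a', 'b', 'c', 'b']})

def via_frame():
    # Filter the whole DataFrame, then take the index of the result
    return testdf[testdf.test.duplicated()].index.to_list()

def via_index():
    # Mask the index directly, without building an intermediate DataFrame
    return testdf.index[testdf.test.duplicated()].to_list()

# Both approaches must agree before their timings are worth comparing
assert via_frame() == via_index() == [1, 4]

for fn in (via_frame, via_index):
    # Average seconds per call over 1000 runs; numbers vary by machine
    per_call = timeit.timeit(fn, number=1000) / 1000
    print(f"{fn.__name__}: {per_call * 1e6:.1f} microseconds per call")
```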
Answered By: Sumaia A