Pandas df return indices of duplicated elements as a list
Question:
I’d like to get the index values of the duplicated elements in a column as a list. So far, the only way I have found is:
import numpy as np
import pandas as pd
test = ['a', 'a', 'b', 'c', 'b']
testdf = pd.DataFrame(test, columns=['test'])
np.asarray(np.where(list(testdf['test'].duplicated()))).tolist()[0]
# [1, 4]
This seems ridiculously convoluted.
Is there a better way?
Answers:
Try indexing the index directly with the boolean mask from duplicated():
testdf.index[testdf['test'].duplicated()]
Then add to_list() to get a plain Python list:
testdf.index[testdf['test'].duplicated()].to_list()
Output:
[1, 4]
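Note that duplicated() flags only the second and later occurrences by default, which is why index 0 and 2 are missing from the output above. If you want every row that belongs to a duplicated group, the keep parameter of Series.duplicated can be set to False. A minimal sketch, reusing the frame from the question (the variable names later_repeats and all_members are just for illustration):

```python
import pandas as pd

testdf = pd.DataFrame(['a', 'a', 'b', 'c', 'b'], columns=['test'])

# Default keep='first': the first occurrence of each value is not flagged.
later_repeats = testdf.index[testdf['test'].duplicated()].tolist()

# keep=False flags every member of a duplicated group, first occurrence included.
all_members = testdf.index[testdf['test'].duplicated(keep=False)].tolist()

print(later_repeats)  # [1, 4]
print(all_members)    # [0, 1, 2, 4]
```

Which variant you want depends on whether the first occurrence counts as a "duplicate" for your purposes.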
You can use .duplicated() with .tolist():
testdf.index[testdf.test.duplicated()].tolist()
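Filtering the rows first and then reading their index also works (timed in a Jupyter cell with %%time):
%%time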
test = ['a', 'a', 'b', 'c', 'b']
testdf = pd.DataFrame(test, columns=['test'])
testdf[testdf.test.duplicated()].index.to_list()
# Wall time: 2 ms
# [1, 4]
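One caveat that applies to all the answers here: they return index labels, which only coincide with row positions because testdf uses the default integer index. If the frame had some other index and you needed integer positions, something like np.flatnonzero on the mask might be closer to what the original np.where approach did. A rough sketch, assuming a hypothetical frame df with string labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels instead of the default RangeIndex.
df = pd.DataFrame({'test': ['a', 'a', 'b', 'c', 'b']}, index=list('vwxyz'))
mask = df['test'].duplicated()

# Index labels of the flagged rows.
labels = df.index[mask].tolist()

# Integer row positions of the flagged rows.
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(labels)     # ['w', 'z']
print(positions)  # [1, 4]
```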