Using Pandas, I'm trying to keep only 100 rows for each value of the column "neighborhood" in my DataFrame

Question:

I have a very large dataset that I'm trying to shrink.
My idea is to keep 100 rows per neighborhood.

Here's an overview of my data:

index  name     neighborhood
0      name 1   neighborhood A
1      name 2   neighborhood A
2      name 3   neighborhood B
3      name 4   neighborhood B
4      name 5   neighborhood C
5      name 6   neighborhood C
6      name 7   neighborhood D
7      name 8   neighborhood D
8      name 9   neighborhood E
9      name 10  neighborhood E

What is the most efficient way to do this?

Thanks in advance

I expect to end up with something that looks like this:

index  name     neighborhood
0      name 1   neighborhood A
1      name 3   neighborhood B
2      name 5   neighborhood C
3      name 7   neighborhood D
4      name 9   neighborhood E
Asked By: Julien8


Answers:

I think you can use groupby and nth:

# index notation on nth (nth[:100]) needs pandas 1.4 or later
dfx = df.groupby('neighborhood').nth[:100]
Answered By: Clegane

It depends how you want to select the rows.

First n rows with groupby.head:

n = 100
out = df.groupby('neighborhood').head(n)
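
As a minimal sketch of how groupby.head behaves, the snippet below rebuilds a toy frame shaped like the one in the question (the values are illustrative, not the asker's real data) and uses n = 1 instead of 100 so the result matches the expected output shown there. The reset_index call is an assumption about wanting a fresh 0..k index; drop it to keep the original row labels.

```python
import pandas as pd

# Toy data mirroring the question's layout (illustrative values only)
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 11)],
    "neighborhood": [f"neighborhood {c}" for c in "AABBCCDDEE"],
})

n = 1  # the question wants 100; 1 reproduces the small expected output
# head(n) keeps the first n rows of each group, in original row order
out = df.groupby("neighborhood").head(n).reset_index(drop=True)
print(out)
```

Groups with fewer than n rows are simply kept whole, so head does not fail on small neighborhoods.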

Random n rows with groupby.sample:

n = 100
out = df.groupby('neighborhood').sample(n=n)
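
One caveat worth checking: with the default replace=False, groupby.sample(n=n) raises a ValueError if any group has fewer than n rows. A possible workaround, sketched here on made-up data, is to shuffle the whole frame first and then take head(n) per group, which gives up to n random rows per neighborhood. The random_state value is only an assumption to make the example reproducible.

```python
import pandas as pd

# Toy data with uneven group sizes (illustrative values only)
df = pd.DataFrame({
    "name": [f"name {i}" for i in range(1, 8)],
    "neighborhood": list("AAAABBC"),
})

n = 2
# Shuffle all rows, then keep the first n rows seen for each group.
# Groups smaller than n (here "C") are kept whole instead of raising.
shuffled = df.sample(frac=1, random_state=0)
out = shuffled.groupby("neighborhood").head(n)
print(out["neighborhood"].value_counts().sort_index())
```

If the original row order matters, calling out.sort_index() afterwards restores it.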
Answered By: mozway