Select and remove rows in pandas dataframe


I have a dataframe and want to randomly select 5 rows from it and have them in another dataframe, then remove those rows from my first dataframe.

For example, from this dataframe:

[image: the original dataframe]

randomly select these rows:

[image: the randomly selected rows]

and be left with the first dataframe without them:

[image: the original dataframe with the selected rows removed]

How can I do this?

Asked By: ariankazemi



Sample the rows, then drop from the original dataframe the rows whose index labels appear in the sample:

df1 = df.sample(n=5)
df2 = df.drop(df1.index)
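
As a self-contained sketch of the approach above: the frame below is made-up sample data, and `random_state=0` is only there so that a run is repeatable. One assumption worth stating is that `drop(df1.index)` matches rows by index label, so it relies on the index being unique; the sketch resets the index first in case it is not.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "id": range(1, 11),
    "color": list("rryygrbggb"),
    "price": [20] * 10,
})

# drop() works on index labels, so make sure they are unique first.
if not df.index.is_unique:
    df = df.reset_index(drop=True)

df1 = df.sample(n=5, random_state=0)  # the 5 randomly chosen rows
df2 = df.drop(df1.index)              # every row that was not chosen

print(len(df1), len(df2))  # 5 5
```

Leaving out `random_state` gives a different selection on each run, which is usually what a random split calls for outside of testing.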
Answered By: Shubham Sharma

You can use sample:

df1 = df.sample(n=5)
df2 = df.drop(df1.index)


>>> df1
   id color  price
9  10     b     20
6   7     b     20
8   9     g     20
7   8     g     20
1   2     r     20

>>> df2
   id color  price
0   1     r     20
2   3     y     20
3   4     y     20
4   5     g     20
5   6     r     20

If your goal is to split your dataframe into two equal parts, you can do:

import numpy as np

df1, df2 = np.array_split(df.sample(frac=1), 2)
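
For readers who would rather stay within pandas, the same shuffle-and-split idea can be sketched with `iloc` slicing. This is an alternative under assumptions, not the code from the answer: the data is made up, `random_state=0` is only for repeatability, and `half` is an illustrative name.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"id": range(1, 11), "price": [20] * 10})

shuffled = df.sample(frac=1, random_state=0)  # all rows, in random order
half = (len(shuffled) + 1) // 2               # first part gets the extra row if the count is odd

df1 = shuffled.iloc[:half]   # first half of the shuffled rows
df2 = shuffled.iloc[half:]   # remaining rows

print(len(df1), len(df2))  # 5 5
```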
Answered By: Corralien

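You can take the sample, do an outer join between the two DataFrames, and then keep only the entries that are present in the first DataFrame alone.

import pandas as pd

df2 = df1.sample(5).copy().reset_index(drop=True)  # df1 is the original DataFrame
# indicator=True adds a "_merge" column; "left_only" marks rows found only in df1
merged = df1.merge(df2, how="outer", indicator=True)
df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")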

Answered By: Rahul Vishwakarma

You can use .sample to get len(df) - 5 records directly, rather than getting 5 and dropping them, as follows:

import pandas as pd
df = pd.DataFrame({"col1":[1,2,3,4,5,6,7,8,9,10,11,12]})
df2 = df.sample(len(df)-5).sort_index()

A possible outcome:

0     1
1     2
2     3
3     4
4     5
5     6
7     8

sort_index is used to restore the original order; you can omit it if you do not care about order.
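
Since the question also asks for the 5 selected rows as their own dataframe, here is a sketch of how this approach could return both pieces: the rows missing from the kept sample are the selected ones. The names `kept` and `removed` are illustrative, `random_state=0` is only for repeatability, and unique index labels are assumed.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})

# Keep len(df) - 5 random rows, restored to their original order.
kept = df.sample(len(df) - 5, random_state=0).sort_index()

# The 5 rows that were not kept (assumes unique index labels).
removed = df.drop(kept.index)

print(len(kept), len(removed))  # 7 5
```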

Answered By: Daweo