Select and remove rows in pandas dataframe

Question

I have a dataframe and want to randomly select 5 rows from it and have them in another dataframe, then remove those rows from my first dataframe.

For example, in this dataframe:

Randomly select these rows:

And have the first dataframe without them:

How can I do this?

Asked By: ariankazemi


Answer 1

Sample the rows, then drop the rows from the original dataframe that correspond to the index of the sample:

df1 = df.sample(n=5)
df2 = df.drop(df1.index)
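A self-contained version of the same two steps, for reference. The example frame is rebuilt from the output shown in Answer 2 (ids 1 to 10, price 20), and random_state=0 is an assumption added only to make the draw repeatable; leave it out to get a different sample on each run.

```python
import pandas as pd

# Example frame reconstructed from the output in Answer 2
df = pd.DataFrame({
    "id": range(1, 11),
    "color": ["r", "r", "y", "y", "g", "r", "b", "g", "g", "b"],
    "price": 20,
})

# Pick 5 random rows; random_state only makes the choice reproducible
df1 = df.sample(n=5, random_state=0)

# Remove those rows from the original by their index labels
df2 = df.drop(df1.index)

print(df1)
print(df2)
```

Because drop works on index labels, this relies on df having a unique index: with duplicate labels, drop removes every row that shares a sampled label.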

Answered By: Shubham Sharma

Answer 2

You can use sample:

df1 = df.sample(n=5)
df2 = df.drop(df1.index)

Output:

>>> df1
   id color  price
9  10     b     20
6   7     b     20
8   9     g     20
7   8     g     20
1   2     r     20

>>> df2
   id color  price
0   1     r     20
2   3     y     20
3   4     y     20
4   5     g     20
5   6     r     20

If your goal is to split your dataframe randomly into two equal parts (sizes differ by one row when the row count is odd), you can do:

import numpy as np

df1, df2 = np.array_split(df.sample(frac=1), 2)
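Recent pandas versions may emit a FutureWarning about DataFrame.swapaxes when np.array_split is given a DataFrame. If that happens, a plain-pandas sketch is to shuffle once and slice with iloc. The example frame and random_state=0 are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"id": range(1, 11), "price": 20})  # hypothetical example data

# Shuffle all rows once; frac=1 returns every row in random order
shuffled = df.sample(frac=1, random_state=0)

# Split at the midpoint; with an odd row count the first part gets the extra row,
# which matches how np.array_split sizes its pieces
mid = (len(shuffled) + 1) // 2
df1 = shuffled.iloc[:mid]
df2 = shuffled.iloc[mid:]
```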

Answered By: Corralien

Answer 3

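You can do an outer join between the two and then keep only the entries that are present only in the first DataFrame. Because the join matches on all columns, this assumes the rows are unique: if df1 contains identical rows and one copy is sampled, every copy is removed.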

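import pandas as pd

# Your DataFrame (here, the example data from the output in Answer 2)
df1 = pd.DataFrame({"id": range(1, 11),
                    "color": list("rryygrbggb"),
                    "price": 20})
df2 = df1.sample(5).copy().reset_index(drop=True)

# Outer join on every column; indicator=True adds a _merge column
df = pd.merge(df1, df2, on=df1.columns.tolist(), how="outer", indicator=True)
# Rows found only in df1 are the ones that were not sampled
df = df[df["_merge"] == "left_only"].drop(columns=["_merge"])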
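If the sampled rows keep their original index (that is, without reset_index), the same result can be obtained without a merge by filtering on index membership. This is a variant of the answer above, not its author's code, and the example frame is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"id": range(1, 11), "price": 20})  # hypothetical example data

# Sample without resetting the index, so the original labels are preserved
df2 = df1.sample(5, random_state=0)

# Keep only rows whose index label was not sampled (an index-based anti-join)
df = df1[~df1.index.isin(df2.index)]
```

Unlike the merge on all columns, this compares index labels rather than row contents, so identical rows are not removed together; it does, however, assume the index is unique.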

Answered By: Rahul Vishwakarma

Answer 4

Alternatively, you can use .sample to get the n-5 remaining records directly, rather than sampling 5 and dropping them:

import pandas as pd
df = pd.DataFrame({"col1":[1,2,3,4,5,6,7,8,9,10,11,12]})
df2 = df.sample(len(df)-5).sort_index()
print(df2)

Possible outcome:

sort_index restores the original row order; you can omit it if you do not care about the order.
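This approach keeps only the remaining n-5 rows. If you also need the 5 left-out rows in their own DataFrame, as the question asks, one sketch is to drop the kept labels from the original. The name removed is hypothetical, and random_state=0 is added only for repeatability.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})

# Keep len(df) - 5 random rows, restored to their original order
df2 = df.sample(len(df) - 5, random_state=0).sort_index()

# The 5 rows that were not sampled are everything else in df
removed = df.drop(df2.index)
```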

Answered By: Daweo
