Select and remove rows in pandas dataframe
Question:
Answers:
Sample the rows, then drop the rows with the sampled index labels from the original dataframe:
df1 = df.sample(n=5)
df2 = df.drop(df1.index)
You can use sample:
df1 = df.sample(n=5)
df2 = df.drop(df1.index)
Output:
>>> df1
   id color  price
9  10     b     20
6   7     b     20
8   9     g     20
7   8     g     20
1   2     r     20
>>> df2
   id color  price
0   1     r     20
2   3     y     20
3   4     y     20
4   5     g     20
5   6     r     20
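For a reproducible, self-contained version of the approach above, here is a minimal sketch. The frame is rebuilt from the output shown (ids 1 to 10, price 20 everywhere); `random_state=0` is an assumption added so that repeated runs pick the same rows. Note that `drop` removes rows by index label, so this relies on the index being unique.

```python
import pandas as pd

# Data reconstructed from the output shown above
df = pd.DataFrame({
    "id": range(1, 11),
    "color": ["r", "r", "y", "y", "g", "r", "b", "g", "g", "b"],
    "price": [20] * 10,
})

# random_state is an assumption for reproducibility; omit it for a fresh draw each run
df1 = df.sample(n=5, random_state=0)

# drop() matches on index labels, so the index must be unique
df2 = df.drop(df1.index)

print(len(df1), len(df2))  # prints: 5 5
```

If the index contains repeated labels, calling `df.reset_index(drop=True)` first is one way to make the labels unique before sampling.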
If your goal is to split your dataframe into 2 equal parts (or nearly equal, if the number of rows is odd), you can do:
import numpy as np
df1, df2 = np.array_split(df.sample(frac=1), 2)
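Passing a DataFrame to `np.array_split` may emit a deprecation warning on some recent pandas versions. As an alternative sketch (not part of the original answer), the same shuffle-and-split can be done with positional slicing. The example frame and `random_state` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"id": range(1, 11)})

# Shuffle all rows; random_state is an assumption for reproducibility
shuffled = df.sample(frac=1, random_state=0)

# Cut the shuffled frame at the midpoint by position
half = len(shuffled) // 2
df1 = shuffled.iloc[:half]
df2 = shuffled.iloc[half:]

print(len(df1), len(df2))  # prints: 5 5
```

With an odd number of rows, this variant puts the extra row in `df2`, whereas `np.array_split` puts it in the first part.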
You can do an outer join between the two, and then keep only the entries that are present only in the first DataFrame.
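import pandas as pd
df1 = YOUR_DATAFRAME  # placeholder: replace with your own DataFrame
# Sample 5 rows and discard their original index labels
df2 = df1.sample(5).copy().reset_index(drop=True)
# Outer-join on all columns; indicator=True adds a "_merge" column marking each row's origin
df = pd.merge(df1, df2, on=df1.columns.tolist(), how="outer", indicator=True)
# Keep only the rows found in df1 but not in the sample
df = df[df["_merge"] == "left_only"].drop(columns=["_merge"])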
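To make the join-based approach concrete, here is a minimal runnable sketch. The example frame and `random_state` are assumptions for illustration. Two caveats apply: the merge matches rows by value, so any duplicate rows in `df1` that equal a sampled row are removed as well, and the result gets a fresh index rather than keeping the original labels.

```python
import pandas as pd

# Hypothetical example data; every row is unique because of the id column
df1 = pd.DataFrame({
    "id": range(1, 11),
    "color": ["r", "r", "y", "y", "g", "r", "b", "g", "g", "b"],
    "price": [20] * 10,
})

# random_state is an assumption for reproducibility
df2 = df1.sample(5, random_state=0).reset_index(drop=True)

# Outer join on every column, with a "_merge" column recording each row's origin
merged = pd.merge(df1, df2, on=df1.columns.tolist(), how="outer", indicator=True)

# "left_only" rows exist in df1 but not in the sample
rest = merged[merged["_merge"] == "left_only"].drop(columns=["_merge"])

print(len(rest))  # prints: 5
```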
You can use .sample to get len(df)-5 records directly, rather than sampling 5 and dropping them, in the following way:
import pandas as pd
df = pd.DataFrame({"col1":[1,2,3,4,5,6,7,8,9,10,11,12]})
df2 = df.sample(len(df)-5).sort_index()
print(df2)
Possible outcome:
   col1
0     1
1     2
2     3
3     4
4     5
5     6
7     8
sort_index is used to restore the original order; you can omit it if you do not care about the order.
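If the 5 rows that were left out are also needed, they can be recovered afterwards from the index. This is a minimal sketch rather than part of the original answer; `random_state` and the name `removed` are assumptions for illustration, and the index is assumed to be unique.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})

# Keep len(df) - 5 rows; random_state is an assumption for reproducibility
df2 = df.sample(len(df) - 5, random_state=0).sort_index()

# Index labels present in df but missing from df2 are the rows left out
removed = df.loc[df.index.difference(df2.index)]

print(len(df2), len(removed))  # prints: 7 5
```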