How to divide a dataframe based on categorical variables?

Question:

I have a dataset where, for some people, the credit card application was accepted, while for others it was declined.

I want to divide the dataset into two datasets: one containing all the accepted applications (card='yes') and the other containing all the declined applications (card='no').

The dataset is as shown below:

[image of the dataset not available]

How can I do that?

Asked By: Soumee


Answers:

This should work:

df1 = credit5[credit5['card'] == 'yes']  # rows where 'card' is 'yes'

df2 = credit5[credit5['card'] == 'no']  # rows where 'card' is 'no'
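
As a minimal sketch of the same boolean-mask filtering on made-up data (the sample values and the names accepted, declined, and senior are illustrative assumptions, not part of the original dataset), adding .copy() gives two independent dataframes that can be modified without touching credit5:

```python
import pandas as pd

# Hypothetical sample data; the real credit5 comes from the asker's dataset.
credit5 = pd.DataFrame({
    "card": ["yes", "yes", "no", "yes", "no"],
    "age": [36, 35, 38, 37, 30],
})

# .copy() makes each subset an independent DataFrame rather than a
# selection derived from credit5.
accepted = credit5[credit5["card"] == "yes"].copy()
declined = credit5[credit5["card"] == "no"].copy()

# Adding a column to a subset leaves credit5 untouched.
accepted["senior"] = accepted["age"] > 36
```

Without .copy(), assigning to a filtered subset can trigger pandas' SettingWithCopyWarning in some versions, so the copy is a cautious default when the subsets will be edited.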
Answered By: Derek Eden

One option is to perform a groupby operation inside a dict comprehension. This has the added benefit of working for an arbitrary number of categories.

dfs_by_card = {
   accepted: sub_df
   for accepted, sub_df in credit5.groupby("card")
}
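
To illustrate how the resulting dictionary might be used, here is a small self-contained sketch. The sample data and the names accepted_df and declined_df are assumptions for illustration only; each key of the dictionary is one of the distinct values found in the "card" column.

```python
import pandas as pd

# Hypothetical sample data standing in for the real credit5.
credit5 = pd.DataFrame({
    "card": ["yes", "no", "yes", "no", "yes"],
    "income": [4.52, 2.54, 4.50, 9.788, 5.268],
})

# One sub-dataframe per distinct value of "card".
dfs_by_card = {
    accepted: sub_df
    for accepted, sub_df in credit5.groupby("card")
}

# Look up each subset by its category value.
accepted_df = dfs_by_card["yes"]
declined_df = dfs_by_card["no"]
```

If a third category appeared in the column later, it would simply show up as another key, with no change to the code.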
Answered By: PMende

Here is another solution, not much different from @Derek Eden's solution.

import pandas as pd

# Create a sample dataframe
credit5 = pd.DataFrame({
    'Card': ['Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No'],
    'Age': [36, 35, 38, 38, 37, 37, 30, 30, 30, 33],
    'Income': [4.520, 2.420, 4.500, 2.540, 9.788, 5.268, 6.879, 7.852, 5.562, 4.789],
})

Actual dataframe (first five rows):

   Card Age Income
0   Yes 36  4.520
1   Yes 35  2.420
2   Yes 38  4.500
3   No  38  2.540
4   No  37  9.788

credit_no = credit5[(credit5['Card'] == 'No')]

output: ‘No’

   Card Age Income
3   No  38  2.540
4   No  37  9.788
7   No  30  7.852
8   No  30  5.562
9   No  33  4.789

credit_yes = credit5[(credit5['Card'] == 'Yes')]

output: ‘Yes’

   Card Age Income
0   Yes 36  4.520
1   Yes 35  2.420
2   Yes 38  4.500
5   Yes 37  5.268
6   Yes 30  6.879

Let me know if this helps.

Answered By: Vishwas

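Adding on to @Vishwas's answer, you can get a minor speed boost by computing the boolean mask once and reusing its inverse. Note that it is the mask that gets inverted, not the filtered dataframe: applying ~ to credit_no itself would not select the complementary rows. This shortcut also assumes the column holds only the two values 'Yes' and 'No'.

mask = credit5['Card'] == 'No'
credit_no = credit5[mask]
credit_yes = credit5[~mask]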
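
One caveat worth sketching: inverting a boolean mask selects every row that is not 'No', which is not necessarily the same as every row that is 'Yes'. The example below uses made-up data with one missing value to show the difference; the names is_no, not_no, and explicit_yes are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data with one missing 'Card' value.
credit5 = pd.DataFrame({
    "Card": ["Yes", "No", None, "Yes", "No"],
    "Age": [36, 38, 37, 30, 33],
})

# Build the mask once, then apply it and its inverse.
is_no = credit5["Card"] == "No"
credit_no = credit5[is_no]
not_no = credit5[~is_no]  # includes the row with the missing value

# An explicit comparison keeps only rows that really are 'Yes'.
explicit_yes = credit5[credit5["Card"] == "Yes"]
```

When the column is known to contain exactly two values and no missing entries, the inverted mask and the explicit comparison give the same rows.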
Answered By: aa20896