Grouping of subset of multiple Column in pandas
Question:
Having a data set as below.Here I need to group the subset in column and fill the missing values using mode method.
Need to group the value ‘Tom’ from name and ‘UK’ from Country and fill the missing value in value using mode.
Name
Country
Value
Tom
USA
30.0
Tom
UK
20.0
Tom
UK
NaN
nick
USA
19.0
nick
USA
18.0
Tom
India
23.0
nick
USA
15.0
nick
USA
20.0
jack
India
NaN
Tom
UK
23.0
Tom
UK
23.0
I tried the code below.
np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])
Answers:
here is one way using group by
and ffill
:
df["Value"] = df.groupby(["Name", "Country"])["Value"].ffill()
output :
Name Country Value
0 Tom USA 20.0
1 Tom UK 20.0
2 Tom UK 20.0
3 nick USA 19.0
4 nick USA 18.0
5 Tom India 23.0
6 nick USA 15.0
7 nick USA 20.0
8 jack India NaN
9 Tom UK 23.0
10 Tom UK 23.0
I use .eq() .lt() .ge()
etc, to avoid excess ()
everywhere when combining conditions~
Also, since this isn’t an if...else
, we can just use pd.DataFrame.mask
instead of np.where
~
I also assumed that you only want the mode
of the same group~ rather than the whole column. If you want the whole column change data.loc[mask, 'Value']
to just data['Value']
mask = (data['Name'].eq('Tom') &
data['Country'].eq('UK'))
data['Value'] = data['Value'].mask(mask & data['Value'].isna(),
data.loc[mask, 'Value'].mode()[0])
Name Country Value
0 Tom USA 30.0
1 Tom UK 20.0
2 Tom UK 23.0
3 nick USA 19.0
4 nick USA 18.0
5 Tom India 23.0
6 nick USA 15.0
7 nick USA 20.0
8 jack India NaN
9 Tom UK 23.0
10 Tom UK 23.0
You don’t really need to go as far as masking the data for the mode you can simply try:
data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])
Edit:
If you’d like to address only those Null values where the Name is Tom and Country UK, you can add an additional criteria:
data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK') & (data['Value'].isna()),data['Value'].mode()[0],data['Value'])
Having a data set as below.Here I need to group the subset in column and fill the missing values using mode method.
Need to group the value ‘Tom’ from name and ‘UK’ from Country and fill the missing value in value using mode.
Name | Country | Value |
---|---|---|
Tom | USA | 30.0 |
Tom | UK | 20.0 |
Tom | UK | NaN |
nick | USA | 19.0 |
nick | USA | 18.0 |
Tom | India | 23.0 |
nick | USA | 15.0 |
nick | USA | 20.0 |
jack | India | NaN |
Tom | UK | 23.0 |
Tom | UK | 23.0 |
I tried the code below.
np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])
here is one way using group by
and ffill
:
df["Value"] = df.groupby(["Name", "Country"])["Value"].ffill()
output :
Name Country Value
0 Tom USA 20.0
1 Tom UK 20.0
2 Tom UK 20.0
3 nick USA 19.0
4 nick USA 18.0
5 Tom India 23.0
6 nick USA 15.0
7 nick USA 20.0
8 jack India NaN
9 Tom UK 23.0
10 Tom UK 23.0
I use .eq() .lt() .ge()
etc, to avoid excess ()
everywhere when combining conditions~
Also, since this isn’t an if...else
, we can just use pd.DataFrame.mask
instead of np.where
~
I also assumed that you only want the mode
of the same group~ rather than the whole column. If you want the whole column change data.loc[mask, 'Value']
to just data['Value']
mask = (data['Name'].eq('Tom') &
data['Country'].eq('UK'))
data['Value'] = data['Value'].mask(mask & data['Value'].isna(),
data.loc[mask, 'Value'].mode()[0])
Name Country Value
0 Tom USA 30.0
1 Tom UK 20.0
2 Tom UK 23.0
3 nick USA 19.0
4 nick USA 18.0
5 Tom India 23.0
6 nick USA 15.0
7 nick USA 20.0
8 jack India NaN
9 Tom UK 23.0
10 Tom UK 23.0
You don’t really need to go as far as masking the data for the mode you can simply try:
data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK'),data['Value'].fillna(data['Value'].mode()[0]),data['Value'])
Edit:
If you’d like to address only those Null values where the Name is Tom and Country UK, you can add an additional criteria:
data['Value'] = np.where((data['Name'] == 'Tom') & (data['Country']=='UK') & (data['Value'].isna()),data['Value'].mode()[0],data['Value'])