Grouping values in a column by a criterion and getting their mean using Python / Pandas
Question:
I have data on movies. Every movie has an IMDB score, but some do not have a Metacritic score.
Eg:
Name | IMDB Score | Meta Score |
---|---|---|
B | 8 | 86 |
C | 8 | 90 |
D | 8 | null |
E | 8 | 91 |
F | 7 | 66 |
G | 3 | 44 |
I want to fill in the null values in the Meta Score column with the mean Meta Score of the movies that have the same IMDB score, so the null value in this table should be replaced by the mean of movies B, C and E.
How would I achieve this with NumPy / Pandas?
I looked online, and the closest solution I could find was averaging all the Metacritic scores and replacing the null values with that average.
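To make the example reproducible, here is a minimal sketch that builds the table as a DataFrame and applies the global-mean approach described above. The variable names `df` and `global_filled` are assumptions, and the null is represented as NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the table; the missing Meta Score is stored as NaN.
df = pd.DataFrame({
    "Name": ["B", "C", "D", "E", "F", "G"],
    "IMDB Score": [8, 8, 8, 8, 7, 3],
    "Meta Score": [86, 90, np.nan, 91, 66, 44],
})

# Global-mean fill: every null gets the mean of all known Meta Scores,
# regardless of the movie's IMDB score.
global_filled = df["Meta Score"].fillna(df["Meta Score"].mean())
print(global_filled[2])  # 75.4, not the 89.0 wanted for movies rated 8
```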
Answers:
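Use groupby + fillna. Passing group_keys=False keeps the original row index on pandas 2.0 and later, where apply would otherwise prepend the group key to the index:
df.groupby('IMDB Score', group_keys=False)['Meta Score'].apply(lambda x: x.fillna(x.mean()))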
output:
0 86.0
1 90.0
2 89.0
3 91.0
4 66.0
5 44.0
Name: Meta Score, dtype: float64
Assign the result back to df['Meta Score'] to update the column.
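As a runnable sketch of that assignment, the block below rebuilds the sample data and writes the filled values back. The `group_keys=False` argument and the name `filled` are my additions, chosen so the result keeps the original row index on pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["B", "C", "D", "E", "F", "G"],
    "IMDB Score": [8, 8, 8, 8, 7, 3],
    "Meta Score": [86, 90, np.nan, 91, 66, 44],
})

# Fill each group's nulls with that group's mean Meta Score.
# group_keys=False keeps the original row labels instead of a MultiIndex.
filled = df.groupby("IMDB Score", group_keys=False)["Meta Score"].apply(
    lambda x: x.fillna(x.mean())
)

# Assignment aligns on the index, so each value lands on its own row.
df["Meta Score"] = filled
print(df)
```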
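You can sort the rows so that missing values come last, then forward fill within each IMDB Score group. Note that this copies the previous value in the group (91 for movie D) rather than the group mean, and that grouping by Name as well would leave the null untouched because every name is unique:
df['Meta Score'] = df.sort_values('Meta Score').groupby('IMDB Score')['Meta Score'].ffill()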
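A quick check of what forward fill does on the sample data, as a sketch with assumed variable names (`by_name`, `by_score`): grouping on both Name and IMDB Score puts every row in its own group, so nothing is filled, while sorting and grouping on IMDB Score alone copies the largest known value in the group instead of the mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["B", "C", "D", "E", "F", "G"],
    "IMDB Score": [8, 8, 8, 8, 7, 3],
    "Meta Score": [86, 90, np.nan, 91, 66, 44],
})

# Every Name is unique, so each group holds one row and ffill has
# nothing to carry forward: the null survives.
by_name = df.groupby(["Name", "IMDB Score"])["Meta Score"].ffill()
print(by_name.isna().sum())  # 1

# sort_values places NaN last by default, so within the IMDB Score 8
# group the null is filled with the preceding (largest) value.
by_score = (
    df.sort_values("Meta Score")
    .groupby("IMDB Score")["Meta Score"]
    .ffill()
)
print(by_score[2])  # 91.0, whereas the group mean is 89.0
```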
The following code groups by IMDB Score, then uses transform to fill each group's null values with that group's mean.
df.groupby('IMDB Score')['Meta Score'].transform(lambda value: value.fillna(value.mean()))
The output is:
0 86.0
1 90.0
2 89.0
3 91.0
4 66.0
5 44.0
Name: Meta Score, dtype: float64
You can also replace the Meta Score column:
df["Meta Score"] = df.groupby("IMDB Score")["Meta Score"].transform(
    lambda value: value.fillna(value.mean())
)
Name IMDB Score Meta Score
0 B 8 86.0
1 C 8 90.0
2 D 8 89.0
3 E 8 91.0
4 F 7 66.0
5 G 3 44.0
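An equivalent way to write this, offered as a sketch rather than as part of the answer above, is to compute the per-group mean once with transform("mean") and pass it to fillna, which avoids the Python-level lambda. The sample DataFrame is rebuilt here so the block runs on its own, and `group_mean` is an assumed name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["B", "C", "D", "E", "F", "G"],
    "IMDB Score": [8, 8, 8, 8, 7, 3],
    "Meta Score": [86, 90, np.nan, 91, 66, 44],
})

# Per-row mean of the row's IMDB Score group; NaN values are skipped
# when the mean is computed.
group_mean = df.groupby("IMDB Score")["Meta Score"].transform("mean")

# fillna with a Series aligns on the index, so each null receives the
# mean of its own group.
df["Meta Score"] = df["Meta Score"].fillna(group_mean)
print(df)
```

A group whose Meta Scores are all null has a NaN mean, so its rows stay null with either approach.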