pandas groupby mean with nan

Question:

I have the following dataframe:

date id  cars
2012 1    4  
2013 1    6
2014 1    NaN    
2012 2    10 
2013 2    20 
2014 2    NaN  

Now, I want to get the mean of cars over the years for each id ignoring the NaN’s. The result should be like this:

date id  cars  result
2012 1    4      5
2013 1    6      5
2014 1    NaN    5
2012 2    10     15
2013 2    20     15
2014 2    NaN    15

I have the following command:

df["result"]=df.groupby("id")["cars"].mean()

The command runs without errors, but the result column only has NaN’s.
What did I do wrong?

Asked By: freddy888

Answers:

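Use transform, which returns a Series the same size as the original. groupby("id")["cars"].mean() returns only one value per id, indexed by id, so assigning it to a column aligns on the index labels instead of filling every row: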

df["result"] = df.groupby("id")["cars"].transform('mean')
print(df)
   date  id  cars  result
0  2012   1   4.0     5.0
1  2013   1   6.0     5.0
2  2014   1   NaN     5.0
3  2012   2  10.0    15.0
4  2013   2  20.0    15.0
5  2014   2   NaN    15.0
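
To make the difference visible, here is a minimal sketch that rebuilds the frame from the question, assuming the default 0..5 row index. The names per_group, misaligned and aligned are only for illustration. With this index, the plain mean() assignment matches just the row labels 1 and 2; with a different index it could match none at all, which would give the all-NaN column described in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (default RangeIndex 0..5 is an assumption).
df = pd.DataFrame({
    "date": [2012, 2013, 2014, 2012, 2013, 2014],
    "id": [1, 1, 1, 2, 2, 2],
    "cars": [4, 6, np.nan, 10, 20, np.nan],
})

# mean() returns one value per group, indexed by id (1 and 2), not by row.
# NaN values are skipped by default.
per_group = df.groupby("id")["cars"].mean()

# Assigning that Series aligns on index labels: only row labels 1 and 2
# find a match here, and every other row becomes NaN.
misaligned = df.assign(result=per_group)["result"]

# transform("mean") broadcasts each group's mean back onto its own rows,
# so the result has the same index as df.
aligned = df.groupby("id")["cars"].transform("mean")

print(per_group)
print(misaligned)
print(aligned)
```
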
Answered By: jezrael

This is an old question from 2017, and what follows is just another way to do it, with a lot of overhead. The question describes df["result"]=df.groupby("id")["cars"].mean() producing only NaN values in the result column. In 2023, I could not reproduce a group mean that turns into NaN as soon as one of the numbers is NaN: the groupby mean in pandas skips NaN by default. Still, if you ever need to drop the NaN values explicitly before averaging, this is how to get the mean per id:

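import numpy as np
# Ignore NumPy floating-point divide-by-zero and invalid-operation errors.
np.seterr(divide='ignore', invalid='ignore')
# Drop NaN explicitly, then average the remaining cars values per id.
df.groupby('id')['cars'].apply(lambda s: np.average(s.dropna()))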

After this, join the per-id means back onto the original dataframe on id. I am not walking through that step here, since this approach has a lot of overhead for the question at hand and should not be used for it. It may still help those who are searching for a way to compute the means with the NaN values removed in the first place.
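
For completeness, the join step mentioned above could look roughly like this. It is only a sketch under assumptions: the frame is rebuilt from the question's data, and the names means and merged are made up for the example. It shows two ways to bring the per-id means back to the rows, a lookup with Series.map and a left merge on id.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": [2012, 2013, 2014, 2012, 2013, 2014],
    "id": [1, 1, 1, 2, 2, 2],
    "cars": [4, 6, np.nan, 10, 20, np.nan],
})

# Per-id mean with NaN dropped explicitly; the result is indexed by id.
means = df.groupby("id")["cars"].apply(lambda s: np.average(s.dropna()))

# Option 1: look up each row's id in the per-id means.
df["result"] = df["id"].map(means)

# Option 2: a left merge on id, after turning the means into a two-column
# frame with columns "id" and "result".
merged = df.drop(columns="result").merge(
    means.rename("result").reset_index(), on="id", how="left"
)

print(df)
print(merged)
```

Both options give the same result column as transform('mean'), just with more steps.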