Why is pandas.Series.std() different from numpy.std()?
Question:
This is what I am trying to explain:
>>> import numpy as np
>>> import pandas as pd
>>> a = pd.Series([7, 20, 22, 22])
>>> a.std()
7.2284161474004804
>>> np.std(a)
6.2599920127744575
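Working the arithmetic by hand for the four values in a shows that both results share the same mean and the same sum of squared deviations. They differ only in the denominator, N - 1 = 3 for a.std() versus N = 4 for np.std(a):

```latex
\begin{aligned}
\bar{x} &= \frac{7 + 20 + 22 + 22}{4} = 17.75 \\
\sum_{i=1}^{4} (x_i - \bar{x})^2 &= (-10.75)^2 + 2.25^2 + 4.25^2 + 4.25^2 = 156.75 \\
s_{N-1} &= \sqrt{\frac{156.75}{3}} = \sqrt{52.25} \approx 7.2284 \quad (\texttt{a.std()}) \\
s_{N} &= \sqrt{\frac{156.75}{4}} = \sqrt{39.1875} \approx 6.2600 \quad (\texttt{np.std(a)})
\end{aligned}
```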
I have data about many different restaurants. For simplicity I have extracted just one restaurant with four items:
>>> df
    restaurant_id  price
id
1           10407      7
3           10407     20
6           10407     22
13          10407     22
For each restaurant, I want to get the standard deviation; however, pandas returns wrong values.
>>> df.groupby('restaurant_id').std()
                  price
restaurant_id
10407          7.228416
We can get the correct value with np.std():
>>> np.std(df['price'])
6.2599920127744575
But obviously, this is not a solution when I have more than one restaurant. How do I do this properly?
Just to make sure, I checked that df['price'].mean() == np.mean(df['price']).
There is a related discussion here, but their suggestions do not work either.
Answers:
Pandas' std uses Bessel's correction by default, that is, the standard deviation formula with N-1 instead of N in the denominator (ddof=1). NumPy's np.std divides by N (ddof=0). To make pandas use N-0 as well, pass ddof=0:
>>> a.std(ddof=0) == np.std(a)
True
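The same keyword solves the multi-restaurant case, because GroupBy's std accepts ddof just like Series.std. Below is a minimal sketch; it assumes a hypothetical second restaurant (id 99999, prices 5, 9, 13) next to the question's restaurant 10407 so that the grouping has more than one group. The agg(lambda ...) line is only a cross-check against np.std and is not needed for the fix itself.

```python
import numpy as np
import pandas as pd

# Restaurant 10407 is from the question; restaurant 99999 is hypothetical,
# added only so the groupby produces more than one group.
df = pd.DataFrame(
    {
        "restaurant_id": [10407, 10407, 10407, 10407, 99999, 99999, 99999],
        "price": [7, 20, 22, 22, 5, 9, 13],
    },
    index=pd.Index([1, 3, 6, 13, 20, 21, 22], name="id"),
)

grouped = df.groupby("restaurant_id")["price"]

# pandas default: ddof=1, i.e. divide by N - 1 (sample standard deviation)
sample_std = grouped.std()

# ddof=0 divides by N, which is NumPy's default for np.std
population_std = grouped.std(ddof=0)

# Cross-check: apply np.std to each group's raw values (ddof=0 by default)
numpy_std = grouped.agg(lambda s: np.std(s.to_numpy()))

print(population_std)
```

With ddof=0, restaurant 10407 comes out at about 6.26, matching np.std(df['price']) from the question, while leaving the keyword off reproduces the 7.228416 from the original groupby.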