mean calculation in pandas excluding zeros
Question:
Is there a direct way to calculate the mean of a dataframe column in pandas but not taking into account data that has zero as a value? Like a parameter inside the .mean() function?
I am currently doing it like this:
x = df[df['A'] != 0]
x.mean()
Answers:
It also depends on what 0 means in your data.
- If these are indeed real 0 values, then your approach is good.
- If 0 is a placeholder for a value that was not measured (i.e. NaN), then it makes more sense to replace all 0 occurrences with NaN first. The mean calculation then excludes NaN values by default.
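import numpy as np
import pandas as pd

df = pd.DataFrame([1, 0, 2, 3, 0], columns=['a'])
df = df.replace(0, np.nan)  # treat zeros as missing values
df.mean()  # NaN is skipped by default (skipna=True)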
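To see what the replacement changes, here is a small sketch on the same column. The variable names mean_with_zeros and mean_without_zeros are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 0, 2, 3, 0], columns=['a'])

# Zeros counted as real values: (1 + 0 + 2 + 3 + 0) / 5
mean_with_zeros = df['a'].mean()

# Zeros treated as missing: (1 + 2 + 3) / 3
mean_without_zeros = df['a'].replace(0, np.nan).mean()

print(mean_with_zeros, mean_without_zeros)  # 1.2 2.0
```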
Very late to the discussion, but you can also do:
df[df["Column_name"] != 0].mean()
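Note that filtering rows on one column also removes those rows from every other column's mean. A minimal sketch with made-up data, using mask() as one possible way to exclude zeros per column instead:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 3], 'B': [0, 20, 60]})

# Row filter: the row where A == 0 is dropped for B as well,
# and B's own zero is still counted.
row_filtered = df[df['A'] != 0].mean()   # A: 2.0, B: 30.0

# Per-column exclusion: each zero becomes NaN, which mean() skips.
per_column = df.mask(df == 0).mean()     # A: 2.0, B: 40.0
```

Which of the two is appropriate depends on whether a zero in one column should disqualify the whole row or only that cell.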
You can also convert the DataFrame to a NumPy array and use numpy.nanmean(), which ignores NaN values (so zeros would need to be replaced with NaN first):
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2],
                                 [3, 4],
                                 [6, 7],
                                 [8, np.nan],
                                 [np.nan, 11]]),
                  columns=['A', 'B'])
df_col_means = np.nanmean(df.values, axis=0)  # by columns
df_row_means = np.nanmean(df.values, axis=1)  # by rows
col_A_mean = np.nanmean(df['A'].values)  # particular column mean
df[df["Column_name"] != 0]["Column_name"].mean()
or, if your column name is a valid Python identifier (for example, it contains no spaces):
df[df.Column_name != 0].Column_name.mean()
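The same selection can also be written as a single indexing step with .loc, which avoids chained indexing. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'Column_name': [4, 0, 8, 0]})

# Boolean row mask and column label in one .loc call
result = df.loc[df['Column_name'] != 0, 'Column_name'].mean()
print(result)  # 6.0
```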
Hopefully this can be included as a parameter of mean() in a future version, something like:
.mean(exclude=0)  # wished-for parameter; it does not exist in pandas
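In the meantime, a small helper can stand in for such a parameter. The function name mean_excluding and its exclude argument are hypothetical, not part of pandas:

```python
import pandas as pd

def mean_excluding(series, exclude=0):
    """Mean of a Series, ignoring the given value(s). Hypothetical helper."""
    # Accept either a single value or a collection of values to ignore
    values = list(exclude) if isinstance(exclude, (list, tuple, set)) else [exclude]
    return series[~series.isin(values)].mean()

s = pd.Series([1, 0, 2, 3, 0])
print(mean_excluding(s))                  # 2.0
print(mean_excluding(s, exclude=[0, 3]))  # 1.5
```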