Calculate column average row by row using pandas

Question:

I have the following pandas DF:

    val
1   10
2   20
3   30
4   40
5   30

I want to get two output columns: avg and avg_sep

avg should be the average calculated row by row.

avg_sep should be the average calculated row by row until a certain condition is met, after which it restarts (i.e. up to and including row 3 I calculate one average, and after row 3 I start calculating another one). My expected output is:

    val  avg  avg_sep
1   10   10   10
2   20   15   15
3   30   20   20
4   40   25   40
5   30   26   35

I know I can use df.mean(axis=0) to get the average of the column. But how can I get the expected output?

Asked By: OdiumPura

Answers:

From the discussion in the comments:

import pandas as pd
import numpy as np

# Building frame:
df = pd.DataFrame(
    data={"val": [10, 20, 30, 40, 30]},
    index=[1, 2, 3, 4, 5]
)

# Solution:
df["avg"] = df["val"].cumsum() / np.arange(1, 6) # or `/ df.index`, which works here only because the index is 1..5
df.loc[:3, "avg_sep"] = df.loc[:3, "val"].cumsum() / np.arange(1, 4)
df.loc[4:, "avg_sep"] = df.loc[4:, "val"].cumsum() / np.arange(1, 3)
Answered By: Chrysophylaxs
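
The cumulative-sum approach above hard-codes the segment lengths in np.arange. As a minimal sketch of the same idea without fixed lengths, the row counts can be derived per segment with groupby. The names split_label, counts and segment are hypothetical and not part of the original answer; the sketch assumes the split is defined by an index label:

```python
import pandas as pd

df = pd.DataFrame({"val": [10, 20, 30, 40, 30]}, index=[1, 2, 3, 4, 5])

split_label = 3  # hypothetical: last index label belonging to the first segment

# Running mean over the whole column: cumulative sum / number of rows so far.
counts = pd.Series(range(1, len(df) + 1), index=df.index)
df["avg"] = df["val"].cumsum() / counts

# Segment key: False for labels <= split_label, True for the rows after it.
segment = df.index > split_label
grouped = df.groupby(segment)["val"]

# Per-segment running mean: cumulative sum / 1-based position within the segment.
df["avg_sep"] = grouped.cumsum() / (grouped.cumcount() + 1)

print(df)
```

Because the counts come from cumcount(), the same lines should keep working if rows are added or the split label changes.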

Use expanding with mean():

df = pd.DataFrame(data=[[10],[20],[30],[40],[30]], columns=["val"])

df["avg"] = df["val"].expanding().mean()

split_at = 3
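# Expanding mean of each part, selected by position with .iloc, then stitched back together
df["sep_flag"] = pd.concat([df["val"].iloc[:split_at].expanding().mean(), df["val"].iloc[split_at:].expanding().mean()])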

[Out]:
   val   avg  sep_flag
0   10  10.0      10.0
1   20  15.0      15.0
2   30  20.0      20.0
3   40  25.0      40.0
4   30  26.0      35.0
Answered By: Azhar Khan
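
The question describes the restart as happening "until a certain condition", so there may be more than one restart point. A possible sketch, assuming the condition is available as a boolean flag per row (starts_new is a hypothetical name, filled in by hand here to match the example), turns the flags into segment ids and takes an expanding mean within each segment:

```python
import pandas as pd

df = pd.DataFrame({"val": [10, 20, 30, 40, 30]})

# Hypothetical condition: True on every row where a new average should start.
starts_new = pd.Series([False, False, False, True, False], index=df.index)

# The cumulative sum of the flags yields a segment id per row: 0, 0, 0, 1, 1.
segment = starts_new.cumsum()

# Expanding mean inside each segment; transform keeps the original row order.
df["sep_flag"] = df.groupby(segment)["val"].transform(
    lambda s: s.expanding().mean()
)

print(df)
```

With several True flags, each one would open a new segment, so no concat of manually sliced pieces is needed.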