pandas calculate mean of column that has lists instead of single value

Question

I have a pandas DataFrame with a single column, and each row holds a list of values. I need to calculate the mean using the corresponding values from each row. That is, I need eight means, one for each position in the list. Each element in the list is the value of a variable.

>>> df_ex
0    [1, 2, 3, 4, 5, 6, 7, 8]
1    [2, 3, 4, 5, 6, 7, 8, 1]

I tried converting it to a NumPy array and then taking the mean, but I keep getting the error TypeError: unsupported operand type(s) for /: 'list' and 'int'. I understand that I should use separate columns instead of lists, but that isn't possible in my context. Any idea how I could accomplish this?

Asked By: Clock Slave


Answer 1

You can convert the column to nested lists first, then to an array, and then calculate the mean:

import numpy as np

a = np.array(df_ex.tolist())
print(a)
[[1 2 3 4 5 6 7 8]
 [2 3 4 5 6 7 8 1]]

# Mean of all values
print(a.mean())
4.5

# Row-wise mean (one value per row)
print(a.mean(axis=1))
[ 4.5  4.5]

# Column-wise mean (one value per list position)
print(a.mean(axis=0))
[ 1.5  2.5  3.5  4.5  5.5  6.5  7.5  4.5]
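To see why the direct conversion does not give a numeric array, here is a minimal, self-contained sketch. It assumes df_ex is a Series of equal-length lists, as in the question; the names raw, a and col_means are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data, matching the question: a Series whose cells are lists
df_ex = pd.Series([[1, 2, 3, 4, 5, 6, 7, 8],
                   [2, 3, 4, 5, 6, 7, 8, 1]])

# Converting the Series directly gives a 1-D object array whose elements
# are still Python lists, so NumPy cannot average across them numerically
raw = df_ex.to_numpy()
print(raw.dtype, raw.shape)

# Going through nested lists gives a proper 2-D numeric array
a = np.array(df_ex.tolist())
print(a.shape)

# One mean per list position (column-wise)
col_means = a.mean(axis=0)
print(col_means)
```

This relies on every list having the same length, since a 2-D array must be rectangular.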

Answered By: jezrael

Answer 2

You can call np.mean directly on the nested lists and specify an axis.

Setup

import numpy as np
import pandas as pd

df_ex = pd.DataFrame(dict(
    col1=[[1, 2, 3, 4, 5, 6, 7, 8],
          [2, 3, 4, 5, 6, 7, 8, 1]]))

df_ex

                       col1
0  [1, 2, 3, 4, 5, 6, 7, 8]
1  [2, 3, 4, 5, 6, 7, 8, 1]

Solution

np.mean(df_ex['col1'].tolist(), axis=1)

array([ 4.5,  4.5])

Or, for the mean at each list position (column-wise):

np.mean(df_ex['col1'].tolist(), axis=0)

array([ 1.5,  2.5,  3.5,  4.5,  5.5,  6.5,  7.5,  4.5])

Answered By: piRSquared

Answer 3

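Easiest way to get the mean of each row's list, where col is the column holding the lists (for example df_ex['col1']):

col.apply(np.mean)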
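A minimal sketch of this approach, assuming the lists live in a column named col1 as in Answer 2's setup. Note that applying np.mean this way gives one mean per row, not one per list position.

```python
import numpy as np
import pandas as pd

# Assumed sample data, same shape as in the question
df_ex = pd.DataFrame({"col1": [[1, 2, 3, 4, 5, 6, 7, 8],
                               [2, 3, 4, 5, 6, 7, 8, 1]]})

# np.mean is called once per cell, i.e. once per list
row_means = df_ex["col1"].apply(np.mean)
print(row_means)
```

Because each list is handled on its own, this also works when the lists have different lengths.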

Answered By: keramat

Answer 4

from ast import literal_eval
import pandas as pd

# Parse each cell of "listcol" from its string form back into a Python list
df = pd.read_csv("yourfile.csv", converters={"listcol": literal_eval})

def get_mean(t: list) -> float:
    return sum(t) / len(t)

# Mean of each row's list
df["mean of listcol"] = df["listcol"].apply(get_mean)

# To get the mean of the column, where each row is a list, take df["mean of listcol"].mean()
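The CSV route can be tried without a file on disk. The sketch below is only an illustration: the in-memory csv_text is a hypothetical stand-in for "yourfile.csv", and it uses ast.literal_eval as the converter to turn each cell's text back into a list.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical CSV content; each list cell is quoted so that its commas
# are not treated as field separators
csv_text = 'listcol\n"[1, 2, 3, 4, 5, 6, 7, 8]"\n"[2, 3, 4, 5, 6, 7, 8, 1]"\n'

# literal_eval turns the text "[1, 2, ...]" into an actual Python list
df = pd.read_csv(io.StringIO(csv_text), converters={"listcol": literal_eval})

# Mean of each row's list
df["mean of listcol"] = df["listcol"].apply(lambda t: sum(t) / len(t))

# Mean of the per-row means
overall_mean = df["mean of listcol"].mean()
print(df["mean of listcol"].tolist(), overall_mean)
```

The mean of the per-row means equals the mean of all values only when every list has the same length, as it does here.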

Answered By: zhc_96
