pandas: calculate the mean of a column that holds lists instead of single values
Question:
I have a pandas DataFrame with one column, and each row of that column holds a list of values. I need to calculate the mean using the corresponding values from each row. That is, I need a mean for each of the eight positions in the list; each element of the list is the value of a variable.
>>> df_ex
0 [1, 2, 3, 4, 5, 6, 7, 8]
1 [2, 3, 4, 5, 6, 7, 8, 1]
I tried converting it to a NumPy array and then taking the mean, but I keep getting the error TypeError: unsupported operand type(s) for /: 'list' and 'int'. I understand that I should use separate columns instead of lists, but that is not possible in my context. Any idea how I could accomplish this?
Answers:
You can convert the column to nested lists first, then to an array, and then calculate the mean:
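import numpy as np

# df_ex is treated as a Series of lists here; with a DataFrame, select the column first, e.g. df_ex['col1'].tolist()
a = np.array(df_ex.tolist())
print(a)
[[1 2 3 4 5 6 7 8]
 [2 3 4 5 6 7 8 1]]
# Mean of all values
print(a.mean())
4.5
# Row-wise mean: one value per list
print(a.mean(axis=1))
[ 4.5 4.5]
# Column-wise mean: one value per list position, taken across rows
print(a.mean(axis=0))
[ 1.5 2.5 3.5 4.5 5.5 6.5 7.5 4.5]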
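To see why going through tolist() matters, here is a minimal sketch, assuming the lists live in a column named col1 (an illustrative name, not given in the question). Converting the Series directly yields a 1-D object array whose elements are Python lists, so taking the mean ends up adding the lists together and then dividing a list by an int, which is where the TypeError comes from. Converting the nested lists instead yields a regular 2-D numeric array.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; "col1" is an assumed column name
df_ex = pd.DataFrame({"col1": [[1, 2, 3, 4, 5, 6, 7, 8],
                               [2, 3, 4, 5, 6, 7, 8, 1]]})

# Direct conversion: a 1-D object array holding two Python lists
obj = df_ex["col1"].to_numpy()
print(obj.dtype, obj.shape)

try:
    obj.mean()
except TypeError as exc:
    # Summing the lists concatenates them; dividing that list by the count fails
    print("TypeError:", exc)

# Conversion via nested lists: a 2-D integer array
a = np.array(df_ex["col1"].tolist())
print(a.shape)

# One mean per list position, taken across rows
position_means = a.mean(axis=0)
print(position_means)
```

The axis=0 result is the one that matches "the mean using the corresponding values from each row"; axis=1 would give one mean per list instead.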
You can call np.mean directly on the nested lists and specify an axis.
Setup
import numpy as np
import pandas as pd

df_ex = pd.DataFrame(dict(
    col1=[[1, 2, 3, 4, 5, 6, 7, 8],
          [2, 3, 4, 5, 6, 7, 8, 1]]))
df_ex
                       col1
0  [1, 2, 3, 4, 5, 6, 7, 8]
1  [2, 3, 4, 5, 6, 7, 8, 1]
Solution
# Mean of each row's list
np.mean(df_ex['col1'].tolist(), axis=1)
array([ 4.5, 4.5])
Or
# Mean of each list position across rows
np.mean(df_ex['col1'].tolist(), axis=0)
array([ 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 4.5])
Easiest way, if you want the mean of each row's list (col is the column holding the lists, e.g. df_ex['col1']):
col.apply(np.mean)
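One thing apply has going for it: it works row by row, so it still gives a result when the lists have different lengths, a case where nested lists do not form a regular 2-D array. A small sketch with made-up data (the frame ragged and the column row_mean are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Made-up data with lists of unequal length
ragged = pd.DataFrame({"col1": [[1, 2, 3], [4, 5], [6]]})

# np.mean is applied to each list separately, so the lengths do not need to match
ragged["row_mean"] = ragged["col1"].apply(np.mean)

row_means = ragged["row_mean"].tolist()
print(row_means)
```

Note that this only yields per-row means; a per-position mean across rows is not well defined once the lists differ in length.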
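If the lists come from a CSV file, they are read in as strings, so parse them while loading:
from ast import literal_eval
import pandas as pd

# literal_eval turns text such as "[1, 2, 3]" back into a Python list
df = pd.read_csv("yourfile.csv", converters={"listcol": literal_eval})

def getMean(t: list[int]) -> float:
    return sum(t) / len(t)

# Mean of each row's list
df["mean of listcol"] = df["listcol"].apply(getMean)
# To get the mean of the whole column, where each row is a list, take the mean of the row means
# (this equals the mean of all values only when every list has the same length)
df["mean of listcol"].mean()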
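To try the CSV route without a file on disk, here is a self-contained sketch. It assumes the list column was written out as text like "[1, 2, 3]" and uses ast.literal_eval as the converter; csv_text is made-up sample data and listcol is just the placeholder column name used above.

```python
from ast import literal_eval
from io import StringIO

import numpy as np
import pandas as pd

# Made-up CSV content: each cell of listcol is a quoted, stringified list
csv_text = 'listcol\n"[1, 2, 3]"\n"[4, 5, 6]"\n'

# The converter parses each cell back into a real Python list while reading
df = pd.read_csv(StringIO(csv_text), converters={"listcol": literal_eval})

# Mean of each row's list
row_means = df["listcol"].apply(np.mean).tolist()

# Mean of each list position across rows (requires equal-length lists)
position_means = np.mean(df["listcol"].tolist(), axis=0).tolist()

print(row_means)
print(position_means)
```

literal_eval only accepts Python literals, so it is a safer choice than a general eval for untrusted files.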