Averaging of several values
Question:
I have a dataset (df3) with five columns, x, y, r, g and b, although I only need to work with x, y and r. I want to average each run of consecutive rows that share the same value of r and store the result in a DataFrame (df_final). To do this, I wrote code that collects every row whose r equals that of the previous row in a temporary DataFrame (df_inter) and then stores the average of those rows in df_final. The code is this one:
d = {'x':[1,2,3,4,5,6,7],'y':[1,1,1,1,1,1,1],'r':[2,2,2,1,1,3,2]}
df3 = pd.DataFrame(data=d)
for i in range(len(df3)):
    if df3.iloc[i,3] == df3.iloc[i-1,3]:
        df_inter = pd.DataFrame(columns=['x','y', 'r'])
        df_inter.append(df3.iloc[i,1],df3.iloc[i,2],df3.iloc[i,3])
        df_inter.to_csv(f'Resultados/df_inter.csv', index=False, sep=',')
    else:
        df_final.append(df_inter['x'].mean(),df_inter['y'].mean(),df_inter['r'].mean())
        del [[df_inter]]
        gc.collect()
        df_inter=pd.DataFrame()
        df_inter = pd.DataFrame(columns=['x','y', 'r'])
        df_inter.append(df3.iloc[i,1],df3.iloc[i,2],df3.iloc[i,3])
df_final.to_csv(f'Resultados/df_final.csv', index=False, sep=',')
The goal is to start from a dataset like this:
x | y | r |
---|---|---|
1 | 1 | 2 |
2 | 1 | 2 |
3 | 1 | 2 |
4 | 1 | 1 |
5 | 1 | 1 |
6 | 1 | 3 |
7 | 1 | 2 |
and to get something like this:
x | y | r |
---|---|---|
2 | 1 | 2 |
4.5 | 1 | 1 |
6 | 1 | 3 |
7 | 1 | 2 |
Nevertheless, when I execute the code I get this error message:
TypeError: cannot concatenate object of type '<class 'numpy.int64'>'; only Series and DataFrame objs are valid
I’m not sure what the problem is, or whether there is a more efficient way to do this. I would be grateful for any help. Thank you in advance.
Irene
I solved it. The right code for my purpose would be:
import pandas as pd

d = {'x':[1,2,3,4,5,6,7],'y':[1,1,1,1,1,1,1],'r':[2,2,2,1,1,3,2]}
df3 = pd.DataFrame(data=d)
df_inter = pd.DataFrame(columns=['x','y', 'r'])
df_final = pd.DataFrame(columns=['x','y','r'])
# Note: DataFrame.append requires pandas < 2.0 (it was removed in 2.0).
for i in df3.index.values:
    # Row 0 has no previous row, so it always starts the first run.
    if i == 0 or df3.iloc[i,2] == df3.iloc[i-1,2]:
        df_inter = df_inter.append({'x':df3.iloc[i,0],'y':df3.iloc[i,1],'r':df3.iloc[i,2]}, ignore_index=True)
    else:
        # r changed: store the mean of the finished run, then start a new one.
        df_final = df_final.append({'x':df_inter['x'].mean(),'y':df_inter['y'].mean(),'r':df_inter['r'].mean()}, ignore_index=True)
        df_inter = pd.DataFrame(columns=['x','y', 'r'])
        df_inter = df_inter.append({'x':df3.iloc[i,0],'y':df3.iloc[i,1],'r':df3.iloc[i,2]}, ignore_index=True)
# Store the mean of the last run, which the loop never flushes.
df_final = df_final.append({'x':df_inter['x'].mean(),'y':df_inter['y'].mean(),'r':df_inter['r'].mean()}, ignore_index=True)
df_final.to_csv('Resultados/df_final.csv', index=False, sep=',')
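For comparison, the same averages can be computed without an explicit loop. The following is only a sketch, not part of the original post: it labels each run of consecutive equal r values by comparing r with its shifted copy and taking a cumulative sum, then averages within each label. The name run_id is made up for illustration. It does not use DataFrame.append, which was removed in pandas 2.0.

```python
import pandas as pd

d = {'x': [1, 2, 3, 4, 5, 6, 7], 'y': [1, 1, 1, 1, 1, 1, 1], 'r': [2, 2, 2, 1, 1, 3, 2]}
df3 = pd.DataFrame(data=d)

# True wherever r differs from the previous row (always True for the first
# row, because the shifted value there is NaN). The cumulative sum gives each
# run of consecutive equal r values its own label: 1, 1, 1, 2, 2, 3, 4.
run_id = (df3['r'] != df3['r'].shift()).cumsum().rename('run_id')

# Average x, y and r within each run, then discard the run labels.
df_final = df3.groupby(run_id)[['x', 'y', 'r']].mean().reset_index(drop=True)
print(df_final)
```

For the example data this gives x values of 2, 4.5, 6 and 7, matching the target table in the question.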
Answers:
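append expects a DataFrame, Series or dict rather than bare values, and it returns a new DataFrame instead of modifying the original, so the result has to be assigned back. You may want to append to the end of the dataframe using
df_inter = df_inter.append({'x':df3.iloc[i,0],'y':df3.iloc[i,1],'r':df3.iloc[i,2]}, ignore_index=True)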
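On pandas 2.0 and later, DataFrame.append no longer exists, so the line above raises an AttributeError there. One possible alternative, shown here only as a sketch with the example data from the question, is to collect the rows in an ordinary Python list and build the DataFrame once at the end. The name rows and the choice of the first three rows are illustrative.

```python
import pandas as pd

d = {'x': [1, 2, 3, 4, 5, 6, 7], 'y': [1, 1, 1, 1, 1, 1, 1], 'r': [2, 2, 2, 1, 1, 3, 2]}
df3 = pd.DataFrame(data=d)

# Collect plain dicts first; building the DataFrame once at the end avoids
# copying the whole frame every time a row is added.
rows = []
for i in range(3):  # the first three rows, which all have r == 2
    rows.append({'x': df3.iloc[i, 0], 'y': df3.iloc[i, 1], 'r': df3.iloc[i, 2]})

df_inter = pd.DataFrame(rows, columns=['x', 'y', 'r'])
print(df_inter['x'].mean())
```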
If you have some knowledge of SQL, it can be intuitively done using sqldf and pandas:
import sqldf
import pandas as pd
df = pd.DataFrame({"class":[1,1,1,2,2,2,1,2,2,1],"value":[10,10,12,11,15,17,98,23,22,0]})
averages = sqldf.run("""
SELECT class,AVG(value)
FROM df
GROUP BY class
""")
The output being:
   class  AVG(value)
0      1        26.0
1      2        17.6
Is that what you are looking for?
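Note that GROUP BY class pools all rows with the same class wherever they appear, whereas the question asks about runs of consecutive equal values. As a hedged illustration on the same df, the sketch below first reproduces the overall averages in plain pandas and then computes one average per consecutive run. The names overall, run_id and per_run are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"class": [1, 1, 1, 2, 2, 2, 1, 2, 2, 1],
                   "value": [10, 10, 12, 11, 15, 17, 98, 23, 22, 0]})

# pandas counterpart of: SELECT class, AVG(value) FROM df GROUP BY class
overall = df.groupby("class")["value"].mean()

# Label each run of consecutive equal class values, then average per run.
run_id = (df["class"] != df["class"].shift()).cumsum().rename("run_id")
per_run = df.groupby(run_id).agg({"class": "first", "value": "mean"})

print(overall)
print(per_run)
```

With this data there are five runs (classes 1, 2, 1, 2, 1) instead of the two groups returned by the SQL query.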