Averaging of several values

Question:

I have a DataFrame (df3) with five columns x, y, r, g and b, although I only need to work with x, y and r. I want to average each run of consecutive rows that share the same value of r and store the result in another DataFrame (df_final). To do this, I wrote code that collects the rows whose r equals that of the previous row in a temporary DataFrame (df_inter), and then stores the average of those rows in df_final. This is the code:

d = {'x':[1,2,3,4,5,6,7],'y':[1,1,1,1,1,1,1],'r':[2,2,2,1,1,3,2]}
df3 = pd.DataFrame(data=d)
for i in range(len(df3)):
  if df3.iloc[i,3] == df3.iloc[i-1,3]:
    df_inter = pd.DataFrame(columns=['x','y', 'r'])
    df_inter.append(df3.iloc[i,1],df3.iloc[i,2],df3.iloc[i,3])
    df_inter.to_csv(f'Resultados/df_inter.csv', index=False, sep=',')
  else:
    df_final.append(df_inter['x'].mean(),df_inter['y'].mean(),df_inter['r'].mean())
    del [[df_inter]]
    gc.collect()
    df_inter=pd.DataFrame()
    df_inter = pd.DataFrame(columns=['x','y', 'r'])
    df_inter.append(df3.iloc[i,1],df3.iloc[i,2],df3.iloc[i,3])
    df_final.to_csv(f'Resultados/df_final.csv', index=False, sep=',')

The goal is to start from a dataset like this, for example:

x y r
1 1 2
2 1 2
3 1 2
4 1 1
5 1 1
6 1 3
7 1 2

and obtain something like this:

x y r
2 1 2
4.5 1 1
6 1 3
7 1 2

However, when I run the code I get this error message:

TypeError: cannot concatenate object of type '<class 'numpy.int64'>'; only Series and DataFrame objs are valid

I'm not sure what the problem is, or whether there is a more efficient way to do this. I would be grateful for any help. Thank you in advance.

Irene


I solved it. The code that works for my purpose is:

import pandas as pd

d = {'x':[1,2,3,4,5,6,7],'y':[1,1,1,1,1,1,1],'r':[2,2,2,1,1,3,2]}
df3 = pd.DataFrame(data=d)
df_inter = pd.DataFrame(columns=['x','y', 'r'])
df_final = pd.DataFrame(columns=['x','y','r'])

# Note: DataFrame.append was removed in pandas 2.0, so this needs an older pandas.
for i in range(len(df3)):
  # First row, or same r as the previous row: keep filling the current run.
  if i == 0 or df3.iloc[i,2] == df3.iloc[i-1,2]:
    df_inter = df_inter.append({'x':df3.iloc[i,0],'y':df3.iloc[i,1],'r':df3.iloc[i,2]}, ignore_index=True)
  else:
    # r changed: store the mean of the finished run, then start a new run with row i.
    df_final = df_final.append({'x':df_inter['x'].mean(),'y':df_inter['y'].mean(),'r':df_inter['r'].mean()}, ignore_index=True)
    df_inter = pd.DataFrame(columns=['x','y', 'r'])
    df_inter = df_inter.append({'x':df3.iloc[i,0],'y':df3.iloc[i,1],'r':df3.iloc[i,2]}, ignore_index=True)

# Store the mean of the last run, which the loop itself never writes out.
df_final = df_final.append({'x':df_inter['x'].mean(),'y':df_inter['y'].mean(),'r':df_inter['r'].mean()}, ignore_index=True)
# The Resultados directory must already exist.
df_final.to_csv('Resultados/df_final.csv', index=False, sep=',')
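
The same averaging can probably be done without an explicit loop. The following is only a sketch, assuming a "run" means consecutive rows with equal r; run_id is a helper name introduced here for illustration, and the CSV export is left out so the example does not depend on the Resultados directory:

```python
import pandas as pd

d = {'x': [1, 2, 3, 4, 5, 6, 7], 'y': [1, 1, 1, 1, 1, 1, 1], 'r': [2, 2, 2, 1, 1, 3, 2]}
df3 = pd.DataFrame(data=d)

# A new run starts wherever r differs from the previous row.
# The cumulative sum of those flags gives every run its own id (1, 1, 1, 2, 2, 3, 4).
# run_id is a hypothetical helper name; it is renamed so it cannot clash with column r.
run_id = (df3['r'] != df3['r'].shift()).cumsum().rename('run')

# Average x, y and r within each run, then drop the run id from the index.
df_final = df3.groupby(run_id)[['x', 'y', 'r']].mean().reset_index(drop=True)

print(df_final)  # x column: 2.0, 4.5, 6.0, 7.0
```

Because the grouping key is built from changes in r rather than from r itself, the final row with r = 2 stays separate from the first three rows.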

Asked By: ISanram


Answers:

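DataFrame.append expects a dict, Series or DataFrame rather than separate scalar values, and it returns a new DataFrame instead of modifying the original, so the result has to be assigned back. You may want to append the row to the end of the DataFrame using

df_inter = df_inter.append({'x':df3.iloc[i,0],'y':df3.iloc[i,1],'r':df3.iloc[i,2]}, ignore_index=True)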
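
On pandas 2.0 and later, DataFrame.append no longer exists (it was deprecated in 1.4), so the line above raises an AttributeError there. A minimal sketch of the same row append using pd.concat, assuming a small stand-in df3 made up for this example:

```python
import pandas as pd

# Stand-in data for illustration only; not the asker's real dataset.
df3 = pd.DataFrame({'x': [1, 2, 3], 'y': [1, 1, 1], 'r': [2, 2, 1]})

# Double brackets keep the selected row as a one-row DataFrame,
# which is the kind of object pd.concat accepts.
df_inter = df3.iloc[[0]]

# Append the next row; as with append, the result must be assigned back.
df_inter = pd.concat([df_inter, df3.iloc[[1]]], ignore_index=True)

print(df_inter)
```

Concatenating inside a loop copies the data on every iteration, so for larger inputs it is usually cheaper to collect the pieces in a list and call pd.concat once at the end.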

Answered By: ScottC

If you know some SQL, this can be done intuitively with sqldf and pandas:

import sqldf
import pandas as pd

df = pd.DataFrame({"class":[1,1,1,2,2,2,1,2,2,1],"value":[10,10,12,11,15,17,98,23,22,0]})

averages = sqldf.run("""
                     SELECT class,AVG(value)
                     FROM df
                     GROUP BY class
""")

The output being:

   class  AVG(value)
0      1        26.0
1      2        17.6

Is that what you are looking for?
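
For comparison, here is a sketch of the same aggregation in pandas alone, followed by a variant that averages only runs of consecutive equal class values, which may be closer to what the question describes. The names overall, run_id and per_run are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"class": [1, 1, 1, 2, 2, 2, 1, 2, 2, 1],
                   "value": [10, 10, 12, 11, 15, 17, 98, 23, 22, 0]})

# Same result as the SQL query: one mean per distinct class.
overall = df.groupby("class", as_index=False)["value"].mean()

# Variant: one mean per run of consecutive rows with the same class.
# A new run id is issued every time class differs from the previous row.
run_id = (df["class"] != df["class"].shift()).cumsum().rename("run")
per_run = (
    df.groupby(run_id)
      .agg({"class": "first", "value": "mean"})
      .reset_index(drop=True)
)

print(overall)
print(per_run)
```

The first result has two rows (one per class), while the second has five, because class 1 and class 2 each appear in several separate runs.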

Answered By: GaëtanLF