Sum values of two Dataframe with Conditions

Question:

it’s the situation:
in a "for" loop i’m reading csv files as dataframe and add them to a main big df.
if new data is not already in main df, it will be simply concat to main df.
if new data has already a same value in main df, it has to sum with the pervious one.

an example:

main_df=
         building,  day,  kw/h
         1,         1,    50
         1,         2,    55
         2,         1,    30
         2,         2,    40

new_df_1=
         building,  day,  kw/h
         3,         1,    55
         3,         2,    58

new_df_2=

         building,  day,  kw/h
         2,         1,    15
         2,         2,    19
         2,         3,    14

new_df_2 will simply concat to main_df, but new_df_2 has to sum with existed data. So the desire answer will be:

         building,  day,  kw/h
         1,         1,    50
         1,         2,    55
         2,         1,    45
         2,         2,    59
         2,         3,    14
         3,         1,    55
         3,         2,    58

Question: 1. how can i check that, is there already same building and same day in main_df?
2. how can i sum the values of "KW/h" coulmns?
i mean:

for csv in path_list:
read= pd.read_csv(csv)
   if ( **already exist** ):
       **sum with old values**
   else:
       main_df= pd.concat([main_df,read])
      
  

i was searching for some function like "merge" with indicator "on=" for automaticly find the right values and sum them. but i couldn’t find any thing.
Any idea would help.

Asked By: M.A.Sedigh

||

Answers:

concat the three DF and then sum the kw/h on building and day

df=pd.concat([main_df, new_df1, new_df2])
df=df.groupby(['building','day'], as_index=False)['kw/h'].sum()
df
    building    day     kw/h
0          1      1     50
1          1      2     55
2          2      1     45
3          2      2     59
4          2      3     14
5          3      1     55
6          3      2     58
Answered By: Naveed
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.