Pandas compare rows and columns from different excel files, update value or append value

Question:

currently i have two cvs files, one is the temporary file (df1) which has 10+ columns, the other is the Master file (df2) and has only 2 columns. I would like to iterate over rows and compare values from a column that is in both files (UserName) and if the value of UserName is already present in the Master File, add the value of the other column that appears in both files (Score) to the value of Score in Master File (df2) for that specific user. If on the other hand, the value of UserName from the temporary file is not present in the Master File, just append row to the Master Table as new row.

example Master file (df2):

masterTable

example temp file (df1):

enter image description here

new master file i would like to obtain after comparison:

enter image description here

i have the following code but currently it appends all rows every time a comparison is made between the 2 files, could use some help to determine if it’s even a good approach for the described problem:

import os
import win32com.client
import pandas as pd
import numpy as np

path = os.path.expanduser("Attachments")
MasterFile=os.path.expanduser("MasterFile.csv")
fields = ['UserName', 'Score']

df1 = pd.read_csv(zipfilepath, skipinitialspace=True, usecols=fields)
df2 = pd.read_csv(MasterFile, skipinitialspace=True, usecols=fields)
comparison_values = df1['UserName'].values == df2['UserName'].values
print(comparison_values)
rows = np.where(comparison_values == False)
for item in comparison_values:
    if item==True:
        df2['Score']=df2['Score']+df1['Score']
    else:
        df2 = df2.append(df1[{'UserName', 'Score'}], ignore_index=True)
        df2.to_csv(MasterFile, mode='a', index=False, header=False)

EDIT**

what about a mix of integers and strings in the 2 files? Example

Example Master File (df2):

MasterFile

Example temp file (df1):

TempFile

new master file i would like to obtain after comparison:

newMaster

Asked By: Angelo Pitarresi

||

Answers:

IIUC, you can use

df = pd.concat([df1, df2]).groupby('UserName', as_index=False).sum()
Answered By: Ynjxsjmh
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.