Fetching data from a csv with python pandas, if value found in one column, replace the whole row

Question:

I have a small script that asks the user for three values; let's call them A, B, and C. I need to save this data to a CSV file like this:

,A,B,C
0,a,b,c
1,d,e,f
2,g,h,i

If the value of A is already in the CSV, I need to update B and C on the same row as A.
If the value of A is not in the CSV, I need to append a new row at the end of the CSV.
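The update-or-append rule above maps onto a boolean mask on column A. If any row matches, B and C can be assigned on that row with .loc; otherwise a new row can be appended with pd.concat. Below is a minimal in-memory sketch. It assumes the data has already been loaded into a DataFrame with columns A, B and C, and the function name upsert_row is hypothetical (it does not come from the question):

```python
import pandas as pd


def upsert_row(df: pd.DataFrame, a, b, c) -> pd.DataFrame:
    """Return a copy of df with B and C updated where A == a, or a new row appended."""
    df = df.copy()  # leave the caller's DataFrame untouched
    mask = df['A'] == a  # boolean Series: True on rows whose A matches
    if mask.any():
        # A is already present: overwrite B and C on the matching row
        df.loc[mask, ['B', 'C']] = [b, c]
    else:
        # A is new: append a row; ignore_index continues the 0, 1, 2, ... labels
        new_row = pd.DataFrame([{'A': a, 'B': b, 'C': c}])
        df = pd.concat([df, new_row], ignore_index=True)
    return df


df = pd.DataFrame({'A': ['a', 'd', 'g'], 'B': ['b', 'e', 'h'], 'C': ['c', 'f', 'i']})
updated = upsert_row(df, 'd', 'x', 'y')   # row 1 becomes d, x, y
appended = upsert_row(df, 'j', 'x', 'y')  # new row 3: j, x, y
print(updated)
print(appended)
```

The mask avoids converting the whole frame to a dict of lists. Because ignore_index=True keeps the integer labels running from 0, appended rows always go at the end.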

So, if the value of A is d, B is x, and C is y, the CSV should be updated to:

,A,B,C
0,a,b,c
1,d,x,y
2,g,h,i

And if the value of A is j, B is x, and C is y, the CSV should be updated to:

,A,B,C
0,a,b,c
1,d,e,f
2,g,h,i
3,j,x,y

This is what I have come up with so far, but I don't know how to make it better:

import pandas as pd

def save_data(a, b, c):
    data = {'A': a, 'B': b, 'C': c}
    try:
        # index_col=0 reads the unnamed index column back as the index,
        # so saved_data only holds the A, B and C columns
        df = pd.read_csv('my_data.csv', index_col=0)
        saved_data = df.to_dict(orient='list')
        if a in saved_data['A']:
            # A already exists: overwrite B and C on that row
            position = saved_data['A'].index(a)
            saved_data['B'][position] = b
            saved_data['C'][position] = c
        else:
            # A is new: append a value to each column list
            for k, v in data.items():
                saved_data[k].append(v)
        df = pd.DataFrame.from_dict(saved_data)
        df.to_csv('my_data.csv')

    except FileNotFoundError:
        # No file yet: create it with a single row
        df = pd.DataFrame([data], columns=['A', 'B', 'C'])
        df.to_csv('my_data.csv')
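The CSV layout shown in the question, with a leading unnamed column holding 0, 1, 2, is what DataFrame.to_csv produces by default because it writes the index. If you read such a file back without index_col, that column becomes an ordinary column named "Unnamed: 0". Passing index_col=0 restores it as the index instead. The small sketch below uses an in-memory buffer in place of my_data.csv:

```python
import io

import pandas as pd

df = pd.DataFrame({'A': ['a', 'd'], 'B': ['b', 'e'], 'C': ['c', 'f']})

buf = io.StringIO()
df.to_csv(buf)  # the index is written as an unnamed first column: ",A,B,C"
csv_text = buf.getvalue()

# Without index_col, the unnamed first column comes back as a regular column
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())  # ['Unnamed: 0', 'A', 'B', 'C']

# With index_col=0 it is restored as the index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())  # ['A', 'B', 'C']
```

Alternatively, to_csv(..., index=False) leaves the index out of the file entirely.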

I tried looking for a better way to replace a full row in a DataFrame, but couldn't find anything that does quite what I need. I am wondering whether I am overdoing it here or whether there is a more efficient way to tackle this problem.

Thank you for your help!

Asked By: Dani


Answers:

A concise solution that uses combine_first to update or append the values:

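import os

import pandas as pd

def update_data(a, b, c):
    # One-row DataFrame for the new values, indexed by A
    df = pd.DataFrame([{'A': a, 'B': b, 'C': c}])
    df = df.set_index('A')

    if os.path.exists('data.csv'):
        old_df = pd.read_csv('data.csv', index_col=['A'])
        # Values from df win; rows that only exist in old_df are kept
        df = df.combine_first(old_df)

    df.to_csv('data.csv')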
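This approach relies on two properties of DataFrame.combine_first. First, values in the calling frame take precedence, and any gaps are filled from the other frame. Second, the result is indexed by the union of both indexes. The in-memory sketch below shows both (the frames are made up for illustration). One consequence: when both indexes are sorted, a new key that sorts before the existing ones lands at the front rather than at the end, so rows may not stay in insertion order.

```python
import pandas as pd

# Row to write (the "update"), indexed by A
new = pd.DataFrame([{'A': 1, 'B': 25, 'C': 100}]).set_index('A')
# Rows already saved, indexed by A
old = pd.DataFrame({'A': [1, 4], 'B': [2, 5], 'C': [3, 6]}).set_index('A')

# Values from `new` win; rows missing from `new` are filled from `old`
merged = new.combine_first(old)
print(merged)

# A key that sorts before the existing ones is not placed at the end:
# the result index is the sorted union of both (sorted) indexes
front = pd.DataFrame([{'A': 0, 'B': 7, 'C': 8}]).set_index('A')
front_merged = front.combine_first(old)
print(front_merged.index.tolist())  # [0, 1, 4]
```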

Sample run (the comments show the contents of data.csv after each call):

update_data(1, 2, 3)
# A,B,C
# 1,2,3

update_data(4, 5, 6)
# A,B,C
# 1,2,3
# 4,5,6

update_data(1, 25, 100)
# A,B,C
# 1,25,100
# 4,5,6

update_data(7, 8, 9)
# A,B,C
# 1,25,100
# 4,5,6
# 7,8,9
Answered By: Shubham Sharma