Fetching data from a CSV with Python pandas: if a value is found in one column, replace the whole row
Question:
I have a small script that asks the user for three values; let’s call them A, B, and C. I need to save this data to a CSV file like this:
,A,B,C
0,a,b,c
1,d,e,f
2,g,h,i
If the value of A is already in the CSV, I need to update the values of B and C on the same row as A.
If the value of A is not in the CSV, I need to append a new row at the end of the CSV.
So, if the value of A is d, B is x, and C is y, the CSV should be updated to:
,A,B,C
0,a,b,c
1,d,x,y
2,g,h,i
And if the value of A is j, B is x, and C is y, the CSV should be updated to:
,A,B,C
0,a,b,c
1,d,e,f
2,g,h,i
3,j,x,y
This is what I have come up with so far, but I don’t know how to make it better:
import pandas as pd

def save_data(a, b, c):
    data = {'A': a, 'B': b, 'C': c}
    try:
        df = pd.read_csv('my_data.csv')
        saved_data = df.to_dict(orient='list')
        # Drop the unnamed index column that read_csv picks up from the file.
        first_key = next(iter(saved_data))
        saved_data.pop(first_key)
        if a in saved_data['A']:
            # A already exists: overwrite B and C on that row.
            position = saved_data['A'].index(a)
            saved_data['B'][position] = b
            saved_data['C'][position] = c
        else:
            # A is new: append one value to every column.
            for k, v in data.items():
                saved_data[k].append(v)
        df = pd.DataFrame.from_dict(saved_data)
        df.to_csv('my_data.csv')
    except FileNotFoundError:
        # First run: create the file with a single row.
        df = pd.DataFrame([data], columns=['A', 'B', 'C'])
        df.to_csv('my_data.csv')
I tried to find a better way to replace a full row in a DataFrame, but couldn’t find anything that quite matches what I need. I am wondering whether I am overdoing it here or whether there is a more efficient way to tackle this problem.
Thank you for your help!
Answers:
A concise solution using combine_first to update/append the values:
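import os
import pandas as pd

def update_data(a, b, c):
    # Build a one-row frame keyed by A.
    df = pd.DataFrame([{'A': a, 'B': b, 'C': c}])
    df = df.set_index('A')
    if os.path.exists('data.csv'):
        old_df = pd.read_csv('data.csv', index_col=['A'])
        # New values take priority; rows missing from df come from old_df.
        df = df.combine_first(old_df)
    df.to_csv('data.csv')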
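To see why this covers both cases, it may help to look at combine_first on small in-memory frames: the calling frame's values are kept wherever they are present, and everything else is filled in from the other frame, over the union of both indexes. The sketch below uses made-up frames mirroring the question's example; the names old_df, update, and new_row are only for illustration.

```python
import pandas as pd

# Existing data, keyed by column A (values taken from the question's example).
old_df = pd.DataFrame(
    {"A": ["a", "d", "g"], "B": ["b", "e", "h"], "C": ["c", "f", "i"]}
).set_index("A")

# Case 1: the key "d" already exists, so the new B and C replace the old ones.
update = pd.DataFrame([{"A": "d", "B": "x", "C": "y"}]).set_index("A")
updated = update.combine_first(old_df)

# Case 2: the key "j" is absent, so the row is added to the combined index.
new_row = pd.DataFrame([{"A": "j", "B": "x", "C": "y"}]).set_index("A")
appended = new_row.combine_first(old_df)

print(updated)
print(appended)
```

One thing to keep in mind: the result is built on the union of the two indexes, which pandas normally sorts, so a new key is not guaranteed to land on the last row. A missing (NaN) value in the new row would also be filled from the old row instead of overwriting it.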
Sample run
update_data(1, 2, 3)
# A,B,C
# 1,2,3
update_data(4, 5, 6)
# A,B,C
# 1,2,3
# 4,5,6
update_data(1, 25, 100)
# A,B,C
# 1,25,100
# 4,5,6
update_data(7, 8, 9)
# A,B,C
# 1,25,100
# 4,5,6
# 7,8,9
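If new rows must always go at the end and the file should keep the unnamed index column shown in the question, a boolean mask with .loc is one alternative. This is only a sketch under a few assumptions: the function name upsert_row and its path parameter are hypothetical, the file was written by pandas with its default index, and the values of A in the file compare equal to what the caller passes in (read_csv infers types, so a numeric column will not match a string typed by the user).

```python
import os
import pandas as pd

def upsert_row(path, a, b, c):
    """Update B and C on the row where A == a, else append a new last row."""
    new_row = pd.DataFrame([{"A": a, "B": b, "C": c}])
    if not os.path.exists(path):
        # First run: the file will contain just this one row.
        df = new_row
    else:
        # The first CSV column is the unnamed index written by to_csv.
        df = pd.read_csv(path, index_col=0)
        mask = df["A"] == a
        if mask.any():
            # Key found: overwrite B and C in place, row order unchanged.
            df.loc[mask, ["B", "C"]] = [b, c]
        else:
            # Key not found: add the row at the end and renumber the index.
            df = pd.concat([df, new_row], ignore_index=True)
    df.to_csv(path)
    return df
```

Calling upsert_row('my_data.csv', 'd', 'x', 'y') on the question's file would rewrite row 1, while upsert_row('my_data.csv', 'j', 'x', 'y') would add row 3.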