Change specific value in CSV file via Python
Question:
I need a way to change the specific value of a column of a CSV file. For example I have this CSV file:
"Ip","Sites"
"127.0.0.1",10
"127.0.0.2",23
"127.0.0.3",50
and I need to change the value 23 to 30 in the row for "127.0.0.2".
I use the csv library: import csv
Answers:
You can’t really replace values in the existing file. Instead, you need to:
- read in existing file
- alter file in memory
- write out new file (overwriting existing file)
What you can also do is read in the existing file line by line, writing it out to a new file, while replacing values on the fly. When done, close both files, delete the original and rename the new file.
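The line-by-line variant described above might be sketched like this (the temporary filename and the use of os.replace are my choices for illustration, not part of the original answer; the first block just recreates the sample file from the question):

```python
import csv
import os

# Recreate the sample data from the question
with open('test.csv', 'w', newline='') as f:
    csv.writer(f).writerows([['Ip', 'Sites'],
                             ['127.0.0.1', '10'],
                             ['127.0.0.2', '23'],
                             ['127.0.0.3', '50']])

# Stream the existing file row by row, writing a modified copy,
# replacing values on the fly.
with open('test.csv', newline='') as src, \
     open('test.csv.tmp', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if row and row[0] == '127.0.0.2':
            row[1] = '30'  # the replacement happens here
        writer.writerow(row)

# Delete the original and rename the new file in one step
os.replace('test.csv.tmp', 'test.csv')
```

Unlike the read-everything-into-memory approach, this only ever holds one row in memory, so it also works for very large files.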
This solution opens the CSV file, changes the values in memory, and then writes the changes back to disk.
import csv

with open('/tmp/test.csv', newline='') as f:  # your csv file here
    r = csv.reader(f)
    lines = list(r)
Content of lines:
[['Ip', 'Sites'],
['127.0.0.1', '10'],
['127.0.0.2', '23'],
['127.0.0.3', '50']]
Modifying the values:
lines[2][1] = '30'
Content of lines:
[['Ip', 'Sites'],
['127.0.0.1', '10'],
['127.0.0.2', '30'],
['127.0.0.3', '50']]
Now we only have to write it back to a file:
with open('/tmp/output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(lines)
You can use the very powerful pandas library. Here is an example.
import pandas as pd
df = pd.read_csv("test.csv")
df.head(3)  # prints the first 3 rows
Output:
Ip Sites
0 127.0.0.1 10
1 127.0.0.2 23
2 127.0.0.3 50
Now if you want to change the value in the ‘Sites’ column of the row with index 1, run:
df.at[1, "Sites"] = 30  # df.set_value() was deprecated and later removed; .at is the replacement
If you want to change all the values, where ‘Ip’ is equal to 127.0.0.2, run:
df.loc[df["Ip"]=="127.0.0.2", "Sites"] = 30
Finally, to save the values:
df.to_csv("test.csv", index=False)
An alternative to the accepted answer is to:
- Use fileinput with inplace=True to modify the file in-place
- Use csv.DictReader to access the columns via their header names instead of indices (this only works if the CSV has headers)
Test CSV:
Ip,Sites
127.0.0.1,10
127.0.0.2,23
127.0.0.3,50
Test Code:
import csv
import fileinput

with fileinput.input(files=('test.csv',), inplace=True) as f:
    reader = csv.DictReader(f)
    print(",".join(reader.fieldnames))  # print back the headers
    for row in reader:
        if row["Ip"] == "127.0.0.2":
            row["Sites"] = "30"
        print(",".join([row["Ip"], row["Sites"]]))
The main difference is that you don’t have to manually open the input file and create the output file, as inplace=True already does that behind the scenes:
Optional in-place filtering: if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.
The loop goes over the CSV row-by-row (except for the header row), so you can do whatever processing you need on each row.
If you still want to retain the original, you can pass backup=".backup" so that fileinput creates a test.csv.backup file.
Also, note that with in-place editing, "standard output is directed to the input file", so print(...) writes to the file instead of to the console. If you actually want to print to the console, direct the output to stderr, as in print(..., file=sys.stderr).
As an SO noob, I can’t comment on @Gino Memphin’s alternative solution, which provides a dynamic approach to reading / overwriting specific values in a CSV file based on string literal comparisons. So I’ll post my (hopefully) pertinent question here:
Coming from a Postgres background, where the list of IP addresses might number in the tens of thousands, one enhancement to Gino’s solution would be to use a regex to find the IP addresses that meet the update criteria, rather than having to type in a string literal (e.g. "127.0.0.2").
To find single-digit fourth-octet occurrences, instead of using the string literal
if row["Ip"] == "127.0.0.2"
one could import re at the outset and use a properly formed regex (confirmed on RegEx101.com), the idea being that
r"\.\d$"
will find IP addresses ending in .1, .2, etc. but not .10, .11, etc.
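As a quick sanity check (my own standalone test, not from the thread), the pattern does behave as described: a literal dot followed by exactly one digit at the end of the string.

```python
import re

regex = re.compile(r"\.\d$")  # a dot, then a single digit, then end of string

print(bool(regex.search("127.0.0.2")))   # single-digit fourth octet: matches
print(bool(regex.search("127.0.0.10")))  # two-digit fourth octet: no match
```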
However, and here’s the question: as a Python noob, I can’t figure out where to place the regex. Putting it here
if row["Ip"] == r"\.\d$"
doesn’t work, nor does putting it on its own line
regex = r"\.\d$"
and then calling it in the "if row" line, e.g.,
if row["Ip"] == regex
Reading various comments about Python and the apparent need to pre-compile regexes (re.compile), I confirmed in IDLE that
regex = re.compile(r"\.\d$")
properly compiles to
re.compile('\\.\\d$')
and works in standalone tests, but not in the filereader approach that Gino took. My final attempt was to again try
if row["Ip"] == regex
which also fails to change the "23" to "30" the way Gino’s string literal does.
Is there some special way to combine regex with fileinput in Python?
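For what it’s worth, the failing comparisons above are a semantics issue rather than anything specific to fileinput: == tests whether the string equals the compiled Pattern object itself, which is never true. A compiled pattern is applied by calling its search() (or match()) method. A minimal sketch of the fixed condition, using a hand-built row dict in place of a DictReader row:

```python
import re

regex = re.compile(r"\.\d$")
row = {"Ip": "127.0.0.2", "Sites": "23"}  # stands in for one csv.DictReader row

# row["Ip"] == regex compares a str to a re.Pattern object and is always False.
# Call the pattern's search() method on the string instead:
if regex.search(row["Ip"]):
    row["Sites"] = "30"

print(row)  # {'Ip': '127.0.0.2', 'Sites': '30'}
```

The same if regex.search(row["Ip"]): condition can be dropped straight into the fileinput loop in place of the string-literal comparison.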