How do I read and write CSV files with Python?
Question:
How do I read the following CSV file?
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3
How do I write the following data
to a CSV file?
data = [
(1, "A towel,", 1.0),
(42, " it says, ", 2.0),
(1337, "is about the most ", -1),
(0, "massively useful thing ", 123),
(-2, "an interstellar hitchhiker can have.", 3),
]
Answers:
Here are some minimal complete examples how to read CSV files and how to write CSV files with Python.
Python 3: Reading a CSV file
Pure Python
import csv
# Define data
data = [
(1, "A towel,", 1.0),
(42, " it says, ", 2.0),
(1337, "is about the most ", -1),
(0, "massively useful thing ", 123),
(-2, "an interstellar hitchhiker can have.", 3),
]
# Write CSV file
with open("test.csv", "wt") as fp:
writer = csv.writer(fp, delimiter=",")
# writer.writerow(["your", "header", "foo"]) # write header
writer.writerows(data)
# Read CSV file
with open("test.csv") as fp:
reader = csv.reader(fp, delimiter=",", quotechar='"')
# next(reader, None) # skip the headers
data_read = [row for row in reader]
print(data_read)
After that, the contents of data_read
are
[['1', 'A towel,', '1.0'],
['42', ' it says, ', '2.0'],
['1337', 'is about the most ', '-1'],
['0', 'massively useful thing ', '123'],
['-2', 'an interstellar hitchhiker can have.', '3']]
Please note that CSV reads only strings. You need to convert to the column types manually.
A Python 2+3 version was here before (link), but Python 2 support is dropped. Removing the Python 2 stuff massively simplified this answer.
Related
- How do I write data into csv format as string (not file)?
- How can I use io.StringIO() with the csv module?: This is interesting if you want to serve a CSV on-the-fly with Flask, without actually storing the CSV on the server.
mpu
Have a look at my utility package mpu
for a super simple and easy to remember one:
import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)
Pandas
import pandas as pd
# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)
# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]
# or export it as a list of dicts
dicts = df.to_dict().values()
See read_csv
docs for more information. Please note that pandas automatically infers if there is a header line, but you can set it manually, too.
If you haven’t heard of Seaborn, I recommend having a look at it.
Other
Reading CSV files is supported by a bunch of other libraries, for example:
Created CSV file
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3
Common file endings
.csv
Working with the data
After reading the CSV file to a list of tuples / dicts or a Pandas dataframe, it is simply working with this kind of data. Nothing CSV specific.
Alternatives
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- CSV: Super simple format (read & write)
- YAML: Nice to read, similar to JSON (read & write)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
Writing a CSV file
First you need to import csv
For example:
import csv
with open('eggs.csv', 'wb') as csvfile:
spamwriter = csv.writer(csvfile, delimiter=' ',
quotechar='|', quoting=csv.QUOTE_MINIMAL)
spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
import csv
with open(fileLocation+'example.csv',newline='') as File: #the csv file is stored in a File object
reader=csv.reader(File) #csv.reader is used to read a file
for row in reader:
print(row)
To read a csv file using Pandas
use pd.read_csv("D:\sample.csv")
using only python :
fopen=open("D:\sample.csv","r")
print(fopen.read())
To create and write into a csv file
The below example demonstrate creating and writing a csv file. to make a dynamic file writer we need to import a package import csv, then need to create an instance of the file with file reference Ex:
with open("D:sample.csv","w",newline="") as file_writer
Here if the file does not exist with the mentioned file directory then python will create a same file in the specified directory, and w
represents write, if you want to read a file then replace w
with r
or to append to existing file then a
.
newline=""
specifies that it removes an extra empty row for every time you create row so to eliminate empty row we use newline=""
, create some field names(column names) using list like:
fields=["Names","Age","Class"]
Then apply to writer instance like:
writer=csv.DictWriter(file_writer,fieldnames=fields)
Here using Dictionary writer and assigning column names, to write column names to csv we use writer.writeheader()
and to write values we use writer.writerow({"Names":"John","Age":20,"Class":"12A"})
,while writing file values must be passed using dictionary method , here the key is column name and value is your respective key value.
Import csv:
with open("D:sample.csv","w",newline="") as file_writer:
fields=["Names","Age","Class"]
writer=csv.DictWriter(file_writer,fieldnames=fields)
writer.writeheader()
writer.writerow({"Names":"John","Age":21,"Class":"12A"})
If needed- read a csv file without using the csv module:
rows = []
with open('test.csv') as f:
for line in f:
# strip whitespace
line = line.strip()
# separate the columns
line = line.split(',')
# save the line for use later
rows.append(line)
If you are working with CSV data and want a solution with a smaller footprint than pandas
, you can try my package, littletable. Can be pip-installed, or just dropped in as a single .py file with your own code, so very portable and suitable for serverless apps.
Reading CSV data is as simple as calling csv_import
:
data = """
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3"""
import littletable as lt
tbl = lt.Table().csv_import(data, fieldnames="number1,words,number2".split(','))
tbl.present()
Prints:
Number1 Words Number2
──────────────────────────────────────────────────────────
1 A towel, 1.0
42 it says, 2.0
1337 is about the most -1
0 massively useful thing 123
-2 an interstellar hitchhiker can have. 3
(littletable
uses the rich module for presenting Tables.)
littletable
doesn’t automatically try to convert numeric data, so a numeric transform function is needed for the numeric columns.
def get_numeric(s):
try:
return int(s)
except ValueError:
try:
return float(s)
except ValueError:
return s
tbl = lt.Table().csv_import(
data,
fieldnames="number1,words,number2".split(','),
transforms={}.fromkeys("number1 number2".split(), get_numeric)
)
tbl.present()
This gives:
Number1 Words Number2
──────────────────────────────────────────────────────────
1 A towel, 1.0
42 it says, 2.0
1337 is about the most -1
0 massively useful thing 123
-2 an interstellar hitchhiker can have. 3
The numeric columns are right-justified instead of left-justified.
littletable
also has other ORM-ish features, such as indexing, joining, pivoting, and full-text search. Here is a table of statistics on the numeric columns:
tbl.stats("number1 number2".split()).present()
Name Mean Min Max Variance Std_Dev Count Missing
────────────────────────────────────────────────────────────────────────────────
number1 275.6 -2 1337 352390.3 593.6247130974249 5 0
number2 25.6 -1 123 2966.8 54.468339427597755 5 0
or transposed:
tbl.stats("number1 number2".split(), by_field=False).present()
Stat Number1 Number2
───────────────────────────────────────────────────
mean 275.6 25.6
min -2 -1
max 1337 123
variance 352390.3 2966.8
std_dev 593.6247130974249 54.468339427597755
count 5 5
missing 0 0
Other formats can be output too, such as Markdown:
print(tbl.stats("number1 number2".split(), by_field=False).as_markdown())
| stat | number1 | number2 |
|---|---:|---:|
| mean | 275.6 | 25.6 |
| min | -2 | -1 |
| max | 1337 | 123 |
| variance | 352390.3 | 2966.8 |
| std_dev | 593.6247130974249 | 54.468339427597755 |
| count | 5 | 5 |
| missing | 0 | 0 |
Which would render from Markdown as
stat
number1
number2
mean
275.6
25.6
min
-2
-1
max
1337
123
variance
352390.3
2966.8
std_dev
593.6247130974249
54.468339427597755
count
5
5
missing
0
0
Lastly, here is a text search on the words for any entry with the word "hitchhiker":
tbl.create_search_index("words")
for match, score in tbl.search.words("hitchhiker"):
print(match)
Prints:
namespace(number1=-2, words=’an interstellar hitchhiker can have.’, number2=3)
I wrote a question asking something similar. So, to keep everything in one place, here are my 2 cents for a very quick & dirty solution.
This code is meant to read from one CSV file and write to another. The format of the output row is fixed, I could have used csv.write with the correct delimiter, but in this case I would have to do extra work to specify the blanks. But it works nicely to show how to output text using the good old print() function:
#! /usr/bin/env python3
def main():
parser = argparse.ArgumentParser(
description='',
usage="""myparser [-h] print this help and exit
""",
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument('-f', '--file',help='CSV input file', required=True)
args = parser.parse_args()
with open("output.file", "w") as outfile:
with open(args.file) as csvfile:
csv_reader = csv.reader(csvfile, delimiter=',')
line_count = 0
for row in csv_reader:
if line_count == 0:
line_count += 1
elif args.archive:
print(f'no:{row[0]}:{row[1]}::0:0:0:/bin/bash:0:0:{row[2]}:{row[3]}:{row[4]}:archive', file=outfile)
line_count += 1
return sys.exit(EXIT_SUCCESS)
if __name__ == '__main__':
main()
Sorry for the indentation.
This code opens a CSV file for reading and uses the print() function to write a formatted string that will look like:
no:Xoero:ToelAs:xtoelas:0:0:0:/bin/bash:0:0:[email protected]:00311234567890:nl:archive
How do I read the following CSV file?
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3
How do I write the following data
to a CSV file?
data = [
(1, "A towel,", 1.0),
(42, " it says, ", 2.0),
(1337, "is about the most ", -1),
(0, "massively useful thing ", 123),
(-2, "an interstellar hitchhiker can have.", 3),
]
Here are some minimal complete examples how to read CSV files and how to write CSV files with Python.
Python 3: Reading a CSV file
Pure Python
import csv
# Define data
data = [
(1, "A towel,", 1.0),
(42, " it says, ", 2.0),
(1337, "is about the most ", -1),
(0, "massively useful thing ", 123),
(-2, "an interstellar hitchhiker can have.", 3),
]
# Write CSV file
with open("test.csv", "wt") as fp:
writer = csv.writer(fp, delimiter=",")
# writer.writerow(["your", "header", "foo"]) # write header
writer.writerows(data)
# Read CSV file
with open("test.csv") as fp:
reader = csv.reader(fp, delimiter=",", quotechar='"')
# next(reader, None) # skip the headers
data_read = [row for row in reader]
print(data_read)
After that, the contents of data_read
are
[['1', 'A towel,', '1.0'],
['42', ' it says, ', '2.0'],
['1337', 'is about the most ', '-1'],
['0', 'massively useful thing ', '123'],
['-2', 'an interstellar hitchhiker can have.', '3']]
Please note that CSV reads only strings. You need to convert to the column types manually.
A Python 2+3 version was here before (link), but Python 2 support is dropped. Removing the Python 2 stuff massively simplified this answer.
Related
- How do I write data into csv format as string (not file)?
- How can I use io.StringIO() with the csv module?: This is interesting if you want to serve a CSV on-the-fly with Flask, without actually storing the CSV on the server.
mpu
Have a look at my utility package mpu
for a super simple and easy to remember one:
import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)
Pandas
import pandas as pd
# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)
# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]
# or export it as a list of dicts
dicts = df.to_dict().values()
See read_csv
docs for more information. Please note that pandas automatically infers if there is a header line, but you can set it manually, too.
If you haven’t heard of Seaborn, I recommend having a look at it.
Other
Reading CSV files is supported by a bunch of other libraries, for example:
Created CSV file
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3
Common file endings
.csv
Working with the data
After reading the CSV file to a list of tuples / dicts or a Pandas dataframe, it is simply working with this kind of data. Nothing CSV specific.
Alternatives
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- CSV: Super simple format (read & write)
- YAML: Nice to read, similar to JSON (read & write)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
Writing a CSV file
First you need to import csv
For example:
import csv
with open('eggs.csv', 'wb') as csvfile:
spamwriter = csv.writer(csvfile, delimiter=' ',
quotechar='|', quoting=csv.QUOTE_MINIMAL)
spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
import csv
with open(fileLocation+'example.csv',newline='') as File: #the csv file is stored in a File object
reader=csv.reader(File) #csv.reader is used to read a file
for row in reader:
print(row)
To read a csv file using Pandas
use pd.read_csv("D:\sample.csv")
using only python :
fopen=open("D:\sample.csv","r")
print(fopen.read())
To create and write into a csv file
The below example demonstrate creating and writing a csv file. to make a dynamic file writer we need to import a package import csv, then need to create an instance of the file with file reference Ex:
with open("D:sample.csv","w",newline="") as file_writer
Here if the file does not exist with the mentioned file directory then python will create a same file in the specified directory, and w
represents write, if you want to read a file then replace w
with r
or to append to existing file then a
.
newline=""
specifies that it removes an extra empty row for every time you create row so to eliminate empty row we use newline=""
, create some field names(column names) using list like:
fields=["Names","Age","Class"]
Then apply to writer instance like:
writer=csv.DictWriter(file_writer,fieldnames=fields)
Here using Dictionary writer and assigning column names, to write column names to csv we use writer.writeheader()
and to write values we use writer.writerow({"Names":"John","Age":20,"Class":"12A"})
,while writing file values must be passed using dictionary method , here the key is column name and value is your respective key value.
Import csv:
with open("D:sample.csv","w",newline="") as file_writer:
fields=["Names","Age","Class"]
writer=csv.DictWriter(file_writer,fieldnames=fields)
writer.writeheader()
writer.writerow({"Names":"John","Age":21,"Class":"12A"})
If needed- read a csv file without using the csv module:
rows = []
with open('test.csv') as f:
for line in f:
# strip whitespace
line = line.strip()
# separate the columns
line = line.split(',')
# save the line for use later
rows.append(line)
If you are working with CSV data and want a solution with a smaller footprint than pandas
, you can try my package, littletable. Can be pip-installed, or just dropped in as a single .py file with your own code, so very portable and suitable for serverless apps.
Reading CSV data is as simple as calling csv_import
:
data = """
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3"""
import littletable as lt
tbl = lt.Table().csv_import(data, fieldnames="number1,words,number2".split(','))
tbl.present()
Prints:
Number1 Words Number2
──────────────────────────────────────────────────────────
1 A towel, 1.0
42 it says, 2.0
1337 is about the most -1
0 massively useful thing 123
-2 an interstellar hitchhiker can have. 3
(littletable
uses the rich module for presenting Tables.)
littletable
doesn’t automatically try to convert numeric data, so a numeric transform function is needed for the numeric columns.
def get_numeric(s):
try:
return int(s)
except ValueError:
try:
return float(s)
except ValueError:
return s
tbl = lt.Table().csv_import(
data,
fieldnames="number1,words,number2".split(','),
transforms={}.fromkeys("number1 number2".split(), get_numeric)
)
tbl.present()
This gives:
Number1 Words Number2
──────────────────────────────────────────────────────────
1 A towel, 1.0
42 it says, 2.0
1337 is about the most -1
0 massively useful thing 123
-2 an interstellar hitchhiker can have. 3
The numeric columns are right-justified instead of left-justified.
littletable
also has other ORM-ish features, such as indexing, joining, pivoting, and full-text search. Here is a table of statistics on the numeric columns:
tbl.stats("number1 number2".split()).present()
Name Mean Min Max Variance Std_Dev Count Missing
────────────────────────────────────────────────────────────────────────────────
number1 275.6 -2 1337 352390.3 593.6247130974249 5 0
number2 25.6 -1 123 2966.8 54.468339427597755 5 0
or transposed:
tbl.stats("number1 number2".split(), by_field=False).present()
Stat Number1 Number2
───────────────────────────────────────────────────
mean 275.6 25.6
min -2 -1
max 1337 123
variance 352390.3 2966.8
std_dev 593.6247130974249 54.468339427597755
count 5 5
missing 0 0
Other formats can be output too, such as Markdown:
print(tbl.stats("number1 number2".split(), by_field=False).as_markdown())
| stat | number1 | number2 |
|---|---:|---:|
| mean | 275.6 | 25.6 |
| min | -2 | -1 |
| max | 1337 | 123 |
| variance | 352390.3 | 2966.8 |
| std_dev | 593.6247130974249 | 54.468339427597755 |
| count | 5 | 5 |
| missing | 0 | 0 |
Which would render from Markdown as
stat | number1 | number2 |
---|---|---|
mean | 275.6 | 25.6 |
min | -2 | -1 |
max | 1337 | 123 |
variance | 352390.3 | 2966.8 |
std_dev | 593.6247130974249 | 54.468339427597755 |
count | 5 | 5 |
missing | 0 | 0 |
Lastly, here is a text search on the words for any entry with the word "hitchhiker":
tbl.create_search_index("words")
for match, score in tbl.search.words("hitchhiker"):
print(match)
Prints:
namespace(number1=-2, words=’an interstellar hitchhiker can have.’, number2=3)
I wrote a question asking something similar. So, to keep everything in one place, here are my 2 cents for a very quick & dirty solution.
This code is meant to read from one CSV file and write to another. The format of the output row is fixed, I could have used csv.write with the correct delimiter, but in this case I would have to do extra work to specify the blanks. But it works nicely to show how to output text using the good old print() function:
#! /usr/bin/env python3
def main():
parser = argparse.ArgumentParser(
description='',
usage="""myparser [-h] print this help and exit
""",
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument('-f', '--file',help='CSV input file', required=True)
args = parser.parse_args()
with open("output.file", "w") as outfile:
with open(args.file) as csvfile:
csv_reader = csv.reader(csvfile, delimiter=',')
line_count = 0
for row in csv_reader:
if line_count == 0:
line_count += 1
elif args.archive:
print(f'no:{row[0]}:{row[1]}::0:0:0:/bin/bash:0:0:{row[2]}:{row[3]}:{row[4]}:archive', file=outfile)
line_count += 1
return sys.exit(EXIT_SUCCESS)
if __name__ == '__main__':
main()
Sorry for the indentation.
This code opens a CSV file for reading and uses the print() function to write a formatted string that will look like:
no:Xoero:ToelAs:xtoelas:0:0:0:/bin/bash:0:0:[email protected]:00311234567890:nl:archive