sort csv by column
Question:
I want to sort a CSV table by date. Started out being a simple task:
import sys
import csv
reader = csv.reader(open("files.csv"), delimiter=";")
for id, path, title, date, author, platform, type, port in reader:
    print(date)
I used Python’s CSV module to read in a file with that structure:
id;file;description;date;author;platform;type;port
- The date is ISO-8601 (e.g. 2003-04-22), so I can sort it quite easily without parsing.
- I want to sort the rows by date, newest entries first.
- How do I get this reader into a sortable data structure? I think with some effort I could make a datelist: datelist += date, split and sort. However, I would then have to re-identify the complete entry in the CSV table. It's not just sorting a list of things.
- csv doesn't seem to have a built-in sorting function.
The optimal solution would be to have a CSV client that handles the file like a database. I didn’t find anything like that.
I hope somebody knows some nice sorting magic here 😉
Answers:
Since the 'date' column has index 3:
import operator
sortedlist = sorted(reader, key=operator.itemgetter(3), reverse=True)
Or use a lambda:
sortedlist = sorted(reader, key=lambda row: row[3], reverse=True)
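A minimal end-to-end sketch of the approach above. It assumes the file starts with a header row (the question does not say whether it does) and uses made-up sample rows in an in-memory buffer rather than files.csv, so treat the data and the header handling as illustrative only:

```python
import csv
import io
import operator

# Hypothetical sample data in the question's layout; a real script would
# open "files.csv" instead of using an in-memory buffer.
SAMPLE = (
    "id;file;description;date;author;platform;type;port\n"
    "1;a.txt;first;2003-04-22;alice;linux;remote;80\n"
    "2;b.txt;second;2005-01-10;bob;windows;local;0\n"
    "3;c.txt;third;2004-07-01;carol;linux;remote;22\n"
)

reader = csv.reader(io.StringIO(SAMPLE), delimiter=";")
header = next(reader)  # assumption: the first row is a header, so set it aside

# ISO-8601 dates sort correctly as plain strings; index 3 is the date column.
rows = sorted(reader, key=operator.itemgetter(3), reverse=True)

# Write the header back first, then the sorted rows.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

sorted_text = out.getvalue()
print(sorted_text)
```

Setting the header aside before sorting matters: otherwise the header line is sorted along with the data and may end up in the middle of the output.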
The reader acts like a generator. On a file with some fake data:
>>> import sys, csv
>>> data = csv.reader(open('data.csv'),delimiter=';')
>>> data
<_csv.reader object at 0x1004a11a0>
>>> next(data)
['a', ' b', ' c']
>>> next(data)
['x', ' y', ' z']
>>> next(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Using operator.itemgetter as Ignacio suggests:
>>> data = csv.reader(open('data.csv'),delimiter=';')
>>> import operator
>>> sortedlist = sorted(data, key=operator.itemgetter(2), reverse=True)
>>> sortedlist
[['x', ' y', ' z'], ['a', ' b', ' c']]
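Because the reader is a one-shot iterator, a second pass over it yields nothing. A small sketch, using made-up in-memory data that mirrors the fake file above, of materialising the rows with list() so they can be sorted more than once:

```python
import csv
import io
import operator

# Made-up two-row sample mirroring the fake data above.
reader = csv.reader(io.StringIO("a; b; c\nx; y; z\n"), delimiter=";")

rows = list(reader)      # consume the reader once and keep the rows
leftover = list(reader)  # the reader is exhausted now, so this is empty

# The stored rows can be sorted as often as needed.
by_third_desc = sorted(rows, key=operator.itemgetter(2), reverse=True)
by_first_asc = sorted(rows, key=operator.itemgetter(0))

print(leftover)
print(by_third_desc)
print(by_first_asc)
```

sorted() already builds a list internally, so the explicit list() call is only needed when the same rows are used more than once.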
To sort by multiple columns (sort by column_1, and then by column_2):
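import csv

with open('unsorted.csv', newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    # the key is a tuple: rows are compared by column_1 first, then column_2
    sortedlist = sorted(spamreader, key=lambda row: (row['column_1'], row['column_2']), reverse=False)

# newline='' stops the csv module from writing extra blank lines on Windows
with open('sorted.csv', 'w', newline='') as f:
    # fieldnames must list every column present in the input rows
    fieldnames = ['column_1', 'column_2', 'column_3']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)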
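One caveat with tuple keys like the one above: the csv module yields every field as a string, so a numeric column sorts lexicographically ('10' before '9') unless the key converts it. A sketch with hypothetical column names name and count and made-up data:

```python
import csv
import io

# Hypothetical sample; "count" is numeric but arrives as text.
SAMPLE = "name;count\nb;10\na;9\nc;9\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))

# Text comparison: "10" sorts before "9" because "1" < "9".
as_text = sorted(rows, key=lambda row: (row["count"], row["name"]))

# Converting inside the key gives numeric order; name breaks ties.
as_number = sorted(rows, key=lambda row: (int(row["count"]), row["name"]))

print([r["name"] for r in as_text])
print([r["name"] for r in as_number])
```

ISO-8601 dates do not need this treatment, since their string order already matches chronological order.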
For sorting a CSV by column, I would use something like this:
import pandas
csvData = pandas.read_csv('myfile.csv')
csvData.sort_values(["date"], axis=0, ascending=[False], inplace=True)
print(csvData)
You can do it with pandas, and it's easy:
import pandas as pd
df = pd.read_csv("File.csv")
sorted_df = df.sort_values(by=["price","title",...], ascending=False)
sorted_df.to_csv('homes_sorted.csv', index=False)
The .sort_values method returns a new DataFrame (unless inplace=True is passed), so make sure to assign the result to a variable.
Combining the answers given by Ignacio Vazquez-Abrams and by Tiina:
import csv
import operator

fieldnames = ['id', 'path', 'title', 'date', 'author', 'platform', 'type', 'port']
# this means: order by 'id', 'path', ..., 'port'
items = ('id', 'path', 'title', 'date', 'author', 'platform', 'type', 'port')

with open('unsorted.csv', newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    # itemgetter(*items) builds a tuple key from the named columns of each row
    sortedlist = sorted(spamreader, key=operator.itemgetter(*items), reverse=True)

with open('sorted.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)
With this, you can:
1- Order the rows by multiple columns.
2- Change the number of columns you want to order the rows by, without having to use the lambda expression
sortedlist = sorted(spamreader, key=lambda row:(row['column_1'],row['column_2']), reverse=False)
and, especially, without having to add and remove column names inside the lambda expression if you later want to order other CSV files by a different set of columns.
e.g.
items = ('path', 'title')
items = ('id', 'path', 'title', 'date')
items = ('author', 'date', 'title')
instead of
sortedlist = sorted(spamreader, key=lambda row:(row['column_2'],row['column_3']), reverse=False)
sortedlist = sorted(spamreader, key=lambda row:(row['column_1'],row['column_2'],row['column_3'],row['column_4']), reverse=False)
sortedlist = sorted(spamreader, key=lambda row:(row['column_5'],row['column_4'],row['column_3']), reverse=False)
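The idea above can be wrapped in a small helper so that only the items tuple changes between files. This is a sketch: sort_csv_rows is a hypothetical name, and the sample data is made up. Note that operator.itemgetter returns a bare value rather than a tuple when given a single name, which still works as a sort key:

```python
import csv
import io
import operator


def sort_csv_rows(csvfile, items, delimiter=";", reverse=False):
    """Return (fieldnames, rows) with rows sorted by the columns in items.

    Hypothetical helper: items is a non-empty tuple of column names taken
    from the file's header row.
    """
    reader = csv.DictReader(csvfile, delimiter=delimiter)
    rows = sorted(reader, key=operator.itemgetter(*items), reverse=reverse)
    return reader.fieldnames, rows


# Made-up sample data; a real script would pass an open file object.
SAMPLE = (
    "id;path;title;date\n"
    "1;b;zeta;2003-04-22\n"
    "2;a;beta;2005-01-10\n"
    "3;a;alpha;2004-07-01\n"
)

fieldnames, by_path_title = sort_csv_rows(io.StringIO(SAMPLE), ("path", "title"))
_, by_date_desc = sort_csv_rows(io.StringIO(SAMPLE), ("date",), reverse=True)

print([row["id"] for row in by_path_title])
print([row["id"] for row in by_date_desc])
```

The returned fieldnames can be passed straight to csv.DictWriter, which avoids keeping a separate hard-coded list of column names in sync with the file.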