What is the pythonic way to read CSV file data as rows of namedtuples?

Question:

What is the best way to take a data file that contains a header row and read this row into a named tuple so that the data rows can be accessed by header name?

I was attempting something like this:

import csv
from collections import namedtuple

with open('data_file.txt', mode="r") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", ", ".join(i for i in reader[0]))
    next(reader)
    for row in reader:
        data = Data(*row)

The reader object is not subscriptable, so the above code throws a TypeError. What is the pythonic way to read a file header into a namedtuple?

Asked By: drbunsen


Answers:

Use:

Data = namedtuple("Data", next(reader))

and omit the line:

next(reader)

Combining this with an iterative version based on martineau’s comment below, the example for Python 2 becomes:

import csv
from collections import namedtuple
from itertools import imap

with open("data_file.txt", mode="rb") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in imap(Data._make, reader):
        print data.foo
        # ...further processing of a line...

and for Python 3:

import csv
from collections import namedtuple

with open("data_file.txt", newline="") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in map(Data._make, reader):
        print(data.foo)
        # ...further processing of a line...
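
One caveat: namedtuple raises a ValueError if a header is not a valid Python identifier (for example, it contains a space or is a keyword). A minimal sketch of one way around this, using namedtuple's rename=True option; the sample data and column names here are hypothetical and stand in for a real file:

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data; "unit price" and "class" are not valid field names.
sample = io.StringIO("name,unit price,class\nwidget,9.99,A\n")

reader = csv.reader(sample)
# rename=True replaces invalid names with positional ones (_1, _2, ...).
Data = namedtuple("Data", next(reader), rename=True)
rows = [Data._make(row) for row in reader]

print(Data._fields)  # ('name', '_1', '_2')
print(rows[0].name)  # widget
```

The renamed fields lose their descriptive names, so cleaning the headers yourself may be preferable when the names matter.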
Answered By: Sven Marnach

Please have a look at csv.DictReader. It reads the column names from the first row, which is what you’re looking for, and then lets you access each column in a row by name through a dictionary.

If for some reason you still need to access the rows as a collections.namedtuple, it should be easy to transform the dictionaries to named tuples as follows:

import collections
import csv

with open('data_file.txt') as infile:
    reader = csv.DictReader(infile)
    # reader.fieldnames holds the column names from the header row
    Data = collections.namedtuple('Data', reader.fieldnames)
    tuples = [Data(**row) for row in reader]
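
For comparison, here is a small sketch of plain csv.DictReader access without the namedtuple step. The in-memory sample data and the column names foo and bar are assumptions for illustration:

```python
import csv
import io

# Hypothetical sample data standing in for data_file.txt.
sample = io.StringIO("foo,bar\n1,2\n3,4\n")

reader = csv.DictReader(sample)
rows = list(reader)

for row in rows:
    # Each row maps column name -> string value.
    print(row["foo"], row["bar"])
```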
Answered By: jcollado

I’d suggest this approach:

import csv
from collections import namedtuple

with open("data.csv", 'r') as f:
    reader = csv.reader(f, delimiter=',')
    Row = namedtuple('Row', next(reader))
    rows = [Row(*line) for line in reader]

If you work with Pandas, the solution becomes even more elegant:

import pandas as pd
from collections import namedtuple

data = pd.read_csv("data.csv")
Row = namedtuple('Row', data.columns)
rows = [Row(*row) for index, row in data.iterrows()]

In both cases you can interact with the records by field names:

for row in rows:
    print(row.foo)
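
One difference between the two variants: csv.reader yields every field as a string, while pd.read_csv infers column types. A minimal sketch of converting fields in the csv version, assuming hypothetical columns foo (integer) and bar (float):

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for data.csv.
sample = io.StringIO("foo,bar\n1,2.5\n3,4.0\n")

reader = csv.reader(sample, delimiter=",")
Row = namedtuple("Row", next(reader))

# Assumed column types, one converter per column; adjust to the real file.
converters = (int, float)
rows = [
    Row(*(convert(value) for convert, value in zip(converters, line)))
    for line in reader
]

for row in rows:
    print(row.foo, row.bar)
```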
Answered By: Roman