Read a CSV two rows at a time, combine values, write to new CSV

Question:

How do I read an input CSV two rows at a time, combine some values into a new single row, and then write that row to a new CSV?

In the input below I want to read two rows, take price1 from the second row and make it price2 in a new, combined, row, then repeat for the next two rows:

input

date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03

new csv

date, name, qt, price1, price2
9/12/22, AB, 2, 5.00, 5.08
9/12/22, BC, 1, 2.00, 2.03
import csv

data = []

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        data.append({'date': row[0],  'qt': row[3], 'name': row[5], 'price':  row[10]})

 #data.append(myClass(row[0], row[2], row[3],  row[5],  row[10]))

for x in data:
    print(x)

Adrian’s Answer is missing:

  • import csv
  • #newline requires python 3
  • csv must be UTF-8 and headers must match
Asked By: rrrfusco

||

Answers:

cat price_old.csv                                                                                                                                                         
date,name,qt,price1
9/12/22,AB,2,5.00
9/12/22,AB,2,5.08
9/12/22,BC,1,2.00
9/12/22,BC,1,2.03 

with open("price_old.csv") as old_csv:
    new_list = []
    dt, nm = None, None
    c_reader = csv.DictReader(old_csv)
    for row in c_reader:
        if row['date'] != dt and row['name'] != nm:
            dt, nm = row['date'], row['name']
            price_list = [row['price1']]
        else:
            price_list.append(row['price1'])
            row.update({'price1': price_list[0], 'price2': price_list[1]})
            new_list.append(row)
            price_list = []
            dt, nm = None, None
    with open('price_new.csv', 'w', newline='') as new_csv:
        c_writer = csv.DictWriter(new_csv, fieldnames= ['date', 'name','qt', 'price1', 'price2'])
        c_writer.writeheader()
        c_writer.writerows(new_list)

cat price_new.csv

date,name,qt,price1,price2
9/12/22,AB,2,5.00,5.08
9/12/22,BC,1,2.00,2.03 

Answered By: Adrian Klaver

Using pandas will make your life easier.

import pandas as pd

df = pd.read_csv('test.csv', delimiter=', ')

# Group by the first 3 columns, and use the rows to form the columns (unstack)
df = df.groupby(['date','name','qt'])['price1'].apply(
    lambda df: df.reset_index(drop=True)).unstack().reset_index()

# Rename the columns
df.columns = [*df.columns[:-2], 'price1', 'price2']

df.to_csv('output.csv', sep=',', index=False)

The output.csv is:

date,name,qt,price1,price2
9/12/22,AB,2,5.0,5.08
9/12/22,BC,1,2.0,2.03
Answered By: Timmy Chan

Pythons csv readers can read any number of rows at a time with ease.

The readers (both csv.reader and csv.DictReader) are iterators, which means we can use the next() function to manually get a row, just like we can use elem = next(my_iterator) to get the next element in any iterator:

import csv
import pprint

new_rows = []
with open("input.csv", newline="") as f:
    reader = csv.reader(f)

    header = next(reader)
    new_rows.append(header + [" price 2"])

    row1 = next(reader)
    row2 = next(reader)
    new_rows.append(row1 + row2[:-1])

    row1 = next(reader)
    row2 = next(reader)
    new_rows.append(row1 + row2[:-1])

pprint.pprint(new_rows)

Gives us:

[
    ['date',    ' name', ' qt', ' price1', ' price 2'],
    ['9/12/22',   ' AB',  ' 2',   ' 5.00',    ' 5.08'],
    ['9/12/22',   ' BC',  ' 1',   ' 2.00',    ' 2.03']
]

That approach can be adapted to using a for-loop to drive the iteration. The for-loop will always give us the "first row" in the current group of rows we want. Inside the loop we ask for as many of the next rows as are in the group:

with open("input.csv", newline="") as f:
    reader = csv.reader(f)

    header = next(reader)
    new_rows.append(header + [" price 2"])

    for row1 in reader:
        row2 = next(reader)
        new_rows.append(row1 + row2[-1:])

This approach assumes your input has a header, and then after that the row count is an even multiple of the group size. If your input doesn’t conform to that, you’ll get a StopIteration exception, that’ll look something like (in 3.11):

  File "/Users/zyoung/develop/StackOverflow/./main.py", line 31, in <module>
    row2 = next(reader)
           ^^^^^^^^^^^^
StopIteration

If that’s a possibility, you can add exception handling:

for row1 in reader:
    try:
        row2 = next(reader)
    except StopIteration as e:
        print(f"encountered an odd row on line {reader.line_num}; stopping reading, moving on to writing")
        break

Finally, we can extend this idea to any number of rows in a logical group, like 3 rows per group:

date, name, qt, price1
9/12/22, AB, 2, 5.00
9/12/22, AB, 2, 5.08
9/12/22, AB, 2, 5.12
9/12/22, BC, 1, 2.00
9/12/22, BC, 1, 2.03
9/12/22, BC, 1, 2.06
header = next(reader)
new_rows.append(header + [" price 2", " price 3"])

for row1 in reader:
    row2 = next(reader)
    row3 = next(reader)

    new_rows.append(row1 + row2[-1:] + row3[-1:])
Answered By: Zach Young
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.