Create multiple files from unique values of a column using built-in libraries of Python

Question:

I started learning Python and was wondering if there is a way to create multiple files from the unique values of a column. I know there are hundreds of ways to do it with pandas, but I would like to do it using only built-in libraries. I couldn't find a single example where it's done that way.

Here is the sample csv file data:

uniquevalue|count
a|123
b|345
c|567
d|789
a|123
b|345
c|567

Sample output file:

a.csv
    uniquevalue|count
    a|123
    a|123

b.csv
    b|345
    b|345

I am struggling with looping over the unique values in a column and then writing them out. Can someone explain the logic of how to do it? That would be much appreciated. Thanks.

Asked By: BigFerry

Answers:

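import csv

with open('sample.csv', newline='') as csvfile:
    # The sample data is pipe-delimited, so the reader needs the delimiter too.
    reader = csv.reader(csvfile, delimiter='|')
    header = next(reader)  # ['uniquevalue', 'count']
    seen = set()
    for row in reader:
        # Append mode: rows with the same first value accumulate in one file.
        # Re-running the script appends again, so delete old output first.
        with open(f"{row[0]}.csv", 'a', newline='') as inner:
            writer = csv.writer(inner, delimiter='|')
            if row[0] not in seen:
                # First row for this value: write the header before it.
                writer.writerow(header)
                seen.add(row[0])
            writer.writerow(row)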

Answered By: Wonhyeong Seo

import csv
from collections import defaultdict

header = []
data = defaultdict(list)

DELIMITER = "|"

with open("inputfile.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=DELIMITER)

    for i, row in enumerate(reader):
        if i == 0:
            header = row
        else:
            key = row[0]
            data[key].append(row)


for key, value in data.items():
    filename = f"{key}.csv"
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f, delimiter=DELIMITER)

        rows = [header] + value
        writer.writerows(rows)
Answered By: crissal
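
The grouping above holds every row in memory before writing. For larger inputs, a streaming variant is possible; the following is only a sketch under a few assumptions: the function name split_by_first_column and the output_dir parameter are made up for illustration, and the number of distinct values is assumed to be small enough that one open file per value stays within the operating system's limit on open files.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def split_by_first_column(input_path, output_dir, delimiter="|"):
    """Write one <value>.csv per distinct value in the first column.

    Hypothetical helper for illustration; returns the sorted list of values.
    """
    output_dir = Path(output_dir)
    writers = {}  # first-column value -> csv writer for that value's file
    with ExitStack() as stack, open(input_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader, None)  # None when the input file is empty
        for row in reader:
            if not row:  # skip blank lines
                continue
            key = row[0]
            if key not in writers:
                # First time this value appears: open its file, write the header.
                out = stack.enter_context(
                    open(output_dir / f"{key}.csv", "w", newline="")
                )
                writers[key] = csv.writer(out, delimiter=delimiter)
                writers[key].writerow(header)
            writers[key].writerow(row)
    # Leaving the with-block closes every output file registered on the stack.
    return sorted(writers)
```

Calling split_by_first_column("inputfile.csv", ".") would write a.csv, b.csv and so on into the current directory. Each file is opened once in "w" mode, so running it again overwrites the previous output instead of appending to it.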

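The task can also be done without the csv module. The whole file is read at once, and read_file.read().splitlines()[1:] strips the newline characters and skips the header line. A set of the remaining lines gives the unique rows: each unique row determines an output file name (the text before the first '|'), and items.count(line) gives the number of times that row is written. Note that this relies on all rows sharing a value being identical, as in the sample data; otherwise a later row would overwrite the file written for an earlier one.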

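with open("unique_sample.csv", "r") as read_file:
    # Drop the header line; splitlines() also strips the newline characters.
    items = read_file.read().splitlines()[1:]
    for line in set(items):
        # The text before the first '|' is the output file name.
        with open(line[:line.index('|')] + '.csv', 'w') as output:
            # Write the line once per occurrence in the input.
            output.write((line + '\n') * items.count(line))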
Answered By: lroth
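
If rows that share a value can differ in their other columns, or the header should be kept in each output file as in the question's a.csv sample, grouping by the first field rather than by the whole line may be a better fit. Here is a minimal sketch, still without the csv module; the function name split_lines_by_key is invented for illustration, and it assumes that no field contains the delimiter or a line break.

```python
from collections import defaultdict


def split_lines_by_key(text, delimiter="|"):
    """Map each first-column value to the full text of its output file.

    Hypothetical helper for illustration: the header line is repeated at the
    top of every group and the original row order is preserved.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    header, body = lines[0], lines[1:]
    groups = defaultdict(list)
    for line in body:
        if not line.strip():  # ignore blank lines
            continue
        key = line.split(delimiter, 1)[0]  # text before the first delimiter
        groups[key].append(line)
    return {key: "\n".join([header] + rows) + "\n" for key, rows in groups.items()}
```

The returned dictionary can then be written out with one open(key + '.csv', 'w') per entry. Because rows are grouped in a single pass, this also avoids calling items.count(line) once per unique line.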