Issues with pandas and writing to a CSV file

Question:

I am having an issue with pandas and writing to a CSV file. When I run the Python script, I either run out of memory or my computer runs slowly after the script has finished. Is there any way to split the data into chunks and write each chunk to the CSV? I am a bit new to programming in Python.

import itertools, hashlib, pandas as pd,time
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
chunksize = 1_000_000
rows = []
for combination in itertools.combinations_with_replacement(chars, 10):
    for A in numbers_list:
        pure = str(A) + ':' + str(combination)
        B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
        C = hashlib.sha256(B.encode('utf-8')).hexdigest()
        rows.append([A, B, C])
t0 = time.time()
df = pd.DataFrame(data=rows, columns=['A', 'B', 'C'])
df.to_csv('data.csv', index=False)
tdelta = time.time() - t0
print(tdelta)

I would really appreciate the help! Thank you!

Asked By: Juan Soto Valdez

||

Answers:

Since you are only using the DataFrame to write to a file, skip it completely. You build the full data set in memory as a Python list and then again as a DataFrame, needlessly eating RAM. The csv module in the standard library lets you write row by row.

import itertools, hashlib, csv
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
# newline='' lets the csv module control line endings itself
with open('test.csv', 'w', newline='') as fileobj:
    writer = csv.writer(fileobj)
    for combination in itertools.combinations_with_replacement(chars, 10):
        for A in numbers_list:
            # str(combination) looks like "('0', '0', ...)"; strip the tuple punctuation
            pure = str(A) + ':' + str(combination)
            B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
            C = hashlib.sha256(B.encode('utf-8')).hexdigest()
            # each row goes straight to the file, so nothing accumulates in memory
            writer.writerow([A, B, C])

This will go fast until you’ve filled up the RAM cache that fronts your storage, and then it will go at whatever speed the OS can get data to disk.
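If a DataFrame is still wanted for some other reason, the chunked approach the question asks about could look roughly like the sketch below. It is only an illustration, not part of the code above: the helper names generate_rows and write_in_chunks are made up for this example, and ''.join(combination) is swapped in for the chain of replace() calls, which gives the same string here because every symbol is a single character. Rows come from a generator, itertools.islice pulls at most chunksize of them at a time, and each chunk is appended to the file with to_csv(mode='a'), with the header written only for the first chunk.

```python
import hashlib
import itertools

import pandas as pd


def generate_rows(chars, length, numbers):
    """Lazily yield [A, B, C] rows instead of collecting them in a list."""
    for combination in itertools.combinations_with_replacement(chars, length):
        # Same result as stripping the tuple punctuation from str(combination)
        joined = ''.join(combination)
        for A in numbers:
            B = f"{A}:{joined}"
            C = hashlib.sha256(B.encode('utf-8')).hexdigest()
            yield [A, B, C]


def write_in_chunks(path, rows, chunksize):
    """Write rows to a CSV file, keeping at most chunksize rows in memory.

    Returns the number of data rows written.
    """
    rows = iter(rows)
    written = 0
    first = True
    while True:
        # Pull the next batch from the generator; an empty list means we are done
        chunk = list(itertools.islice(rows, chunksize))
        if not chunk:
            break
        df = pd.DataFrame(chunk, columns=['A', 'B', 'C'])
        # The first chunk creates the file and writes the header; later chunks append
        df.to_csv(path, mode='w' if first else 'a', header=first, index=False)
        written += len(chunk)
        first = False
    return written
```

For the data in the question, the call would be write_in_chunks('data.csv', generate_rows('0123456789abcdef', 10, range(25)), 1_000_000). Memory use is then bounded by the chunk size, but each chunk still pays for building a DataFrame, so the plain csv.writer loop is likely the lighter option when nothing else needs pandas.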

Answered By: tdelaney