Summing numbers in two different .txt files in Python

Question

I am currently trying to sum two .txt files, each containing over 35 million values, and put the result in a third file.

File 1 :

File 2 :

1.483429484776452
2.2403221757269196
1.101004844694236
1.6119626937837102

File 3 :

Any idea how to do that with Python?

Asked By: Nicolas Guibal


Answer 1

You can use numpy for speed. It will be much faster than pure Python, because numpy uses C/C++ for a lot of its operations.

import numpy
import os

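# Directory that contains this script
path = os.path.dirname(os.path.realpath(__file__))

file_name_1 = os.path.join(path, 'values_1.txt')
file_name_2 = os.path.join(path, 'values_2.txt')

# Load each file as a 1-D float array, then add element-wise
a = numpy.loadtxt(file_name_1, dtype=float)
b = numpy.loadtxt(file_name_2, dtype=float)
c = a + b

precision = 10  # digits after the decimal point in the output
numpy.savetxt(os.path.join(path, 'sum.txt'), c, fmt=f'%-.{precision}f')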

This assumes your .txt files are located in the same directory as your Python script.
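As a rough sanity check on whether the numpy approach fits in RAM, the arithmetic can be sketched as below. This is only an estimate under stated assumptions: values are stored as 8-byte float64, three arrays (a, b and c) are alive at once, and the helper name estimate_mb is hypothetical. Parsing the text files may temporarily need more than this.

```python
def estimate_mb(n_values, n_arrays=3, bytes_per_value=8):
    """Approximate memory in MB for n_arrays float64 arrays of n_values each.

    Hypothetical helper: counts only the array data, not parsing overhead.
    """
    return n_values * bytes_per_value * n_arrays / 1_000_000


# 35 million values per file, arrays a, b and c held at the same time
print(estimate_mb(35_000_000))
```

If that figure is comfortable for your machine, loading everything at once is likely fine; otherwise a chunked or streaming approach is safer.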

Answered By: alvrm

Answer 2

You can use pandas.read_csv to read, sum, and then write chunks of your file.
Presumably all 35 million records do not fit in memory, so you need to read the files in chunks. This way only one chunk per file is in memory at a time (two chunks in total, one for file1 and one for file2): you compute the sum and append it to file3 on disk, one chunk at a time.

In this dummy example I set chunksize=2 because I tested on your sample inputs, which are 4 values long. The best value depends on the machine you are working on, so run some tests to see which size suits your problem (50k, 100k, 500k, 1M, etc.).

import pandas as pd

chunksize = 2  # rows per chunk; use a much larger value for real data

with pd.read_csv("file1.txt", chunksize=chunksize, header=None) as reader1, pd.read_csv("file2.txt", chunksize=chunksize, header=None) as reader2:
    for chunk1, chunk2 in zip(reader1, reader2):
        # mode='a' appends, so remove any existing file3.txt before re-running
        (chunk1 + chunk2).to_csv("file3.txt", index=False, header=False, mode='a')
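The same read-a-little, write-a-little idea can also be sketched with the standard library alone, streaming one line at a time. This is a minimal sketch, not a tuned implementation: the function name sum_files is hypothetical, and it assumes each file holds exactly one number per line with no blank lines. Note that zip stops at the end of the shorter file.

```python
def sum_files(path_a, path_b, path_out):
    """Write the line-by-line sum of two text files of numbers to path_out.

    Assumes one number per line in both files; only one line from each
    file is held in memory at a time.
    """
    with open(path_a) as fa, open(path_b) as fb, open(path_out, "w") as fout:
        # zip pairs line i of the first file with line i of the second
        for line_a, line_b in zip(fa, fb):
            fout.write(f"{float(line_a) + float(line_b)}\n")
```

For example, sum_files("file1.txt", "file2.txt", "file3.txt") would overwrite file3.txt with the sums. It will likely be slower than the pandas or numpy versions, but its memory use stays constant regardless of file size.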

Answered By: Massifox
