How to store tuples of floats in a file, then read them back to compute the mean per column?

Question:

I am computing the following scores iteratively; each iteration generates a new set of scores:

add_score, keep_score, del_score = get_corpus_sari_operation_scores(sources, prediction, references)

I first want to store them in a file. Currently I add each set as a tuple to a list and write the list (~9000 lines) to a file:

stat = add_score, keep_score, del_score
stats.append(stat)
f = open("./resources/outputs/generate/stats.txt", "w")
for stat in stats:
    print('stat type', type(stat))
    f.write(str(stat))  # f.write() only accepts strings, so convert the tuple
    f.write("\n")
f.close()

The values in the stats.txt file look as follows:

(2.0, 28.25187646117879,  69.96132803170339) 
(0.0, 23.357228195937875, 50.342178147056195) 
(1.7241379310344827, 25.888065422949147, 40.21927767354597) 
(0.0, 47.375201191814064, 16.312725613543307) 
(1.7857142857142856, 14.565677966101696, 54.81682319618366) 
(0.0, 63.79656946826759, 9.200422070604626)
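If a file in this tuple-per-line format already exists, it can still be parsed back without switching formats. A minimal sketch, assuming each non-empty line holds exactly one Python tuple literal as shown above: `ast.literal_eval` safely parses such a literal, and `zip(*rows)` regroups the rows into columns. The function name `column_means_from_tuple_file` is hypothetical.

```python
import ast
import os
import statistics
import tempfile


def column_means_from_tuple_file(path):
    """Parse one tuple literal per line and return the mean of each column."""
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(ast.literal_eval(line))  # "(2.0, 28.0, 70.0)" -> tuple
    # zip(*rows) turns the list of rows into one tuple per column
    return [statistics.fmean(column) for column in zip(*rows)]


# Example with two rows in the same format as stats.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stats.txt")
    with open(path, "w") as f:
        f.write("(2.0, 28.0, 70.0)\n")
        f.write("(0.0, 24.0, 50.0)\n")
    means = column_means_from_tuple_file(path)

print(means)  # [1.0, 26.0, 60.0]
```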

What I want to do is access this data again in another method by reading it from the file. My goal is to calculate the mean per column, i.e. mean(add_score), mean(keep_score), mean(del_score).

However, when I read the file, the values come back as tuples/Series.
I tried to convert the tuples into a dataframe to then use the mean() method per column, but I struggle with the conversion of the tuples to a dataframe.
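While the tuples are still in memory, the conversion itself is short: `pd.DataFrame` accepts a list of equal-length tuples and treats each tuple as one row. A minimal sketch, assuming `stats` holds `(add_score, keep_score, del_score)` tuples; the column names are chosen here for illustration and the values are made up.

```python
import pandas as pd

# Each tuple is one row: (add_score, keep_score, del_score)
stats = [
    (2.0, 28.0, 70.0),
    (0.0, 24.0, 50.0),
]

df = pd.DataFrame(stats, columns=["add", "keep", "del"])
col_means = df.mean()  # Series of per-column means, indexed by column name
print(col_means["add"], col_means["keep"], col_means["del"])  # 1.0 26.0 60.0
```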

Does anyone have a better idea of how to handle this data? I am wondering if there is a better way to store all scoring results in one file and then calculate the mean of each column.

Asked By: resei09


Answers:

… struggle with the conversion of the tuples to a dataframe.

You are complaining that the file format is inconvenient, so use the familiar CSV format instead.

import csv

with open("resources/outputs/generate/stats.txt", "w") as f:
    sheet = csv.writer(f)
    sheet.writerow(('add', 'keep', 'del'))
    for stat in stats:
        sheet.writerow(stat)

Then later a simple df = pd.read_csv('stats.txt') should suffice.
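A self-contained sketch of the full round trip, assuming `stats` is a list of 3-tuples (sample values made up). It writes to a temporary directory instead of `resources/outputs/generate/` so it can run anywhere, and opens the file with `newline=''`, as the `csv` module documentation recommends for writers.

```python
import csv
import os
import tempfile

import pandas as pd

stats = [(2.0, 28.0, 70.0), (0.0, 24.0, 50.0)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stats.txt")

    # Write a header row followed by one row per tuple
    with open(path, "w", newline="") as f:
        sheet = csv.writer(f)
        sheet.writerow(("add", "keep", "del"))
        for stat in stats:
            sheet.writerow(stat)

    # read_csv uses the header row as column names and loads everything into memory
    df = pd.read_csv(path)

print(df.mean())
```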


Alternatively, assign df = pd.DataFrame(stats, columns=('add', 'keep', 'del')) and then call df.to_csv('stats.txt', index=False) instead of creating a CSV Writer or DictWriter.
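A sketch of that pandas-only alternative, again assuming `stats` holds 3-tuples and using a temporary file. `index=False` keeps the DataFrame's row index out of the file; without it, `read_csv` would return an extra `Unnamed: 0` column.

```python
import os
import tempfile

import pandas as pd

stats = [(2.0, 28.0, 70.0), (0.0, 24.0, 50.0)]

df = pd.DataFrame(stats, columns=["add", "keep", "del"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stats.txt")
    df.to_csv(path, index=False)  # header row + one line per tuple, no index column
    loaded = pd.read_csv(path)

print(loaded.mean())
```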

Answered By: J_H

In the end, J_H's solution worked like a charm. I implemented:

import csv

with open("resources/outputs/generate/stats.txt", "w", newline='') as f:
    sheet = csv.writer(f)
    sheet.writerow(('add', 'keep', 'del'))
    for stat in stats:
        sheet.writerow(stat)

I added newline='' to avoid blank rows between the records (without it, the csv module produces extra empty lines on Windows).

I then read the data back and calculated each mean as a plain Python float (without the NumPy dtype) as follows:

df = pd.read_csv("resources/outputs/generate/stats.txt")
avg_add = df['add'].mean().item()
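To get all three averages in one step, `DataFrame.mean()` computes every column at once. A minimal sketch, assuming `df` has the `add`, `keep` and `del` columns written above (a small inline DataFrame with made-up values stands in for the file), that converts each NumPy scalar to a plain Python `float`:

```python
import pandas as pd

# Stand-in for df = pd.read_csv("resources/outputs/generate/stats.txt")
df = pd.DataFrame({"add": [2.0, 0.0], "keep": [28.0, 24.0], "del": [70.0, 50.0]})

# df.mean() returns a Series of NumPy floats; float() turns each into a Python float
means = {column: float(value) for column, value in df.mean().items()}
avg_add, avg_keep, avg_del = means["add"], means["keep"], means["del"]
print(means)  # {'add': 1.0, 'keep': 26.0, 'del': 60.0}
```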

Answered By: resei09