Editing a text file's content based on the values of the first 3 columns

Question:

I am using Python 3.9 for the task below.
I have a text file with the following contents (input file):

heading1,  heading2,    heading3,   heading4,   heading5,
10,        15,          20,         30,         40
10,        14,          20,         20,         39
10,        15,          20,         10,         29
9,         3,           13,         12,         13
10,        14,          20,          2,          3
10,        15,          20,          10,        10 

Now, using Python, I am trying to produce the following new file (output file):

heading1,  heading2,    heading3,   heading4,   heading5,
10,        15,          20,         50,         79
10,        14,          20,         22,         42
9,         3,           13,         12,         13

In the output file, I compare the first three columns (heading1, heading2 and heading3). If the same three values appear in other rows, I add up heading4 and heading5 across those rows and keep a single, unique row.

Asked By: Devenepali

||

Answers:

Here’s a quick hack using pandas.

# Read input file
import pandas as pd
df = pd.read_csv("input.txt")

# Dropping unnamed column due to trailing comma in your file
drop_ = [col for col in df.columns if 'Unnamed' in col]
df.drop(drop_, axis=1, inplace=True)
df.columns = [col.strip() for col in df.columns]

# Grouping and transformation
df['heading4'] = df.groupby(['heading1', 'heading2', 'heading3'])['heading4'].transform('sum')
df['heading5'] = df.groupby(['heading1', 'heading2', 'heading3'])['heading5'].transform('sum')
df.drop_duplicates(keep="first", inplace=True)

# Exporting
df.to_csv("output.txt", index=None)
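
As a more compact variant of the grouping step, a single groupby(...).sum() call should give the same rows without the transform and drop_duplicates steps. This is only a sketch: the sample data is pasted inline through io.StringIO so it runs on its own, and the names raw and out are made up for illustration. It assumes the file looks exactly like the sample in the question (header ending in a comma, spaces after the commas).

```python
import io

import pandas as pd

# Sample data copied from the question; the header line ends with a comma.
raw = """heading1,  heading2,    heading3,   heading4,   heading5,
10,        15,          20,         30,         40
10,        14,          20,         20,         39
10,        15,          20,         10,         29
9,         3,           13,         12,         13
10,        14,          20,          2,          3
10,        15,          20,          10,        10
"""

# skipinitialspace=True removes the padding after each comma,
# so the column names need no extra stripping.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# The trailing comma in the header creates an empty "Unnamed: ..." column.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# One row per (heading1, heading2, heading3); the other columns are summed.
# sort=False keeps the groups in order of first appearance.
out = df.groupby(
    ["heading1", "heading2", "heading3"], as_index=False, sort=False
).sum()

print(out.to_csv(index=False))
```

For a real file, pass "input.txt" to read_csv instead of the StringIO object and finish with out.to_csv("output.txt", index=False).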
Answered By: Suraj

Your text file appears to be in CSV format.

You can read the CSV and make keys from the values in heading1, heading2 and heading3 (as a tuple), then use a dictionary to accumulate the remaining columns as follows:

import csv

result = {}

with open('input.txt') as infile:
    data = csv.reader(infile)
    # The header line ends with a comma, so drop the empty trailing field
    columns = [c.strip() for c in next(data) if c.strip()]
    for row in data:
        # The first three values form the key; the rest are summed
        t1, t2, t3, *r = map(int, row)
        vals = result.setdefault((t1, t2, t3), [0]*len(r))
        for i, v in enumerate(r):
            vals[i] += v


print(*columns, sep=',')

for k, v in result.items():
    print(*(list(k)+v), sep=',')

Output:

heading1,heading2,heading3,heading4,heading5
10,15,20,50,79
10,14,20,22,42
9,3,13,12,13
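
Since the question asks for an output file rather than console output, here is a rough sketch of the same dictionary idea, keyed on the first three columns and written out with csv.writer. The function name aggregate and its key_cols parameter are made up for illustration, and the sample is fed through io.StringIO only so the snippet runs on its own.

```python
import csv
import io


def aggregate(infile, outfile, key_cols=3):
    """Sum the non-key columns of rows that share the first key_cols values."""
    reader = csv.reader(infile, skipinitialspace=True)
    # The header ends with a comma, so ignore the empty trailing field.
    header = [c.strip() for c in next(reader) if c.strip()]
    totals = {}
    for row in reader:
        values = [int(c) for c in row if c.strip()]
        if not values:
            continue  # skip blank lines
        key, rest = tuple(values[:key_cols]), values[key_cols:]
        sums = totals.setdefault(key, [0] * len(rest))
        for i, v in enumerate(rest):
            sums[i] += v
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(header)
    # Dictionaries keep insertion order, so rows come out in first-seen order.
    for key, sums in totals.items():
        writer.writerow([*key, *sums])


# Sample data copied from the question.
sample = """heading1,  heading2,    heading3,   heading4,   heading5,
10,        15,          20,         30,         40
10,        14,          20,         20,         39
10,        15,          20,         10,         29
9,         3,           13,         12,         13
10,        14,          20,          2,          3
10,        15,          20,          10,        10
"""

buffer = io.StringIO()
aggregate(io.StringIO(sample), buffer)
result_text = buffer.getvalue()
print(result_text)
```

With real files, the call would look like aggregate(open('input.txt', newline=''), open('output.txt', 'w', newline='')), ideally inside with blocks so both files get closed.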
Answered By: Vlad