How can I get the frequency and its percentage in the same row as alphabet and sub alphabet?

Question

I want to get the frequency and its percentage in the same row as alphabet and sub alphabet.

I have a .csv file as follows:

Alphabet	Sub alphabet	Value
A	B	1
A	C	1
A	E	2
A	F	3
D	B	1
D	C	2
D	E	2
D	F	3

I want it to return result like this:

Alphabet	Value	Frequency	%
A	1	2	50%
A	2	1	25%
A	3	1	25%
D	1	1	25%
D	2	2	50%
D	3	1	25%

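I believe the expected table above is self-explanatory. Frequency is how many times a Value occurs for a given Alphabet. The percentage is that row's frequency divided by the total frequency of the same Alphabet. For example, A has 4 rows, so Value 1 with frequency 2 gives 2 / 4 = 50%.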
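To make the definition concrete, here is a minimal in-memory sketch that produces the expected table. It assumes the rows are already loaded as (Alphabet, Sub alphabet, Value) tuples. The names `rows` and `frequency_table` are hypothetical and are not part of the question's code.

```python
from collections import Counter

# Sample rows from the question: (Alphabet, Sub alphabet, Value)
rows = [
    ("A", "B", "1"), ("A", "C", "1"), ("A", "E", "2"), ("A", "F", "3"),
    ("D", "B", "1"), ("D", "C", "2"), ("D", "E", "2"), ("D", "F", "3"),
]

def frequency_table(rows):
    """Return (alphabet, value, frequency, percent) tuples, percent within each alphabet."""
    # How often each (Alphabet, Value) pair occurs
    pair_counts = Counter((alphabet, value) for alphabet, _, value in rows)
    # How many rows each Alphabet has in total
    alphabet_totals = Counter(alphabet for alphabet, _, _ in rows)
    return [
        (alphabet, value, count, 100 * count / alphabet_totals[alphabet])
        for (alphabet, value), count in sorted(pair_counts.items())
    ]

for row in frequency_table(rows):
    print(row)
```

The first printed row is `('A', '1', 2, 50.0)`. Counting pairs and group totals separately means the rows do not need to be in any particular order.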

My code:

import csv

with open("/Users/name/Desktop/path/alphabetical_list.csv") as alphabetical_list_file:
    csv_reader = csv.reader(alphabetical_list_file, delimiter=',')
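The sample above is shown with tabs between columns, while the snippet passes delimiter=','. If it is unclear which delimiter the file actually uses, the standard library's csv.Sniffer can guess it from a sample. The following is a minimal sketch that restricts the guess to comma or tab. The helper name `read_rows` is hypothetical.

```python
import csv
import io

def read_rows(text):
    """Parse CSV text, letting csv.Sniffer choose between ',' and tab."""
    # Sniff the dialect from the beginning of the text only
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",\t")
    return list(csv.reader(io.StringIO(text), dialect))

# With a real file you would read its contents first, e.g.
# with open("alphabetical_list.csv", newline="") as f:
#     rows = read_rows(f.read())
sample = "Alphabet\tSub alphabet\tValue\nA\tB\t1\nA\tC\t1\n"
print(read_rows(sample))
```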

Feel free to leave a comment if you need more information.

How can I get the frequency and its percentage? I would appreciate any help. Thank you in advance!

Asked By: My Car

Answer 1

You could try:

import csv
from collections import Counter
from itertools import groupby
from operator import itemgetter

with open("data.csv", "r", newline="") as fin, \
     open("result.csv", "w", newline="") as fout:
    reader = csv.reader(fin)
    next(reader)  # Skip header row
    writer = csv.writer(fout)
    writer.writerow(["Alphabet", "Value", "Frequency", "%"])  # Write header
    # groupby merges consecutive rows that share the same first column (Alphabet)
    for key, group in groupby(reader, key=itemgetter(0)):
        # Count how often each Value (third column) occurs within this Alphabet
        frequencies = Counter(map(itemgetter(2), group))
        # Factor that turns a count into a percentage of this group's rows
        total = 100 / sum(frequencies.values())
        writer.writerows(
            [key, value, frequency, frequency * total]
            for value, frequency in frequencies.items()
        )

Assumptions:

data.csv is the input file. It is comma-separated and its first row contains the headers.
Rows with the same Alphabet (first column) are adjacent, because groupby only merges consecutive rows.
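If the rows might not be grouped, one way to drop that assumption is to sort by the first column before calling groupby. The following is a minimal sketch of that variant. It assumes the rows are already parsed with the header skipped, and the helper name `percentages_by_group` is hypothetical.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def percentages_by_group(rows):
    """Yield [alphabet, value, frequency, percent] for parsed rows in any order."""
    # Sorting by the first column makes rows with the same Alphabet adjacent,
    # which is what groupby needs
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        frequencies = Counter(map(itemgetter(2), group))
        scale = 100 / sum(frequencies.values())
        for value, frequency in frequencies.items():
            yield [key, value, frequency, frequency * scale]

# Same sample rows as the question, but with A and D interleaved
rows = [["D", "B", "1"], ["A", "B", "1"], ["D", "C", "2"], ["A", "C", "1"],
        ["D", "E", "2"], ["A", "E", "2"], ["D", "F", "3"], ["A", "F", "3"]]
for row in percentages_by_group(rows):
    print(row)
```

Sorting costs O(n log n) and needs all rows in memory. For large files that are already grouped, the streaming version above avoids this cost.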

Result for sample data.csv

Alphabet,Sub alphabet,Value
A,B,1
A,C,1
A,E,2
A,F,3
D,B,1
D,C,2
D,E,2
D,F,3

is

Alphabet,Value,Frequency,%
A,1,2,50.0
A,2,1,25.0
A,3,1,25.0
D,1,1,25.0
D,2,2,50.0
D,3,1,25.0
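The question's desired output shows whole percentages with a % sign (50%), whereas the code writes 50.0. If the sign is wanted, Python's format mini-language has a % presentation type that multiplies by 100 and appends the sign. The following is a minimal sketch using a hypothetical helper `format_percent`.

```python
def format_percent(frequency, total):
    """Format frequency / total as a whole-number percentage string such as '50%'."""
    # The '%' presentation type multiplies by 100 and appends '%';
    # '.0' rounds to zero decimal places
    return f"{frequency / total:.0%}"

print(format_percent(2, 4))  # 50%
print(format_percent(1, 4))  # 25%
```

Note that this format spec expects the fraction, not a value that has already been multiplied by 100.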

You could also use Pandas:

import pandas as pd

df = pd.read_csv("data.csv")
df = df.groupby(["Alphabet", "Value"], as_index=False).agg(Frequency=("Value", "count"))
df["%"] = df["Frequency"] / df.groupby("Alphabet")["Frequency"].transform("sum") * 100
df.to_csv("result.csv", index=False)

First group the dataframe df by the columns Alphabet and Value, count the number of items in each group, and name the resulting new column Frequency via .agg.
Then add a new column % by normalising Frequency within each Alphabet group: group df by Alphabet, select the Frequency column, and sum it. The .transform makes sure the result keeps the original shape (one group total per row), so the Frequency column can be divided by it element-wise. Multiplying by 100 turns the fraction into a percentage.
Finally, write df to a CSV file without the index.
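The following plain-Python sketch shows what the .transform("sum") step computes and why the result lines up row by row with Frequency. `transform_sum` is a hypothetical helper for illustration, not part of pandas, and the input values are the Frequency column from the result above.

```python
from collections import defaultdict

def transform_sum(keys, values):
    """Mimic groupby(keys)[values].transform("sum"): one group total per input row."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        totals[key] += value  # accumulate the sum for each group
    # Broadcast each group's total back to every row, keeping the original length
    return [totals[key] for key in keys]

# Frequency column after the first groupby step, with its Alphabet labels
alphabet = ["A", "A", "A", "D", "D", "D"]
frequency = [2, 1, 1, 1, 2, 1]
group_totals = transform_sum(alphabet, frequency)
percent = [100 * f / t for f, t in zip(frequency, group_totals)]
print(group_totals)  # [4, 4, 4, 4, 4, 4]
print(percent)       # [50.0, 25.0, 25.0, 25.0, 50.0, 25.0]
```

A plain .sum() would return one value per group (two rows here), which cannot be divided against the six-row Frequency column without realignment. Broadcasting the total back to each row is what makes the division work.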

Answered By: Timus