How can I get the frequency and its percentage in the same row as alphabet and sub alphabet?

Question:

I want to get the frequency and its percentage in the same row as alphabet and sub alphabet.

I have a .csv file as follows:

Alphabet Sub alphabet Value
A B 1
A C 1
A E 2
A F 3
D B 1
D C 2
D E 2
D F 3

I want it to return result like this:

Alphabet Value Frequency %
A 1 2 50%
A 2 1 25%
A 3 1 25%
D 1 1 25%
D 2 2 50%
D 3 1 25%

The expected table above should be self-explanatory. The percentage is each row's frequency divided by the total frequency for that Alphabet.

My code:

import csv

with open("/Users/name/Desktop/path/alphabetical_list.csv") as alphabetical_list_file:
    csv_reader = csv.reader(alphabetical_list_file, delimiter=',')

Feel free to leave a comment if you need more information.

How can I get the frequency and its percentage? I would appreciate any help. Thank you in advance!

Asked By: My Car


Answers:

You could try:

import csv
from collections import Counter
from itertools import groupby
from operator import itemgetter

with open("data.csv", "r", newline="") as fin, \
     open("result.csv", "w", newline="") as fout:
    next(fin)  # Skip header row
    writer = csv.writer(fout)
    writer.writerow(["Alphabet", "Value", "Frequency", "%"])  # Write header
    for key, group in groupby(csv.reader(fin), key=itemgetter(0)):
        frequencies = Counter(map(itemgetter(2), group))
        total = 100 / sum(frequencies.values())
        writer.writerows(
            [key, value, frequency, frequency * total]
            for value, frequency in frequencies.items()
        )

Assumptions:

  • data.csv is the input file. Its first row contains the headers.
  • Rows with the same value in the first column are adjacent, because itertools.groupby only groups consecutive rows.
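
If the rows for one Alphabet are not guaranteed to be adjacent, a variant that does not rely on the grouping assumption above might look like the following. This is only a sketch: the function name frequency_rows and the in-memory sample are made up for illustration, and it assumes every data row has exactly three columns.

```python
import csv
import io
from collections import Counter, defaultdict


def frequency_rows(lines):
    """Yield [alphabet, value, frequency, percentage] rows from CSV lines.

    Hypothetical helper: the input order of the rows does not matter.
    """
    reader = csv.reader(lines)
    next(reader)  # Skip header row
    counts = defaultdict(Counter)  # One Counter of values per alphabet
    for alphabet, _sub_alphabet, value in reader:
        counts[alphabet][value] += 1
    for alphabet, frequencies in counts.items():
        total = sum(frequencies.values())
        for value, frequency in frequencies.items():
            yield [alphabet, value, frequency, 100 * frequency / total]


# Same sample data as in the question, but with the A and D rows interleaved.
sample = io.StringIO(
    "Alphabet,Sub alphabet,Value\n"
    "A,B,1\nD,B,1\nA,C,1\nD,C,2\nA,E,2\nD,E,2\nA,F,3\nD,F,3\n"
)
rows = list(frequency_rows(sample))
```

The rows can then be passed to writer.writerows as in the code above. The trade-off is that all counts are held in memory before anything is written, which should be fine for files of moderate size.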

Result for sample data.csv

Alphabet,Sub alphabet,Value
A,B,1
A,C,1
A,E,2
A,F,3
D,B,1
D,C,2
D,E,2
D,F,3

is

Alphabet,Value,Frequency,%
A,1,2,50.0
A,2,1,25.0
A,3,1,25.0
D,1,1,25.0
D,2,2,50.0
D,3,1,25.0
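
The question shows the percentages as 50% rather than 50.0. If that exact form is wanted, the number could be formatted before it is written. A minimal sketch, where as_percent is a made-up helper name and rounding to whole percents is an assumption about the desired output:

```python
def as_percent(frequency, total):
    """Format frequency / total as a whole-number percentage string."""
    return f"{100 * frequency / total:.0f}%"


half = as_percent(2, 4)     # "50%"
quarter = as_percent(1, 4)  # "25%"
```

Note that this turns the column into text, so it is best done only at the point of writing the file.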

You could also use Pandas:

import pandas as pd

df = pd.read_csv("data.csv")
df = df.groupby(["Alphabet", "Value"], as_index=False).agg(Frequency=("Value", "count"))
df["%"] = df["Frequency"] / df.groupby("Alphabet")["Frequency"].transform("sum") * 100
df.to_csv("result.csv", index=False)
  • First group the dataframe df by the columns Alphabet and Value, count the number of items in each group, and name the resulting new column Frequency via .agg.
  • Then add the percentage column by normalising Frequency within each Alphabet group: group df by Alphabet, select the Frequency column, and sum it. The .transform broadcasts each group's sum back to every row of that group, so the result has the same shape as the original column. Dividing Frequency by that result and multiplying by 100 gives the percentage.
  • Finally write df into a csv-file, without the indices.
Answered By: Timus