How can I get the frequency and its percentage in the same row as alphabet and sub alphabet?
Question:
I want to get the frequency and its percentage in the same row as alphabet and sub alphabet.
I have a .csv file as follows:
Alphabet | Sub alphabet | Value |
---|---|---|
A | B | 1 |
A | C | 1 |
A | E | 2 |
A | F | 3 |
D | B | 1 |
D | C | 2 |
D | E | 2 |
D | F | 3 |
I want it to return a result like this:
Alphabet | Value | Frequency | % |
---|---|---|---|
A | 1 | 2 | 50% |
A | 2 | 1 | 25% |
A | 3 | 1 | 25% |
D | 1 | 1 | 25% |
D | 2 | 2 | 50% |
D | 3 | 1 | 25% |
I believe the expected table above is self-explanatory. The percentage is the row's frequency divided by the total frequency for that Alphabet (for example, value 1 occurs in 2 of the 4 rows for A, so 50%).
My code:
    import csv

    with open("/Users/name/Desktop/path/alphabetical_list.csv") as alphabetical_list_file:
        csv_reader = csv.reader(alphabetical_list_file, delimiter=',')
Feel free to leave a comment if you need more information.
How can I get the frequency and its percentage? I would appreciate any help. Thank you in advance!
Answers:
You could try:
    import csv
    from collections import Counter
    from itertools import groupby
    from operator import itemgetter

    # newline="" is what the csv module recommends for files it reads or writes
    with open("data.csv", "r", newline="") as fin, \
         open("result.csv", "w", newline="") as fout:
        next(fin)  # Skip header row
        writer = csv.writer(fout)
        writer.writerow(["Alphabet", "Value", "Frequency", "%"])  # Write header
        # groupby only merges consecutive rows that share the same first column
        for key, group in groupby(csv.reader(fin), key=itemgetter(0)):
            frequencies = Counter(map(itemgetter(2), group))
            scale = 100 / sum(frequencies.values())  # Percentage contributed by one row
            writer.writerows(
                [key, value, frequency, frequency * scale]
                for value, frequency in frequencies.items()
            )
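Assumptions:
- data.csv is the input file. Its first row contains the headers.
- The first column comes in groups, i.e. all rows with the same Alphabet value are adjacent (itertools.groupby only merges consecutive rows).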
For the sample data.csv

    Alphabet,Sub alphabet,Value
    A,B,1
    A,C,1
    A,E,2
    A,F,3
    D,B,1
    D,C,2
    D,E,2
    D,F,3

the result is

    Alphabet,Value,Frequency,%
    A,1,2,50.0
    A,2,1,25.0
    A,3,1,25.0
    D,1,1,25.0
    D,2,2,50.0
    D,3,1,25.0
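If the rows for one Alphabet are not adjacent, the grouping assumption breaks and groupby would emit the same key more than once. A possible workaround, sketched below, is to count (Alphabet, Value) pairs directly with two Counter objects instead of grouping. The function name frequency_rows and the interleaved in-memory sample are made up for illustration, and the sketch assumes every data row has exactly three columns. It also formats the percentage as a string such as 50%, as in the question's expected table.

```python
import csv
import io
from collections import Counter


def frequency_rows(lines):
    """Return [alphabet, value, frequency, percent] rows for CSV lines with a header."""
    reader = csv.reader(lines)
    next(reader)  # Skip header row
    pair_counts = Counter()   # (alphabet, value) -> frequency
    group_totals = Counter()  # alphabet -> number of rows for that alphabet
    for alphabet, _sub_alphabet, value in reader:  # Assumes exactly three columns
        pair_counts[(alphabet, value)] += 1
        group_totals[alphabet] += 1
    # Values are still strings here, so the sort is lexicographic ("10" < "2")
    return [
        [alphabet, value, count, f"{100 * count / group_totals[alphabet]:g}%"]
        for (alphabet, value), count in sorted(pair_counts.items())
    ]


# Hypothetical sample: the same rows as above, but with A and D interleaved
sample = io.StringIO(
    "Alphabet,Sub alphabet,Value\n"
    "A,B,1\nD,B,1\nA,C,1\nD,C,2\nA,E,2\nD,E,2\nA,F,3\nD,F,3\n"
)
rows = frequency_rows(sample)
for row in rows:
    print(row)
```

The trade-off is that all counts are held in memory until the whole file has been read, whereas the groupby version can write each group as soon as it ends.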
You could also use Pandas:
    import pandas as pd

    df = pd.read_csv("data.csv")
    # Count the rows for each (Alphabet, Value) pair
    df = df.groupby(["Alphabet", "Value"], as_index=False).agg(Frequency=("Value", "count"))
    # Divide each count by the total for its Alphabet group
    df["%"] = df["Frequency"] / df.groupby("Alphabet")["Frequency"].transform("sum") * 100
    df.to_csv("result.csv", index=False)
- First group the dataframe df by the columns Alphabet and Value, count the number of items in each group, and name the resulting new column Frequency via .agg.
- Then add a new column by normalising Frequency for each Alphabet group: group df by Alphabet, get column Frequency, and sum the values. The .transform makes sure that the result keeps its original shape. Then divide the Frequency column by the result and multiply by 100 to get a percentage.
- Finally write df into a CSV file, without the indices.
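To see why .transform matters in the second step, the following sketch compares a plain sum with transform("sum") on an in-memory copy of the sample data. The names counts, per_group and totals are only illustrative, and the counting step uses .size().reset_index(name="Frequency") as a stand-in that should give the same counts as the .agg call above. The last line turns the percentage into strings such as 50%, which is an optional extra to match the question's expected table.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample data, so no file is needed
df = pd.DataFrame({
    "Alphabet": list("AAAADDDD"),
    "Sub alphabet": list("BCEFBCEF"),
    "Value": [1, 1, 2, 3, 1, 2, 2, 3],
})

# One row per (Alphabet, Value) pair with its row count
counts = df.groupby(["Alphabet", "Value"]).size().reset_index(name="Frequency")

# A plain sum collapses to one row per Alphabet ...
per_group = counts.groupby("Alphabet")["Frequency"].sum()
# ... while transform("sum") repeats the group total on every row of counts
totals = counts.groupby("Alphabet")["Frequency"].transform("sum")

# Rounds to whole percent; drop the rounding if fractions matter
counts["%"] = (counts["Frequency"] / totals * 100).round().astype(int).astype(str) + "%"
print(counts)
```

Because totals has one entry per row of counts, the division lines up row by row without any merge.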