How to get the sum of the same position in a tuple output of a for loop in Python?
Question:
I wrote a function that iterates over 200 files and counts the transitions and transversions in a DNA sequence. Now I want to sum the first column of the output of this for loop, and separately sum the second column.
This is the output I get, repeated 200 times because I have 200 files. I want the sum of the first column (0+1+1+1+1+…) and of the second column (1+0+0+0+…).
(0, 1) (1, 0) (1, 0) (1, 0) (1, 0) (1, 0) (1, 0) (1, 0) (1, 0) (1, 0) (0, 1) (1, 0) (0, 1) (1, 0) (0, 1) (1, 0)
I tried to print the function's result as a list and then sum the lists, but those lists are not stored anywhere because they are just the output of a for loop, so I couldn't sum them.
print([dna_comparison(wild_type=f, mut_seq=h)])
Result:
[(0, 1)]
[(1, 0)]
[(1, 0)]
[(1, 0)]
[(1, 0)]
[(1, 0)]
[(1, 0)]
[(1, 0)]
Answers:
As the other answers show, there are multiple solutions to this.
I think the most convenient way to handle it is with numpy, especially if you also use these tuples for further processing.
For example, imagine t is your collection of tuples. You can convert it into a numpy.array and access it like a matrix, i.e. with row and column indexes:
import numpy as np
t = [(0, 1), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0)]
t_array = np.array(t)
t_array[:, 1]
>>> array([1, 0, 0, 0, 0, 0, 0, 0])
At this point you can simply sum the elements by column:
t_array.sum(axis=0)
>>> array([7, 1])
I think a list comprehension could be used in this case.
If you put all these tuple pairs into a list as such (probably by using a for loop):
outputs = [(0, 1), (1, 0), (1, 0), ....]
you could do something like
sum_totals = (sum(x[0] for x in outputs), sum(x[1] for x in outputs))
and sum_totals will look like (sum first column, sum second column)
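To make the "probably by using a for loop" step concrete, here is a minimal sketch that collects the results first and sums them afterwards. The `dna_comparison` body and the `file_pairs` list are hypothetical stand-ins, since the question shows neither the real function nor how the files are read.

```python
# Hypothetical stand-in for the asker's function: returns a
# (transitions, transversions) tuple for one pair of sequences.
def dna_comparison(wild_type, mut_seq):
    # Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    n_transitions = 0
    n_transversions = 0
    for a, b in zip(wild_type, mut_seq):
        if a == b:
            continue
        if (a, b) in transitions:
            n_transitions += 1
        else:
            n_transversions += 1
    return n_transitions, n_transversions

# Assumed input: one (wild type, mutant) pair per file.
file_pairs = [("ACGT", "ACGA"), ("ACGT", "GCGT"), ("ACGT", "ATGT")]

# Collect the tuples in a list instead of printing them one by one.
outputs = [dna_comparison(wild_type=f, mut_seq=h) for f, h in file_pairs]

sum_totals = (sum(x[0] for x in outputs), sum(x[1] for x in outputs))
print(outputs)     # [(0, 1), (1, 0), (1, 0)]
print(sum_totals)  # (2, 1)
```

The key change from the question is that each result is stored in `outputs` rather than printed, so there is something left to sum once the loop finishes.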
This can be achieved with the code below:
arr = [[(0, 1)],
       [(1, 0)],
       [(1, 0)],
       [(1, 0)],
       [(1, 0)],
       [(1, 0)],
       [(1, 0)],
       [(1, 0)]]
firstcolumn = 0
secondcolumn = 0
for i in arr:
    for v in i:
        print(v[0], ' ', v[1])
        firstcolumn = firstcolumn + v[0]
        secondcolumn = secondcolumn + v[1]
print(firstcolumn, secondcolumn)
# 7 1
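As a possible variation on the nested loop, the one-element lists can be flattened first with `itertools.chain.from_iterable`, after which each column is a plain `sum`. This is only a sketch with shortened sample data; the names `flat`, `first` and `second` are made up for illustration.

```python
from itertools import chain

# Same shape as the printed output: a list of one-element lists.
arr = [[(0, 1)], [(1, 0)], [(1, 0)], [(1, 0)]]

# Flatten [[(0, 1)], [(1, 0)], ...] into [(0, 1), (1, 0), ...].
flat = list(chain.from_iterable(arr))

first = sum(v[0] for v in flat)
second = sum(v[1] for v in flat)
print(first, second)  # 3 1
```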
I don't think it's clear enough how you are getting the data. I mean, they are tuples, but how are you reading those tuples?
You say you are reading the tuples from different files, so I guess they aren't already in a list.
For example:
import random
data_number = 200  # Simulating n number of data in files
wild_type: int = 0
mut_seq: int = 0
for _ in range(data_number):
    data = (random.randint(0, 1), random.randint(0, 1))  # Simulating the tuple reading from a file
    wild_type += data[0]
    mut_seq += data[1]
print(f'wild_type {wild_type} times. mut_seq {mut_seq} times.')
One solution uses itertools.accumulate, which I think is pretty and clean:
from itertools import accumulate
your_list = [(0, 1), (1, 0), (1, 0), ....]
*_, sum_ = accumulate(your_list, lambda x, y: (x[0] + y[0], x[1] + y[1]))
print(sum_)
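If only the final total is needed, `functools.reduce` is a close relative of `accumulate` that returns just the last value instead of every running total. The sketch below swaps in `reduce` for comparison and is not part of the answer above; `pairs` and `add_pairs` are illustrative names and the data is made up.

```python
from functools import reduce
from itertools import accumulate

pairs = [(0, 1), (1, 0), (1, 0), (1, 0)]  # sample data for illustration

def add_pairs(x, y):
    """Add two 2-tuples element by element."""
    return (x[0] + y[0], x[1] + y[1])

# accumulate yields every intermediate total.
running = list(accumulate(pairs, add_pairs))
# reduce returns only the final one; (0, 0) is the starting value.
total = reduce(add_pairs, pairs, (0, 0))

print(running)  # [(0, 1), (1, 1), (2, 1), (3, 1)]
print(total)    # (3, 1)
```

With the `(0, 0)` starting value, `reduce` also copes with an empty list, whereas unpacking `*_, sum_` from an empty `accumulate` raises a `ValueError`.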
Less clean, more Python magic and only really relevant for code golf, but without importing anything:
tuple(map(sum, zip(*your_list)))
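To see why the one-liner works, it may help to split it into two steps: `zip(*...)` transposes the list of pairs into one tuple per column, and `map(sum, ...)` then sums each column. A small sketch with made-up sample data (`pairs`, `columns` and `totals` are illustrative names):

```python
pairs = [(0, 1), (1, 0), (1, 0), (1, 0)]  # sample data for illustration

# Unpacking with * passes each pair as a separate argument to zip,
# which groups the first elements together and the second elements together.
columns = list(zip(*pairs))
print(columns)  # [(0, 1, 1, 1), (1, 0, 0, 0)]

# Sum each column and pack the results back into a tuple.
totals = tuple(map(sum, columns))
print(totals)  # (3, 1)
```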