Find the percentage of certain characters in a string

Question:

I need to count the characters in a string and find the percentage of certain characters in it. But my code doesn't work properly: it shows the percentage line by line, when I only want the final result for each entry.
This is my code:

from os import cpu_count
import re

with open('ros_gc_948_1_dataset.txt', 'r') as fp:
    # read all lines in a list
    lines = fp.readlines()
    c_count=0
    g_count=0
    a_count=0
    t_count=0
    for line in lines:

        # check if string present on a current line
        if  re.findall(r"\bRos_[0-9][0-9][0-9][0-9]" , line):
            line.write()
            line.replace(">","")
            print(line)

            c_count=0
            g_count=0
            a_count=0
            t_count=0
        else:
           c_count+=line.count("C")
           g_count+=line.count("G")
           a_count+=line.count("A")
           t_count+=line.count("T")
           
        tocag=c_count+g_count
        toac=c_count + g_count + a_count + t_count
        if toac !=0:
         avrg=float(tocag/toac * 100)
         print(avrg)

And the result:

>Ros_7657

53.333333333333336
43.333333333333336
47.22222222222222
48.333333333333336
49.0
49.72222222222222
49.76190476190476
48.541666666666664
48.333333333333336
48.833333333333336
49.24242424242424
49.30555555555556
48.717948717948715
49.25925925925926
>Ros_3487

53.333333333333336
47.5
46.666666666666664
48.333333333333336
47.0
46.94444444444444
48.095238095238095
47.291666666666664
46.111111111111114
47.333333333333336
46.96969696969697
46.52777777777778
46.666666666666664
46.785714285714285
47.368421052631575

My expected result:

Ros_7657
49.25925925925926
Ros_3487
47.368421052631575

I have a text file containing many entries like these, and I just want the name and the percentage for each one.

Example input:

>Ros_2115
GAGGCAATGGTTATCAACCCCTGATTTACGAATGACCTAACAACTCCTTAGAATTTAATC
GTTATGTGAATTAAGCAACGCTCGCGAATTGCTATGTTAATTCGCACTGTAAGGTGTCGA
ACGAAATCCACTGTTCCTTTTCTAATTTCTTTCA

Thanks for your help.

Answers:

Move the print statement a bit to the left, so it's aligned with the beginning of the for loop. It then runs once after the loop instead of on every line.

Also, print("{:.2f}%".format(avrg)) rounds the output to 2 decimals. Change the 2 to however many decimals you want.
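As a quick illustration of the format call above, here is a minimal sketch that uses the last value printed for Ros_7657 in the question. The variable name `formatted` is only illustrative.

```python
avrg = 49.25925925925926  # last value printed for Ros_7657 in the question

# ".2f" rounds to two decimal places; the trailing "%" is a literal character
formatted = "{:.2f}%".format(avrg)
print(formatted)  # 49.26%

# An f-string with the same format spec gives the same text
assert formatted == f"{avrg:.2f}%"
```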

Answered By: JustLearning

This should work:

    ...
    toac = 0  # initialise once, before the for loop starts
    # check if string present on a current line
    if re.findall(r"\bRos_[0-9][0-9][0-9][0-9]", line):
        # print the percentage for the previous series
        if toac != 0:
            avrg = tocag / toac * 100
            print(avrg)
        line.write()
        ...

At the end, keep the if… print block, unindented so that it runs once after the loop, to handle the last series.
(Note that this block of code is repeated, which is not elegant, but for some reason I'm unable at the moment to find a more elegant solution…).
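One possible way to package the "print on the next header, then once more at the end" idea is a generator, so the caller only prints once per record. This is a sketch under assumptions, not the asker's code: the function name `gc_percentages` is hypothetical, the sample records are made up, and header detection uses `startswith(">")` in place of the regex.

```python
def gc_percentages(lines):
    """Yield (name, GC percentage) for each '>'-headed record in lines."""
    name = None
    gc = total = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if it had any bases
            if name is not None and total:
                yield name, gc / total * 100
            name = line[1:]
            gc = total = 0
        else:
            gc += line.count("G") + line.count("C")
            total += sum(line.count(base) for base in "ACGT")
    # Emit the final record, which has no header after it
    if name is not None and total:
        yield name, gc / total * 100


# Made-up sample data in the same layout as the question's input
sample = [
    ">Ros_0001\n",
    "GGCC\n",
    "AATT\n",
    ">Ros_0002\n",
    "GATT\n",
]
for name, pct in gc_percentages(sample):
    print(name)
    print(pct)
```

The final-record check still appears twice, but it is confined to one small function. With a real file, the open file object can be passed directly as `lines`.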

This seems better (provided all your genetic blocks start with ">Ros_nnnn"):

with open('ros_gc_948_1_dataset.txt', 'r') as fp:
    # read the whole file as one string, then split around ">"
    records = fp.read().split(">")

for record in records:
    # skip empty records
    if not record.strip():
        continue

    # print the 1st 8 characters ('Ros_xxxx')
    print(record[:8])

    c_count = record.count("C")
    g_count = record.count("G")
    a_count = record.count("A")
    t_count = record.count("T")

    tocag = c_count + g_count
    toac = c_count + g_count + a_count + t_count
    if toac != 0:
        avrg = tocag / toac * 100
        print(avrg)
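
The fixed eight-character slice assumes every name has exactly the form Ros_nnnn. If names can vary in length, a variant could take the whole first line of each block as the name and count bases only in the remaining lines. The sketch below assumes that layout; the function name `gc_by_record` and the sample text are hypothetical.

```python
def gc_by_record(text):
    """Return {name: GC percentage} for '>'-separated blocks in text."""
    results = {}
    for record in text.split(">"):
        if not record.strip():
            continue
        # The first line is the name; everything after it is sequence
        header, _, sequence = record.partition("\n")
        gc = sequence.count("G") + sequence.count("C")
        total = gc + sequence.count("A") + sequence.count("T")
        if total:
            results[header.strip()] = gc / total * 100
    return results


# Made-up sample text with names of different lengths
text = ">Ros_1\nGGCC\nAATT\n>Ros_123456\nGATT\n"
for name, pct in gc_by_record(text).items():
    print(name)
    print(pct)
```

Separating the header from the sequence also keeps any A, C, G or T that might appear in a name out of the counts.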
Answered By: Swifty