How to get a consensus of multiple sequence alignments using Biopython?

Question:

I am trying to get a consensus sequence from my multiple alignment files (FASTA format).

I have a few FASTA files, each containing a multiple sequence alignment. When I run the function below, I get AttributeError: 'generator' object has no attribute 'get_alignment_length'.

I haven't been able to find any code examples that use AlignIO.parse for this; I only found examples using AlignIO.read.

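from Bio import AlignIO
from Bio.Align import AlignInfo


def get_consensus_seq(filename):
    # AlignIO.parse returns an iterator of MultipleSeqAlignment objects, so
    # SummaryInfo is handed a generator rather than one alignment (-> AttributeError)
    alignments = AlignIO.parse(filename, "fasta")
    summary_align = AlignInfo.SummaryInfo(alignments)
    consensus_seq = summary_align.dumb_consensus(0.7, "N")
    print(consensus_seq)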
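A minimal sketch reproducing the distinction behind that error, using a small made-up alignment held in a StringIO (the sequences are hypothetical data). AlignIO.parse returns an iterator of alignments, while AlignIO.read returns a single MultipleSeqAlignment, which is the kind of object SummaryInfo expects.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment

# Small made-up aligned FASTA content, kept in memory for the example.
FASTA_TEXT = ">seq1\nACGT-A\n>seq2\nACGTTA\n>seq3\nACCT-A\n"

# AlignIO.parse gives an iterator of MultipleSeqAlignment objects; the
# iterator itself has no get_alignment_length() method.
alignments = AlignIO.parse(StringIO(FASTA_TEXT), "fasta")
print(hasattr(alignments, "get_alignment_length"))  # False

# With plain FASTA (and no seq_count), all records form one alignment.
alignment_list = list(AlignIO.parse(StringIO(FASTA_TEXT), "fasta"))
print(len(alignment_list))  # 1

# AlignIO.read returns that single MultipleSeqAlignment directly and
# raises ValueError if the input holds zero or several alignments.
alignment = AlignIO.read(StringIO(FASTA_TEXT), "fasta")
print(isinstance(alignment, MultipleSeqAlignment), alignment.get_alignment_length())  # True 6
```

So for a FASTA file that holds one alignment, AlignIO.read(filename, "fasta") can be passed straight to SummaryInfo.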
Asked By: Angie


Answers:

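If I understand your situation correctly, the problem is that SummaryInfo works on a single alignment, while AlignIO.parse returns an iterator over possibly several alignments (MultipleSeqAlignment objects). They need to be merged into one alignment first, which only works if every sequence has the same aligned length.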

from __future__ import annotations

from itertools import chain
from pathlib import Path

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Align.AlignInfo import SummaryInfo
from Bio.Seq import Seq


def get_consensus_seq(filename: Path | str) -> Seq:
    # Pool the records of every alignment yielded by AlignIO.parse into one
    # MultipleSeqAlignment (raises ValueError if the lengths differ).
    common_alignment = MultipleSeqAlignment(
        chain.from_iterable(AlignIO.parse(filename, "fasta"))
    )
    summary = SummaryInfo(common_alignment)
    # dumb_consensus returns a Seq; "N" marks columns where no single
    # residue reaches the 0.7 threshold.
    consensus = summary.dumb_consensus(0.7, "N")
    return consensus
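Because AlignIO treats a plain FASTA file as a single alignment unless seq_count is given, reading each file with AlignIO.read and computing one consensus per file is an alternative when you have several files. A minimal sketch, assuming each file's sequences are already aligned to equal length; simple_consensus and consensus_per_file are hypothetical helpers, and the consensus rule is a plain-Python majority-with-threshold rule written to resemble dumb_consensus(0.7, "N"), not Biopython's own implementation (newer Biopython releases may warn that dumb_consensus is deprecated).

```python
from __future__ import annotations

import tempfile
from collections import Counter
from pathlib import Path

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment


def simple_consensus(
    alignment: MultipleSeqAlignment, threshold: float = 0.7, ambiguous: str = "N"
) -> str:
    """Hypothetical helper: threshold-based majority consensus.

    Gaps ('-' and '.') are ignored. A column gets its most common residue
    only if that residue is the unique maximum and makes up at least
    `threshold` of the non-gap residues; otherwise `ambiguous` is used.
    """
    consensus = []
    for col in range(alignment.get_alignment_length()):
        column = alignment[:, col]  # one alignment column as a string
        counts = Counter(c for c in column if c not in "-.")
        total = sum(counts.values())
        top = counts.most_common(2)
        unique_max = len(top) == 1 or (len(top) == 2 and top[0][1] > top[1][1])
        if total and unique_max and top[0][1] / total >= threshold:
            consensus.append(top[0][0])
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


def consensus_per_file(
    paths: list[str | Path], threshold: float = 0.7, ambiguous: str = "N"
) -> dict[str, str]:
    """Hypothetical helper: one consensus per aligned FASTA file."""
    results = {}
    for path in paths:
        # AlignIO.read expects exactly one alignment per file and raises
        # ValueError if there are none or several.
        alignment = AlignIO.read(str(path), "fasta")
        results[Path(path).name] = simple_consensus(alignment, threshold, ambiguous)
    return results


# Usage with two small made-up alignment files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "gene_a.fasta"
    file_a.write_text(">s1\nACGT-A\n>s2\nACGTTA\n>s3\nACCT-A\n")
    file_b = Path(tmp) / "gene_b.fasta"
    file_b.write_text(">s1\nTTGA\n>s2\nTTGC\n")
    results = consensus_per_file([file_a, file_b])

print(results)  # {'gene_a.fasta': 'ACNTTA', 'gene_b.fasta': 'TTGN'}
```

In gene_a the third column has G in only 2 of 3 sequences (about 0.67, below 0.7), so it becomes N; in gene_b the A/C tie in the last column also becomes N.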
Answered By: Vovin