How to save output from Python as TSV

Question:

I am using the Biopython package and I would like to save the result as a TSV file. This is the output I currently print and would like to write to a TSV file instead:

from Bio import SeqIO

for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print("%s %s %s" % (record.id, record.seq, record.format("qual")))

Thank you.

Asked By: Vonton


Answers:

The following snippet:

from __future__ import print_function
with open("output.tsv", "w") as f:
  print("%s\t%s\t%s" % ("asd", "sdf", "dfg"), file=f)
  print("%s\t%s\t%s" % ("sdf", "dfg", "fgh"), file=f)

Yields a file output.tsv containing

asd    sdf    dfg
sdf    dfg    fgh
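On Python 3 (or with the print_function import on 2.x), print() also accepts a sep argument, so the format string can be skipped entirely. A minimal sketch that writes to an in-memory buffer instead of a real file; the rows are the same illustrative values as above:

```python
import io

# Illustrative rows; in practice these would come from your own data.
rows = [("asd", "sdf", "dfg"), ("sdf", "dfg", "fgh")]

buf = io.StringIO()  # stands in for open("output.tsv", "w")
for row in rows:
    # sep="\t" puts a tab between the arguments; print() adds the trailing "\n".
    print(*row, sep="\t", file=buf)

print(repr(buf.getvalue()))  # 'asd\tsdf\tdfg\nsdf\tdfg\tfgh\n'
```

Swapping io.StringIO for a real file object produces the same output.tsv shown above.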

So, in your case:

from __future__ import print_function
from Bio import SeqIO

with open("output.tsv", "w") as f:
  for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print("%s\t%s\t%s" % (record.id, record.seq, record.format("qual")), file=f)

Answered By: EvenLisle

That is fairly simple: instead of printing it, you need to write it to a file.

from Bio import SeqIO

with open("records.tsv", "w") as record_file:
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        record_file.write("%s\t%s\t%s\n" % (record.id, record.seq, record.format("qual")))

If you want to name the columns in the file, you can write a header line first:

record_file.write("Record_Id\tRecord_Seq\tRecord_Qal\n")

So the complete code may look like:

from Bio import SeqIO

with open("records.tsv", "w") as record_file:
    record_file.write("Record_Id\tRecord_Seq\tRecord_Qal\n")
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        record_file.write(str(record.id) + "\t" + str(record.seq) + "\t" + str(record.format("qual")) + "\n")

Answered By: ZdaR

I prefer using join() in this type of code:

from Bio import SeqIO

for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print('\t'.join((str(record.id), str(record.seq), str(record.format("qual")))))

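The tab character is written as \t, and join() takes the three strings and concatenates them with a tab in between. Note that this prints to the console; to save the result, redirect the output (e.g. python script.py > records.tsv) or pass an open file object to print() via file=.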
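str.join() only accepts strings, which is why each field above is wrapped in str(). With more columns, map(str, ...) does the conversion in one step. A small sketch; tsv_line is a hypothetical helper name and the sample values are invented:

```python
def tsv_line(fields):
    """Join arbitrary values into one tab-separated line (no trailing newline)."""
    # join() raises TypeError on non-strings, so convert every field first.
    return "\t".join(map(str, fields))


line = tsv_line(["read_1", "ACGT", 40, 3.5])
print(repr(line))  # 'read_1\tACGT\t40\t3.5'
```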

Answered By: philshem

My preferred solution is to use the csv module. It’s part of the standard library, so:

  • Somebody else has already done all the heavy lifting.
  • It allows you to leverage all the functionality of the CSV module.
  • You can be fairly confident it will function as expected (not always the case when I write it myself).
  • You’re not going to have to reinvent the wheel, either when you write the file or when you read it back in on the other end (I don’t know your record format, but if one of your records contains a TAB, the csv module will quote it correctly for you).
  • It will be easier to support when the next person has to go in to update the code 5 years after you’ve left the company.
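To make the point about embedded tabs concrete: with the default QUOTE_MINIMAL setting, csv.writer wraps fields that contain the delimiter, the quote character or a line break in double quotes, and csv.reader undoes that on the way back. A minimal round-trip sketch using an in-memory buffer; the sample row is made up:

```python
import csv
import io

# A made-up row whose fields contain the troublesome characters.
row = ["id_1", "has\ta tab", "has\na newline", 'has "quotes"']

buf = io.StringIO()  # stands in for a real file opened with newline=''
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(row)
text = buf.getvalue()
print(repr(text))  # the tricky fields come out wrapped in double quotes

# Reading it back with the same delimiter restores the original values.
reader = csv.reader(io.StringIO(text), delimiter="\t")
parsed = next(reader)
print(parsed == row)  # True
```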

The following code snippet should do the trick for you:

#!/usr/bin/env python3
import csv
from Bio import SeqIO

with open('records.tsv', 'w', newline='') as tsvfile:
    writer = csv.writer(tsvfile, delimiter='\t', lineterminator='\n')
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        writer.writerow([record.id, record.seq, record.format("qual")])

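Note that this is for Python 3.x. If you’re using 2.x, the open() call (and possibly the writer setup) will be slightly different: Python 2’s open() has no newline argument, and the 2.x csv documentation recommends opening the file in binary mode ('wb') instead.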
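One further caveat: record.format("qual") returns a complete QUAL-format entry (a ">" header line followed by the scores, possibly spread over several lines), so the quality column ends up holding multi-line text. In Biopython the per-base scores of a FASTQ record are also available as record.letter_annotations["phred_quality"], which can be joined with spaces into a single-line field. The sketch below does the same thing without Biopython so it runs on its own; the hand-rolled parser is a swapped-in stand-in for SeqIO and assumes plain four-line FASTQ records with Phred+33 encoding, and fastq_to_tsv is a hypothetical name:

```python
import csv
import io


def fastq_to_tsv(fastq_handle, tsv_handle):
    """Write one TSV row (id, sequence, space-separated Phred scores) per read.

    Assumes plain four-line FASTQ records (no wrapped sequence lines) with
    Phred+33 quality encoding; Bio.SeqIO handles far more cases than this.
    """
    writer = csv.writer(tsv_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "seq", "phred_quality"])
    lines = [line.rstrip("\r\n") for line in fastq_handle if line.strip()]
    for i in range(0, len(lines), 4):
        # A truncated final record raises ValueError here.
        header, seq, _plus, qual = lines[i:i + 4]
        read_id = header[1:].split()[0]  # drop "@" and any description
        scores = [ord(ch) - 33 for ch in qual]  # Phred+33 decoding
        writer.writerow([read_id, seq, " ".join(map(str, scores))])


sample = "@read1 desc\nACGT\n+\nII#5\n"
out = io.StringIO()
fastq_to_tsv(io.StringIO(sample), out)
print(out.getvalue())
```

For a real run, pass an open FASTQ file and a TSV file opened with open("records.tsv", "w", newline="") in place of the two StringIO objects.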

Answered By: Deacon

If you want to use the .tsv file to label your word embeddings in TensorBoard, use the following snippet. It uses the csv module (see the previous answer).

#!/usr/bin/env python3
import csv

def save_vocabulary(word_count, label_file="word2context/labels.tsv"):
    # word_count: iterable of (word, count) pairs; the target directory must exist
    with open(label_file, 'w', encoding='utf8', newline='') as tsv_file:
        tsv_writer = csv.writer(tsv_file, delimiter='\t', lineterminator='\n')
        tsv_writer.writerow(["Word", "Count"])
        for word, count in word_count:
            tsv_writer.writerow([word, count])

word_count is a list of tuples like this:

[('the', 222594), ('to', 61479), ('in', 52540), ('of', 48064) ... ]
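If you do not already have word_count, collections.Counter.most_common() returns exactly this shape: a list of (word, count) tuples ordered from most to least common. A sketch that builds it from a toy token list (invented here) and writes the same two-column TSV to an in-memory buffer:

```python
import csv
import io
from collections import Counter

tokens = "the cat sat on the mat the end".split()  # toy corpus
word_count = Counter(tokens).most_common()  # most frequent first: ('the', 3) leads

buf = io.StringIO()  # stands in for open(label_file, "w", encoding="utf8", newline="")
tsv_writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
tsv_writer.writerow(["Word", "Count"])
for word, count in word_count:
    tsv_writer.writerow([word, count])

print(buf.getvalue())
```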
Answered By: Domi W