How to do row-to-column transposition of data in csv table?

Question:

I’m new to scripting. I have a table (Table1.txt) and I need to create another table that has Table1’s rows arranged in columns and vice versa. I have found solutions to this problem for Perl and SQL but not for Python.

I just started learning Python two days ago, so this is as far as I got:

import csv
import sys

with open(sys.argv[1], "rt") as inputfile:
   readinput = csv.reader(inputfile, delimiter='t')
   with open("output.csv", 'wt') as outputfile:
      writer = csv.writer(outputfile, delimiter="t")
      for row in readinput:
            values = [row[0], row[1], row[2], row[3]]
            writer.writerow([values])

This just reproduces the columns as columns. What I would have liked to do now is to write the last line as writer.writecol([values]) but it seems that there is no command like that and I haven’t found another way of writing rows as columns.

Asked By: Frank

||

Answers:

Here’s one way to do it, assume for simplicity that you just want to print out the objects in order:

  # lets read all the data into a big 2d array
  buffer = []
  for row in readinput: 
        values = [row[0], row[1], row[2], row[3]]  
        buffer.append(values)       

  # what you have in your code
  for i in range(len(buffer)):
      for j in range(len(buffer[0])):
          print buffer[i][j]

  # this is called a transpose; we have buffer[i][j] to read row then column, 
  #    switch i and j around to do the opposite
  for i in range(len(buffer[0])):
      for j in range(len(buffer)):
          print buffer[j][i]

Since you need an array to pass to writer.writerow , you could do this

  for i in range(len(buffer[0])):
      writer.writerow([buffer[j][i] for j in range(len(buffer))])
Answered By: dfb

The solution in general to transpose a sequence of iterables is: zip(*original_list)

sample input:

1   2   3   4   5
6   7   8   9   10
11  12  13  14  15

program:

with open('in.txt') as f:
  lis = [x.split() for x in f]

for x in zip(*lis):
  for y in x:
    print(y+'t', end='')
  print('n')

output:

1   6   11  

2   7   12  

3   8   13  

4   9   14  

5   10  15
Answered By: Ashwini Chaudhary

@Ashwini’s answer is perfect. The magic happens in

zip(*lis)

Let me explain why this works: zip takes (in the simplest case) two lists and “zips” them: zip([1,2,3], [4,5,6]) will become [(1,4), (2,5), (3,6)]. So if you consider the outer list to be a matrix and the inner tuples to be the rows, that’s a transposition (ie., we turned the rows to columns).

Now, zip is a function of arbitrary arity, so it can take more then two arguments:

# Our matrix is:
# 1 2 3
# 4 5 6
# 7 8 9

zip([1,2,3], [4,5,6], [7,8,9])

>>> [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Now it is
# 1 4 7
# 2 5 8
# 3 6 9

The problem we’re facing is that in your case, we don’t know how many arguments we want to pass to zip. But at least, we already know the arguments: they are the elements of lis! lis is a list, and each element of that list is a list as well (corresponding to one line of numbers in your input file). The * is just Pythons way of telling a function “please use the elements of whatever follows as your arguments and not the thing itself!”

So

lis = [[1,2,3], [4,5,6]]
zip(*lis)

is exactly the same as

zip([1,2,3], [4,5,6])

Congrats, now you’re a Python pro! 😉

Answered By: Manuel Ebert

Since we are talking about columns, rows and transposes, perhaps it is worth it to mention numpy

>>> import numpy as np
>>> x = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
>>> x
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> x.T
array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])
Answered By: Akavall

Just to construct on @Akavall answer, if you want to read from a file, transpose and then save again just do:

from numpy import genfromtxt, savetxt
data = genfromtxt('in.txt')
savetxt('out.txt',data.T)

data.T in the 3rd line is where the data gets transposed.

Answered By: elyase
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.