Adding row/column headers to NumPy arrays

Question:

I have a NumPy ndarray to which I would like to add row/column headers.

The data is actually 7x12x12, but I can represent it like this:

  A=[[[0, 1, 2, 3, 4, 5],
      [1, 0, 3, 4, 5, 6],
      [2, 3, 0, 5, 6, 7],
      [3, 4, 5, 0, 7, 8],
      [4, 5, 6, 7, 0, 9],
      [5, 6, 7, 8, 9, 0]],

     [[0, 1, 2, 3, 4, 5],
      [1, 0, 3, 4, 5, 6],
      [2, 3, 0, 5, 6, 7],
      [3, 4, 5, 0, 7, 8],
      [4, 5, 6, 7, 0, 9],
      [5, 6, 7, 8, 9, 0]]]

where A is my 2x6x6 array.

How do I insert headers across the first row and the first column, so that each array looks like this in my CSV output file?

        A, a, b, c, d, e, f
        a, 0, 1, 2, 3, 4, 5
        b, 1, 0, 3, 4, 5, 6
        c, 2, 3, 0, 5, 6, 7
        d, 3, 4, 5, 0, 7, 8
        e, 4, 5, 6, 7, 0, 9
        f, 5, 6, 7, 8, 9, 0

What I did was make the array 7x13x13 and insert the data so that each slice has a leading row and column of zeros, but I’d much prefer strings.

I guess I could write an Excel macro to replace the zeros with strings afterwards. The problem is that if I try to reassign those zeros to the strings I want, NumPy fails because it cannot convert a string to a float.
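The error described above is easy to reproduce. A float array can only hold numbers, so assigning a string to one of its elements raises a ValueError. The minimal sketch below uses a 2x2 array purely for illustration. It also shows one workaround: convert to an `object` array first, which can then hold a mix of strings and numbers.

```python
import numpy as np

a = np.zeros((2, 2))          # float64 array
raised = False
try:
    a[0, 0] = 'x'             # strings cannot be stored in a float array
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

b = a.astype(object)          # object dtype holds arbitrary Python objects
b[0, 0] = 'x'                 # now the assignment succeeds
print(b)
```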

Asked By: emmagras


Answers:

I am not aware of any built-in way to add headers to a NumPy array (even though I would find it useful). What I would do is create a small class that formats the object for me by overriding its __str__ method.

Something like this:

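class myMat:
    """Wraps a 2-D array so that str() renders it as CSV with row/column headers."""

    def __init__(self, mat, name, head=('a', 'b', 'c', 'd', 'e', 'f'), sep=','):
        self.mat = mat
        self.name = name
        self.head = list(head)   # labels used for both rows and columns
        self.sep = sep

    def __str__(self):
        # Header line: the matrix name followed by the column labels
        lines = [self.sep.join([self.name] + self.head)]
        # One line per row: the row label followed by the row's values
        for label, row in zip(self.head, self.mat):
            lines.append(self.sep.join([label] + [str(x) for x in row]))
        # Join with real newlines; no trailing separator on any line
        return '\n'.join(lines)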

Then you can print each matrix with its headers:

print(myMat(A, 'A'))   # A and B are 2-D (6x6) arrays here
print(myMat(B, 'B'))
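Because the question's real goal is a CSV file holding every 2-D slice of a 3-D array, here is a hedged sketch of the same idea using the standard library's `csv` module instead of `__str__`. The function name `write_labelled_slices`, the blank line between blocks, and the reuse of one label list for both rows and columns are illustrative assumptions, not part of the answer above.

```python
import csv
import io
import numpy as np

def write_labelled_slices(arr, labels, corner, stream):
    """Write each 2-D slice of a 3-D array as a labelled block of CSV rows."""
    writer = csv.writer(stream)
    for block in arr:
        writer.writerow([corner] + list(labels))       # column headers
        for label, row in zip(labels, block):
            writer.writerow([label] + row.tolist())    # row header + values
        writer.writerow([])                            # blank line between blocks

A = np.array([[[0, 1], [1, 0]],
              [[2, 3], [3, 2]]])
out = io.StringIO()
write_labelled_slices(A, ['a', 'b'], 'A', out)
print(out.getvalue())
```

To write a real file, open it with `open('out.csv', 'w', newline='')`, as the `csv` module documentation recommends, and pass that file object as `stream`.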
Answered By: Oriol Nieto

I think this does the trick generically:

Input

import numpy

mats = numpy.array([[[0, 1, 2, 3, 4, 5],
                     [1, 0, 3, 4, 5, 6],
                     [2, 3, 0, 5, 6, 7],
                     [3, 4, 5, 0, 7, 8],
                     [4, 5, 6, 7, 0, 9],
                     [5, 6, 7, 8, 9, 0]],

                    [[0, 1, 2, 3, 4, 5],
                     [1, 0, 3, 4, 5, 6],
                     [2, 3, 0, 5, 6, 7],
                     [3, 4, 5, 0, 7, 8],
                     [4, 5, 6, 7, 0, 9],
                     [5, 6, 7, 8, 9, 0]]])

Code

# Recursively build spreadsheet-style labels: a..z, then aa, ab, ..., etc.
def make_head(n):
    pre = ''
    if n // 26:
        pre = make_head(n // 26 - 1)

    alph = "abcdefghijklmnopqrstuvwxyz"
    pre += alph[n % 26]
    return pre

# Generator that yields exactly nitems header labels
def gen_header(nitems):
    for n in range(nitems):
        yield make_head(n)

# Convert the numpy array to nested lists so strings can be inserted
lmats = mats.tolist()

# Loop through each "matrix"
for mat in lmats:
    # Store the number of columns before the rows are modified
    ncols = len(mat[0])

    # Add a header label to the front of each row
    for row, hd in zip(mat, gen_header(len(mat))):
        row.insert(0, hd)

    # Create a header line for all the columns, led by the corner label "A"
    col_hd = list(gen_header(ncols))
    col_hd.insert(0, "A")

    # Insert the header line as the first row of the matrix
    mat.insert(0, col_hd)

# Convert back to numpy (every element becomes a string)
mats = numpy.array(lmats)

Output (value stored in mats; the string dtype shown depends on your Python and NumPy versions):

array([[['A', 'a', 'b', 'c', 'd', 'e', 'f'],
        ['a', '0', '1', '2', '3', '4', '5'],
        ['b', '1', '0', '3', '4', '5', '6'],
        ['c', '2', '3', '0', '5', '6', '7'],
        ['d', '3', '4', '5', '0', '7', '8'],
        ['e', '4', '5', '6', '7', '0', '9'],
        ['f', '5', '6', '7', '8', '9', '0']],

       [['A', 'a', 'b', 'c', 'd', 'e', 'f'],
        ['a', '0', '1', '2', '3', '4', '5'],
        ['b', '1', '0', '3', '4', '5', '6'],
        ['c', '2', '3', '0', '5', '6', '7'],
        ['d', '3', '4', '5', '0', '7', '8'],
        ['e', '4', '5', '6', '7', '0', '9'],
        ['f', '5', '6', '7', '8', '9', '0']]], 
      dtype='|S4')
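The `make_head` recursion produces spreadsheet-style labels (a to z, then aa, ab, and so on), which is bijective base-26 numbering: there is no "zero" digit. For comparison, here is an iterative sketch of the same mapping. `column_label` is a hypothetical name and is not part of the answer above.

```python
def column_label(n):
    """Map a 0-based index to a spreadsheet-style label: a, ..., z, aa, ab, ..."""
    label = ''
    n += 1                                  # work with 1-based numbering
    while n > 0:
        n, rem = divmod(n - 1, 26)          # shift by one because there is no zero digit
        label = chr(ord('a') + rem) + label
    return label

print([column_label(i) for i in (0, 25, 26, 27, 701, 702)])
# ['a', 'z', 'aa', 'ab', 'zz', 'aaa']
```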
Answered By: Paul Seeb

I’m not really sure, but you might consider having a look at Pandas.

Answered By: Davide

NumPy handles n-dimensional arrays fine, but many of its facilities are limited to two dimensions. I’m also not sure how you want the output file to look for a 3-D array.

Many people who want named columns overlook NumPy’s record arrays (numpy.recarray). They are good to know about, but they only "name" one dimension.
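To illustrate that limitation, here is a minimal sketch of a record array. The fields give names to the columns, but rows are still addressed by integer position. The field names `one` and `two` are arbitrary examples.

```python
import numpy as np

# Two columns, each given a field name
r = np.rec.fromarrays([[1, 2, 3], [4, 5, 6]], names='one,two')

print(r.one)     # a whole column, accessed by name
print(r[0])      # rows are still accessed by integer position
```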

For two dimensions, Pandas is very cool.

In [275]: pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6]},
   .....:                        orient='index', columns=['one', 'two', 'three'])
Out[275]:
   one  two  three
A    1    2      3
B    4    5      6

If output is the only problem you are trying to solve here, I’d probably stick with a few lines of hand-coded magic, since that is lighter than installing another package for one feature.
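As one example of such a "few lines of hand-coded magic", here is a sketch that stays entirely within NumPy. It stacks the labels around a string copy of a 2-D slice with `np.hstack` and `np.vstack`, then writes the result with `np.savetxt(fmt='%s')`. The 3x3 matrix, the labels, and the `io.StringIO` target are illustrative; pass a filename instead to write a real file, and loop over the slices for a 3-D array.

```python
import io
import numpy as np

mat = np.array([[0, 1, 2],
                [1, 0, 3],
                [2, 3, 0]])
labels = np.array(['a', 'b', 'c'])

# Header row: corner label followed by the column labels
header = np.concatenate([['A'], labels])
# Body: row labels as the first column, values converted to strings
body = np.hstack([labels[:, None], mat.astype(str)])
table = np.vstack([header, body])

out = io.StringIO()
np.savetxt(out, table, fmt='%s', delimiter=',')
print(out.getvalue())
```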

Answered By: Phil Cooper

With pandas.DataFrame.to_csv you can write the columns and the index to a file:

import numpy as np
import pandas as pd

A = np.random.randint(0, 10, size=(6, 6))
names = list('abcdef')
df = pd.DataFrame(A, index=names, columns=names)
df.to_csv('df.csv', index=True, header=True, sep=' ')

will give you a df.csv file like the following (the values are random, so yours will differ):

 a b c d e f
a 1 5 5 0 4 4
b 2 7 5 4 0 9
c 6 5 6 9 7 0
d 4 3 7 9 9 3
e 8 1 5 1 9 0
f 2 8 0 0 5 1
Answered By: bmu