Sorting matrix columns

Question:

I have a 4x5 matrix and I need to sort it by several columns.

Given these inputs:

sort_columns = [3, 1, 2, 4, 5, 2]
matrix = [[3, 1, 8, 1, 9],
          [3, 7, 8, 2, 9],
          [2, 7, 7, 1, 2],
          [2, 1, 7, 1, 9]]

the matrix should first be sorted by the 3rd column (so the values 8, 8, 7, 7), then the sorted result should again be sorted by column 1 (values 3, 3, 2, 2) and so on.

So, after first sorting by column 3, the matrix would be:

2 7 7 1 2
2 1 7 1 9
3 1 8 1 9
3 7 8 2 9

and sorting on column 1 then has no effect as the values are already in the right order. The next column, 2, then makes the order:

2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9

etc.

After sorting on all the sort_columns numbers, I expect to get the result:

2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9

This is my code to sort the matrix:

def sort_matrix_columns(matrix, n, sort_columns):
    for col in sort_columns:
        column = col - 1
        for i in range(n):
            for j in range(i + 1, n):
                if matrix[i][column] > matrix[j][column]:
                    temp = matrix[i]
                    matrix[i] = matrix[j]
                    matrix[j] = temp

which is called like this:

sort_matrix_columns(matrix, len(matrix), sort_columns)

But when I do, I get the following wrong result:

3 1 8 1 9 
2 1 7 1 9 
2 7 7 1 2 
3 7 8 2 9 

Why am I getting the wrong order here? Where is my sort implementation failing?

Asked By: Tivasic

||

Answers:

The short answer is that your sort implementation is not stable.

A sort algorithm is stable when two entries in the sorted sequence keep the same (relative) order when their sort key is the same. For example, when sorting only by the first letter, a stable algorithm will always sort the sequence ['foo', 'flub', 'bar'] to be ['bar', 'foo', 'flub'], keeping the 'foo' and 'flub' values in the same relative order. Your algorithm would swap 'foo' and 'bar' (as 'f' > 'b' is true) without touching 'flub', and so you’d end up with ['bar', 'flub', 'foo'].
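
To make the difference concrete, here is a minimal sketch that runs both approaches on that word list. The helper names swap_sort and first_letter are hypothetical, introduced only for this illustration; swap_sort applies the same swap logic as the question's code to a flat list.

```python
def swap_sort(items, key):
    """The question's swap-based algorithm applied to a flat list (not stable)."""
    items = list(items)  # work on a copy so the input is left alone
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            # Swapping across a distance can jump over entries with an equal key.
            if key(items[i]) > key(items[j]):
                items[i], items[j] = items[j], items[i]
    return items


def first_letter(word):
    """Sort key: only the first character of the word."""
    return word[0]


words = ['foo', 'flub', 'bar']

stable_result = sorted(words, key=first_letter)       # built-in sort is stable
unstable_result = swap_sort(words, key=first_letter)  # swaps 'foo' with 'bar'

print(stable_result)    # ['bar', 'foo', 'flub']
print(unstable_result)  # ['bar', 'flub', 'foo']
```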

You need a stable sort algorithm when you sort repeatedly, as you do when sorting by multiple columns: each subsequent sort must preserve the order established by the preceding sorts whenever two rows have the same value in the current column.

You can see this when your implementation sorts by column 5, after first sorting on columns 3, 1, 2, 4. After those first 4 sort operations the matrix looks like this:

2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9

Your implementation then sorts by column 5, so by 9, 9, 2, 9. The first row is then swapped with the 3rd row (2 1 7 1 9 and 2 7 7 1 2), leaving the other rows untouched. This changes the relative order of the rows with a 9:

2 7 7 1 2  <- was third
3 1 8 1 9  <- so this row is now re-ordered!
2 1 7 1 9  <- was first
3 7 8 2 9

Sorting the above output by the 2nd column (7, 1, 1, 7) then leads to the wrong output you see.

A stable sort algorithm would have moved the 2 7 7 1 2 row to be the first row without reordering the other rows:

2 7 7 1 2  <- was third
2 1 7 1 9  <- was first
3 1 8 1 9  <- was second, stays *after* the first row
3 7 8 2 9  <- was fourth, stays *after* the second row

and sorting by the second column produces the correct output.
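
The walkthrough above can be reproduced with a small sketch. The function names unstable_pass and stable_pass are hypothetical: the first performs one pass of the question's swap logic on a copy of the rows, and the second uses the built-in sorted(), which is stable, as the reference. Both take a 0-based column index.

```python
def unstable_pass(rows, column):
    """One pass of the question's swap-based sort (0-based column index)."""
    rows = list(rows)  # copy the outer list; the rows themselves are shared
    n = len(rows)
    for i in range(n):
        for j in range(i + 1, n):
            if rows[i][column] > rows[j][column]:
                rows[i], rows[j] = rows[j], rows[i]
    return rows


def stable_pass(rows, column):
    """One pass using the built-in stable sort (0-based column index)."""
    return sorted(rows, key=lambda row: row[column])


# State after sorting on columns 3, 1, 2 and 4 (identical for both approaches).
after_four = [[2, 1, 7, 1, 9],
              [3, 1, 8, 1, 9],
              [2, 7, 7, 1, 2],
              [3, 7, 8, 2, 9]]

# Sort on column 5 (index 4): this is where the two approaches diverge.
unstable_five = unstable_pass(after_four, 4)
stable_five = stable_pass(after_four, 4)

# Final sort on column 2 (index 1).
unstable_final = unstable_pass(unstable_five, 1)
stable_final = stable_pass(stable_five, 1)

print(unstable_five)   # [[2, 7, 7, 1, 2], [3, 1, 8, 1, 9], [2, 1, 7, 1, 9], [3, 7, 8, 2, 9]]
print(stable_five)     # [[2, 7, 7, 1, 2], [2, 1, 7, 1, 9], [3, 1, 8, 1, 9], [3, 7, 8, 2, 9]]
print(unstable_final)  # [[3, 1, 8, 1, 9], [2, 1, 7, 1, 9], [2, 7, 7, 1, 2], [3, 7, 8, 2, 9]]
print(stable_final)    # [[2, 1, 7, 1, 9], [3, 1, 8, 1, 9], [2, 7, 7, 1, 2], [3, 7, 8, 2, 9]]
```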

The default Python sort implementation, TimSort (named after its inventor, Tim Peters), is a stable sort function. You could just use that (via the list.sort() method and a sort key function):

def sort_matrix_columns(matrix, sort_columns):
    for col in sort_columns:
        matrix.sort(key=lambda row: row[col - 1])

Heads-up: I removed the n parameter from the function, for simplicity’s sake.

Demo:

>>> def pm(m): print(*(' '.join(map(str, r)) for r in m), sep="\n")
...
>>> def sort_matrix_columns(matrix, sort_columns):
...     for col in sort_columns:
...         matrix.sort(key=lambda row: row[col - 1])
...
>>> sort_columns = [3, 1, 2, 4, 5, 2]
>>> matrix = [[3, 1, 8, 1, 9],
...           [3, 7, 8, 2, 9],
...           [2, 7, 7, 1, 2],
...           [2, 1, 7, 1, 9]]
>>> sort_matrix_columns(matrix, sort_columns)
>>> pm(matrix)
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9

You don’t need to use a loop. If you reverse the sort_columns list and use it to create a single sort key, you can do this with a single call:

def sort_matrix_columns(matrix, sort_columns):
    matrix.sort(key=lambda r: [r[c - 1] for c in sort_columns[::-1]])

This works the same way: the most significant sort is on the last column in sort_columns, and only when two rows have the same value there (a tie) does the last-but-one column matter, and so on.
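
As a quick sanity check, the following sketch compares the repeated-pass approach with the single composite key on the question's data. The function names sort_by_passes and sort_by_single_key are hypothetical, and both return a new list rather than sorting in place so that the results can be compared.

```python
from operator import itemgetter


def sort_by_passes(matrix, sort_columns):
    """Apply one stable sort per 1-based column, in the order given."""
    result = list(matrix)
    for col in sort_columns:
        result.sort(key=itemgetter(col - 1))
    return result


def sort_by_single_key(matrix, sort_columns):
    """Sort once, using a composite key built from the reversed column list."""
    # The last sort column is the most significant, so it comes first in the key.
    indices = [col - 1 for col in reversed(sort_columns)]
    return sorted(matrix, key=lambda row: tuple(row[i] for i in indices))


sort_columns = [3, 1, 2, 4, 5, 2]
matrix = [[3, 1, 8, 1, 9],
          [3, 7, 8, 2, 9],
          [2, 7, 7, 1, 2],
          [2, 1, 7, 1, 9]]

passes_result = sort_by_passes(matrix, sort_columns)
single_result = sort_by_single_key(matrix, sort_columns)

print(passes_result == single_result)  # True
```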

There are other stable sort algorithms; insertion sort or bubble sort, for example, would work just as well here. Wikipedia has a handy table of comparison sort algorithms that includes a ‘stable’ column, if you still want to implement the sorting yourself.

E.g. here is a version using insertion sort:

def insertionsort_matrix_columns(matrix, sort_columns):
    for col in sort_columns:
        column = col - 1
        for i in range(1, len(matrix)):
            for j in range(i, 0, -1):
                if matrix[j - 1][column] <= matrix[j][column]:
                    break
                matrix[j - 1], matrix[j] = matrix[j], matrix[j - 1]

I didn’t use a temp variable to swap two rows. In Python, you can swap two values simply by using tuple assignments.

Because insertion sort is stable, this produces the expected outcome:

>>> matrix = [[3, 1, 8, 1, 9],
...           [3, 7, 8, 2, 9],
...           [2, 7, 7, 1, 2],
...           [2, 1, 7, 1, 9]]
>>> insertionsort_matrix_columns(matrix, sort_columns)
>>> pm(matrix)
2 1 7 1 9
3 1 8 1 9
2 7 7 1 2
3 7 8 2 9
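
Bubble sort was mentioned above as another stable option. A minimal sketch, assuming the same 1-based sort_columns convention, could look like this; the name bubblesort_matrix_columns is hypothetical and chosen only to mirror the insertion sort version.

```python
def bubblesort_matrix_columns(matrix, sort_columns):
    """Sort matrix in place with one bubble sort per 1-based column."""
    n = len(matrix)
    for col in sort_columns:
        column = col - 1
        for end in range(n - 1, 0, -1):
            swapped = False
            for j in range(end):
                # Only adjacent rows are compared, and equal values are never
                # swapped (strict >), which is what keeps the sort stable.
                if matrix[j][column] > matrix[j + 1][column]:
                    matrix[j], matrix[j + 1] = matrix[j + 1], matrix[j]
                    swapped = True
            if not swapped:
                break  # no swaps in this sweep: already sorted on this column


sort_columns = [3, 1, 2, 4, 5, 2]
matrix = [[3, 1, 8, 1, 9],
          [3, 7, 8, 2, 9],
          [2, 7, 7, 1, 2],
          [2, 1, 7, 1, 9]]

bubblesort_matrix_columns(matrix, sort_columns)
print(matrix)  # [[2, 1, 7, 1, 9], [3, 1, 8, 1, 9], [2, 7, 7, 1, 2], [3, 7, 8, 2, 9]]
```

The key difference from the code in the question is that only neighbouring rows are ever swapped, so a row can never jump over another row with the same value.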
Answered By: Martijn Pieters