Count nodes sharing row or column with at least one other node

Question:

I have a grid of nodes (represented by ones). I would like to quickly and simply (in a way that is both readable and fast) count the number of nodes that share a column or row with another node.

Here is my solution (can it be improved?):

grid=[[0,0,0,0],[1,1,1,1],[0,0,0,1],[0,0,1,1],[0,0,0,1]]
rowlen=len(grid)
collen=len(grid[0])
rd={i: 0 for i in range(rowlen)}
cd={i: 0 for i in range(collen)}
cl=[]
rl=[]
for (rowi, row) in enumerate(grid):
    for (coli,x) in enumerate(row):
        if x>0:
            cd[coli]=cd[coli]+1
            rd[rowi]=rd[rowi]+1
            cl.append(coli)
            rl.append(rowi)

coords=zip(rl,cl)
coordslist=list(coords)
bools=[int(bool(((rd[a]-1) or (cd[b]-1)))) for (a,b) in coordslist]
answer=sum(bools)
Asked By: bittahProfessional

||

Answers:

Sometimes fast and readable are the same, but more often they depend on your input and wishes. For example, what you are doing is still considered fast for most people, less than a second for a grid of 2500x2500 or 6.250.000 entries.

Now for your question it is noticable, that if you know the row and column counts of 1's, you can quickly look up if there are more than one 1 in a row or column.

Eventhough fast and readable are very personal, I will share with you my solutions to the problem.

Pure python

This solution uses only pure python that is available in the library.

import itertools

rows = [row.count(1) for row in grid]
cols = [col.count(1) for col in zip(*grid)]

width, height = len(grid), len(grid[0])
indices = itertools.product(range(width), range(height))

answer = sum(1 for irow, icol in indices if grid[irow][icol] == 1 and (rows[irow] > 1 or cols[icol] > 1))

For this I am using several small tricks or perks of the python language, which are the following:

  • row.count(1), which counts the number of times the element 1 occurs in a list.
  • zip(*grid), which is a reverse zipping, see this post from Mike Corcoran for more information.
  • itertools.product, which generates all indices for me (documentation)
  • sum(1 for each in val if <condition on each>), which is a generator expression.

For me the above solution, is considered very readable and concise. Now on the speed front, the solution is a bit faster than yours, but how much you might wonder? Well if we go to a grid of 5000x5000 or 25.000.000 elements, the time difference is about 4 seconds (from 12s to 8s).

Well that is kinda bad… All that trouble for 4 seconds on something that would be considered enormous for most programmers, and probably only be used once. And if you have that many elements, you will probably not use a list anymore, but a numpy array (docs).

Pure Numpy

So how would the solution look like if it was a numpy array?

import numpy as np

# Convert grid from `list` to a `np.ndarray`
arr = np.array(grid)

# Get to row and column totals
cols = arr.sum(axis=0)
rows = arr.sum(axis=1)

# Get all indices that are `1`, and combine the x and y coordinates by zip.
indices = zip(*np.where(arr == 1))
answer = sum(1 for irow, icol in indices if rows[irow] > 1 or cols[icol] > 1)

The above is even more readable than before, you just need to know a few numpy convetions and methods. Namely, that numpy uses row, col for indexing, instead of the traditional (col, row) or x, y indexing. Therefore axis=0, loops over the rows, which give you the column totals. And that np.where returns the indices of where the condition inside the brackets is True.

Now the solution is slightly slower than the pure python solution, because it has to convert the list to an array, but if your data was already a numpy array, your solution and the pure python solution would take almost twice the amount of time compared to the numpy solution.

Conclusion

Above are two different alternative solutions that would solve the same problem, but the readability and speed are depending on your knowledge of the language and your needs.

If the grid is not going to be much langer than a 1000 elements, all solutions are equally quick, namely instantaneously from a user perspective.

And the readability is as good as your skills are.

Answered By: Thymen