Finding locations of words as lists of coordinates in a grid of letters

Question:

Given a grid of letters and a list of words, find the location of each word as a list of coordinates. The resulting list can be in any order, but the coordinates for each individual word must be given in order. Letters cannot be reused across words. Each given word is guaranteed to be in the grid. Consecutive letters of a word are either down or to the right (i.e., no reversed words or reversed sections of words, only down or to the right).

For example, given the following grid and set of words,

 [
    ['d', 'r', 'd', 'o', 'r', 's'],
    ['o', 'b', 'i', 'g', 'n', 'c'],
    ['g', 'f', 'n', 'm', 't', 'a'],
    ['x', 's', 'i', 'a', 'n', 't']
]

words1 = [ "dog", "dogma", "cat" ]

output the list of coordinates below:

findWords(grid, words)->
  [ [ (1, 5), (2, 5), (3, 5) ], # cat
    [ (0, 2), (0, 3), (1, 3), (2, 3), (3, 3)], # dogma
    [ (0, 0), (1, 0), (2, 0) ], # dog
  ]

In this example, the "dog" in "dogma" cannot be used for the word "dog" since letters cannot be reused.
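To make the rules above concrete, here is a minimal sketch of a checker for a candidate answer. The name is_valid_solution is hypothetical (it is not part of the question); it only restates the three constraints: each path spells one of the words, every step goes down or right, and no cell is used twice.

```python
def is_valid_solution(grid, words, paths):
    """Check a candidate answer against the rules of the question (hypothetical helper)."""
    used = set()
    remaining = sorted(words)
    for path in paths:
        # Each path must spell one of the words that is still unmatched
        spelled = "".join(grid[r][c] for r, c in path)
        if spelled not in remaining:
            return False
        remaining.remove(spelled)
        # Consecutive cells must be one step down or one step right
        for (r1, c1), (r2, c2) in zip(path, path[1:]):
            if (r2 - r1, c2 - c1) not in ((1, 0), (0, 1)):
                return False
        # No cell may be shared between words
        if used & set(path):
            return False
        used |= set(path)
    return not remaining


grid = [
    ['d', 'r', 'd', 'o', 'r', 's'],
    ['o', 'b', 'i', 'g', 'n', 'c'],
    ['g', 'f', 'n', 'm', 't', 'a'],
    ['x', 's', 'i', 'a', 'n', 't'],
]
words1 = ["dog", "dogma", "cat"]
expected = [
    [(1, 5), (2, 5), (3, 5)],                  # cat
    [(0, 2), (0, 3), (1, 3), (2, 3), (3, 3)],  # dogma
    [(0, 0), (1, 0), (2, 0)],                  # dog
]
print(is_valid_solution(grid, words1, expected))
```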

Asked By: Andrew Ho


Answers:

Here is my attempt at a solution. First, I find all possible paths that I can take to spell any of the words. Paths are indexed by the word that they spell. Then I iterate through all possible combinations of paths by adding one possible path per word at a time while maintaining a seen set. If I run out of feasible paths for a word before every word has been placed, I backtrack.

def findWords(grid, words):
    # Regular old dfs through the grid, we only go right or down
    def dfs(row, col, path, idx):
        if idx == len(word):
            if word in all_paths:
                all_paths[word].append(list(path))
            else:
                all_paths[word] = [list(path)]
        else:
            if row + 1 < len(grid):
                if grid[row+1][col] == word[idx]:
                    path.append((row+1, col))
                    dfs(row+1, col, path, idx+1)
                    path.pop()
            if col + 1 < len(grid[0]):
                if grid[row][col+1] == word[idx]:
                    path.append((row, col+1))
                    dfs(row, col+1, path, idx+1)
                    path.pop()

    # For each word, find all possible paths through the grid to spell the word
    # Each path is a collection of coordinates as is desired from the function
    # Paths are indexed by word and stored in a list in a dictionary
    all_paths = {}
    for row in range(len(grid)):
        for col in range(len(grid[0])):
            for word in words:
                if grid[row][col] == word[0]:
                    dfs(row, col, [(row, col)], 1)

    # Try all possible combinations of paths from each letter
    def dfs2(idx):
        if idx == len(words):
            return True

        word = words[idx]
        for path in all_paths[word]:
            # Skip this path (but keep trying the others) if it overlaps a used cell
            if any(loc in seen for loc in path):
                continue
            for loc in path:
                seen.add(loc)
            if dfs2(idx+1):
                retlst.append(path)
                return True
            else:
                for loc in path:
                    seen.remove(loc)
        return False

    # Backtrack through possible combinations
    seen = set([])
    retlst = []
    dfs2(0)
    return retlst

There’s probably a way to DFS through possible combinations of paths WHILE you’re DFSing through the words you need to spell to avoid pre-computing all paths, but it was way too complicated for me to figure out.
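For what it's worth, here is one possible sketch of that idea, not a tested replacement for the code above. It assumes the same grid and word format as the question; the names find_words_lazy, paths and place are made up for this example. Paths are generated lazily, one word at a time, and cells taken by earlier words are skipped while searching, so nothing is pre-computed.

```python
def find_words_lazy(grid, words):
    """Place words one at a time while sharing a set of used cells (a sketch)."""
    rows, cols = len(grid), len(grid[0])
    used = set()
    result = []

    def paths(word, row, col, idx, path):
        # Lazily yield every down/right path spelling word[idx:] from (row, col),
        # skipping cells already taken by previously placed words
        if row >= rows or col >= cols:
            return
        if (row, col) in used or grid[row][col] != word[idx]:
            return
        path.append((row, col))
        if idx == len(word) - 1:
            yield list(path)
        else:
            yield from paths(word, row + 1, col, idx + 1, path)
            yield from paths(word, row, col + 1, idx + 1, path)
        path.pop()

    def place(i):
        # Try to place words[i:], backtracking when a later word cannot fit
        if i == len(words):
            return True
        for row in range(rows):
            for col in range(cols):
                for path in paths(words[i], row, col, 0, []):
                    used.update(path)
                    result.append(path)
                    if place(i + 1):
                        return True
                    result.pop()
                    used.difference_update(path)
        return False

    return result if place(0) else None


board2 = [['b', 'a', 't'],
          ['y', 'x', 'b'],
          ['x', 'x', 'y']]
print(find_words_lazy(board2, ["by", "bat"]))
```

Unlike findWords above, this returns the paths in the same order as the input words, and None if no assignment exists.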

Answered By: Andrew Ho

Based on this answer, first you want to make a dictionary which maps letter to positions:

board = [
    ['d', 'r', 'd', 'o', 'r', 's'],
    ['o', 'b', 'i', 'g', 'n', 'c'],
    ['g', 'f', 'n', 'm', 't', 'a'],
    ['x', 's', 'i', 'a', 'n', 't']
]

words = [ "dog", "dogma", "cat" ]

letter_positions = {}
for y, row in enumerate(board):
    for x, letter in enumerate(row):
        letter_positions.setdefault(letter, []).append((x, y))
>>> letter_positions
{'d': [(0, 0), (2, 0)],
 'r': [(1, 0), (4, 0)],
 'o': [(3, 0), (0, 1)],
 's': [(5, 0), (1, 3)],
 'b': [(1, 1)],
 'i': [(2, 1), (2, 3)],
 'g': [(3, 1), (0, 2)],
 'n': [(4, 1), (2, 2), (4, 3)],
 'c': [(5, 1)],
 'f': [(1, 2)],
 'm': [(3, 2)],
 't': [(4, 2), (5, 3)],
 'a': [(5, 2), (3, 3)],
 'x': [(0, 3)]}

As in the linked answer, you should keep track of valid moves. Since you can only move down or right, I added an extra condition compared to the original answer. I left the find_word function unchanged.

def is_valid_move(position, last):
    if last == []:
        return True
    dx, dy = position[0] - last[0], position[1] - last[1]
    # only allow a single step right or down (no diagonal or backward moves)
    return (dx, dy) in ((1, 0), (0, 1))

def find_word(word, used=None):
    if word == "":
        return []
    if used is None:
        used = []
    letter, rest = word[:1], word[1:]
    for position in letter_positions.get(letter) or []:
        if position in used:
            continue
        if not is_valid_move(position, used and used[-1]):
            continue
        path = find_word(rest, used + [position])
        if path is not None:
            return [position] + path
    return None

A brief explanation of the logic of find_word: it takes the first letter of the word as letter and keeps the remaining letters in rest, then iterates over the possible positions of that letter. Positions that are already used, or that are not a valid move from the previous position, are skipped. For each remaining position, it recursively calls find_word on the rest of the letters.

for word in words:
    print(find_word(word))
[(0, 0), (0, 1), (0, 2)] # dog
[(2, 0), (3, 0), (3, 1), (3, 2), (3, 3)] # dogma
[(5, 1), (5, 2), (5, 3)] # cat

Well, the indexing is flipped compared to the question, but that shouldn’t be a big problem.
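If you need the (row, column) order from the question, swapping each pair is enough. A small sketch, where to_row_col is just a name chosen for this example:

```python
def to_row_col(path):
    """Convert a path of (x, y) pairs into (row, col) pairs by swapping each pair."""
    return [(y, x) for x, y in path]


dog_xy = [(0, 0), (0, 1), (0, 2)]
print(to_row_col(dog_xy))  # [(0, 0), (1, 0), (2, 0)]
```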

Answered By: Péter Leéh

The task of finding the words in the grid can be done with the solutions provided in the other answers, or with tries, suffix trees, or suffix arrays.

As an example, based on the answer given by @Péter Leéh, this would be a modified version for finding all paths using Python 3:

grid = [
    ['d', 'r', 'd', 'o', 'r', 's'],
    ['o', 'b', 'i', 'g', 'n', 'c'],
    ['g', 'f', 'n', 'm', 't', 'a'],
    ['x', 's', 'i', 'a', 'n', 't']
]

words1 = [ "dog", "dogma", "cat" ]

# Building the dense grid
dense_grid = {}
for row, line in enumerate(grid):
    for col, letter in enumerate(line):
        dense_grid.setdefault(letter, []).append((row, col))

# Finding all paths for all words
def is_valid_move(p, q):
    return ( p[0] == q[0] and p[1]+1 == q[1] ) or ( p[0]+1 == q[0] and p[1] == q[1] )
        
def find_all_paths(curr_pos, suffix, dense_grid=dense_grid):
    if len(suffix) == 0: 
        return [[curr_pos]]
    
    possible_suffix_paths = []
    for pos in dense_grid[suffix[0]]:
        if is_valid_move(curr_pos, pos):
            possible_suffix_paths += find_all_paths(pos, suffix[1:])

        # Since the list of positions is ordered, I can skip the rest
        elif pos[0] - curr_pos[0] >= 2:
            break
        
    return [ [curr_pos] + p for p in possible_suffix_paths ]

words_paths = [ 
    [ path for pos in dense_grid[word[0]] for path in find_all_paths(pos, word[1:]) ]
    for word in words1
]

The final dense_grid is a dictionary that maps each character to the list of its positions in the grid, where each position is a (row, column) pair:

{
    'd': [(0, 0), (0, 2)],
    'r': [(0, 1), (0, 4)],
    'o': [(0, 3), (1, 0)],
    's': [(0, 5), (3, 1)],
    'b': [(1, 1)],
    'i': [(1, 2), (3, 2)],
    'g': [(1, 3), (2, 0)],
    'n': [(1, 4), (2, 2), (3, 4)],
    'c': [(1, 5)],
    'f': [(2, 1)],
    'm': [(2, 3)],
    't': [(2, 4), (3, 5)],
    'a': [(2, 5), (3, 3)],
    'x': [(3, 0)]
}

The final words_paths is a list that contains, for each word, a list of all its possible paths. Each path is a list of positions in the grid:

[
    [
         [(0, 0), (1, 0), (2, 0)], # dog
         [(0, 2), (0, 3), (1, 3)]
    ],
    [
         [(0, 2), (0, 3), (1, 3), (2, 3), (3, 3)] # dogma
    ],
    [
         [(1, 5), (2, 5), (3, 5)] # cat
    ]
]

After you have all the possible paths for all the words, you can pick one path per word such that no grid cell is used twice by transforming the problem into a maximum flow problem on a directed graph.

To do this transformation, you create a starting node and an ending node for every word, henceforth called START_word and END_word. Each START_word node is connected to the first positions of all the paths of that word, which are in turn connected to the second positions, and so on. The last positions of all the paths of the word are then connected to the END_word node.
The position nodes are unique across the graph, meaning that words sharing the same positions in the grid will also share the same nodes.

Now that we have the graph representing all the possible paths for all the words, we just need to connect a SOURCE node to all the starting nodes, and connect all the ending nodes to a TARGET node. With the resulting graph, you can then solve the maximum flow problem, where every edge in the graph has a capacity of 1.

This would be the resulting graph that you get from the problem you defined in the question:

[Image: the resulting flow graph (not preserved)]

However, a position node whose in-degree and out-degree are both greater than 1 could carry more than one unit of flow, which would let several words use the same cell. To prevent that, we also need to add choking nodes. For every node with this characteristic, we remove all of its out-edges and connect the original node to a single choking node. The original node's out-edges are then attached to the choking node instead.

I tested this idea using the library networkx, and here is the code I used to test it:

import networkx as nx

# Connecting source node with starting nodes
edges = [ ("SOURCE", "START_"+word) for word in words1 ]

# Connecting ending nodes with target nodes
edges += [ ("END_"+word, "TARGET") for word in words1 ]

# Connecting characters between them and to the starting and ending nodes too
edges += list(set(
    (s_node, t_node)
    for word, paths in zip(words1, words_paths)
    for path in paths
    for s_node, t_node in zip(["START_"+word] + path, path + ["END_"+word])
))

# Generating graph from the nodes and edges created
g = nx.DiGraph()
g.add_edges_from(edges, capacity=1)

# Adding choke nodes if required
node_edge_dict = {}
nodes_indeg_gt1 = [ node for node, in_deg in g.in_degree() if not isinstance(node, str) and in_deg > 1 ]
for s_node, t_node in g.out_edges(nodes_indeg_gt1):
    node_edge_dict.setdefault(s_node, []).append(t_node)
    
for node, next_nodes in node_edge_dict.items():
    if len(next_nodes) <= 1: continue

    choke_node = node + (-1,)
    g.add_edge(node, choke_node, capacity=1)
    g.add_edges_from([ (choke_node, p) for p in next_nodes ], capacity=1)
    g.remove_edges_from([ (node, p) for p in next_nodes ])

# Solving the maximum flow problem
num_words, max_flow_dict = nx.algorithms.flow.maximum_flow(g, "SOURCE", "TARGET")

# Extracting final paths for all the words
final_words_path = []
for word in words1:
    word_path = []
    start = "START_"+word
    end = "END_"+word
    node = start
    
    while node != end:
        node = next( n for n,f in max_flow_dict[node].items() if f == 1 )
        if isinstance(node, str) or len(node) == 3: continue
        word_path.append(node)
    
    final_words_path.append(word_path)
    
print(final_words_path)

The output for the problem stated in the question is this:

[
    [(0, 0), (1, 0), (2, 0)], # dog
    [(0, 2), (0, 3), (1, 3), (2, 3), (3, 3)], # dogma
    [(1, 5), (2, 5), (3, 5)] # cat
]
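
As a side note, the choking nodes described above are closely related to the standard node-splitting trick for giving a node a capacity: every position p is split into an "in" copy and an "out" copy joined by a single edge. The sketch below applies that to every position rather than only to the ones that need it. It is plain Python that only rewrites an edge list; split_position_nodes is a hypothetical helper, not part of networkx or of the code above.

```python
def split_position_nodes(edges):
    """Rewrite edges so every tuple node p becomes (p, "in") -> (p, "out")."""
    def tail(node):
        # Edges leave a position from its "out" copy
        return (node, "out") if isinstance(node, tuple) else node

    def head(node):
        # Edges arrive at a position through its "in" copy
        return (node, "in") if isinstance(node, tuple) else node

    new_edges = {(tail(s), head(t)) for s, t in edges}
    positions = {n for edge in edges for n in edge if isinstance(n, tuple)}
    # The single in -> out edge is what limits each cell to one unit of flow
    # once every edge is given capacity 1
    new_edges |= {((p, "in"), (p, "out")) for p in positions}
    return new_edges


demo_edges = [("START_a", (0, 0)), ((0, 0), (0, 1)), ((0, 1), "END_a")]
for edge in sorted(split_position_nodes(demo_edges), key=repr):
    print(edge)
```

The returned edges could then be loaded with g.add_edges_from(..., capacity=1) as before; when reading the flow back, the "in"/"out" tags would have to be stripped to recover the positions.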
Answered By: MkWTF

Approach

  1. Find paths that spell words. We only continue a path as long as it's a prefix of a word.
  2. We quickly check whether the current string is a prefix of a word by using bisect_left on the sorted list of words (a fast alternative to a trie).
  3. We gather the list of paths for each word.
  4. We reduce the paths to the non-overlapping ones to satisfy the requirement that no two words share a cell.
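
The prefix check from step 2 can be sketched in isolation as follows. The helper name is_prefix is only for illustration; the code below inlines the same test. It relies on the fact that in a sorted list, the first entry that is not smaller than the prefix must start with it if any word does.

```python
from bisect import bisect_left


def is_prefix(sorted_words, prefix):
    """Return True if some word in the sorted list starts with prefix."""
    i = bisect_left(sorted_words, prefix)
    return i < len(sorted_words) and sorted_words[i].startswith(prefix)


sorted_words = sorted(["dog", "dogma", "cat"])
print(is_prefix(sorted_words, "dogm"))  # True
print(is_prefix(sorted_words, "dox"))   # False
```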

Code

from bisect import bisect_left

def find_words(board, words, x, y, prefix, path):
    ' Find words that can be generated starting at position x, y '
    
    # Base case
    # find if current word prefix is in list of words
    found = bisect_left(words, prefix)  # can use binary search since words are sorted
    if found >= len(words):
        return
   
    if words[found] == prefix:
        yield prefix, path              # Prefix in list of words

    # Give up on path if what we found is not even a prefix
    # (there is no point in going further)
    if len(words[found]) < len(prefix) or words[found][:len(prefix)] != prefix:
        return
    
    # Extend path by one letter in board
    # Since can only go right and down 
    # No need to worry about same cell occurring multiple times in a given path
    for adj_x, adj_y in [(0, 1), (1, 0)]:
        x_new, y_new = x + adj_x, y + adj_y
        if x_new < len(board) and y_new < len(board[0]):
            yield from find_words(board, words, x_new, y_new, 
                                  prefix + board[x_new][y_new], 
                                  path + [(x_new, y_new)])
     
def check_all_starts(board, words):
    ' find all possible paths through board for generating words '
    # check each starting point in board
    for x in range(len(board)):
        for y in range(len(board[0])):
            yield from find_words(board, words, x, y, board[x][y], [(x, y)])
   
def find_non_overlapping(choices, path):
    ' Find set of choices with non-overlapping paths '
    if not choices:
        # Base case
        yield path
    else:
        word, options = choices[0]

        for option in options:
            set_option = set(option)
            
            if any(set_option.intersection(p) for w, p in path):
                # overlaps with path
                continue
            else:
                yield from find_non_overlapping(choices[1:], path + [(word, option)])
        
    
def solve(board, words):
    ' Solve for path through board to create words '
    words.sort()
    
    # Get choice of paths for each word
    choices = {}
    for word, path in check_all_starts(board, words):
        choices.setdefault(word, []).append(path)
    
    # Find non-intersecting paths (i.e. no two words should have a x, y in common)
    if len(choices) == len(words):
        return next(find_non_overlapping(list(choices.items()), []), None)
    

Tests

Test 1

from pprint import pprint as pp

words = [ "dog", "dogma", "cat" ]
board = [
            ['d', 'r', 'd', 'o', 'r', 's'],
            ['o', 'b', 'i', 'g', 'n', 'c'],
            ['g', 'f', 'n', 'm', 't', 'a'],
            ['x', 's', 'i', 'a', 'n', 't']]

pp(solve(board, words))
        

Output

Test 1
[('dog', [(0, 0), (1, 0), (2, 0)]),
 ('dogma', [(0, 2), (0, 3), (1, 3), (2, 3), (3, 3)]),
 ('cat', [(1, 5), (2, 5), (3, 5)])]

Test 2

words = ["by","bat"] 
board = [ ['b', 'a', 't'], 
          ['y', 'x', 'b'], 
          ['x', 'x', 'y'], ] 

pp(solve(board, words))

Output

Test 2
[('bat', [(0, 0), (0, 1), (0, 2)]), 
 ('by', [(1, 2), (2, 2)])]
Answered By: DarrylG

Here is another way to do it:

def sol(word, board):
    rows = len(board)
    cols = len(board[0])

    def getWord(row, col, wordCnt, path):
        # Stop when out of bounds or when the cell does not match the next letter
        if row >= rows or col >= cols or board[row][col] != word[wordCnt]:
            return
        path.append((row, col))
        if wordCnt == len(word) - 1:
            print(list(path))  # the whole word has been spelled
        else:
            getWord(row+1, col, wordCnt+1, path)  # move down
            getWord(row, col+1, wordCnt+1, path)  # move right
        path.pop()  # backtrack so failed branches do not stay in the path

    # Try every cell as a starting point
    for row in range(rows):
        for col in range(cols):
            getWord(row, col, 0, [])


sol('cat', board)
Answered By: Nirali Supe