Longest word chain from a list of words

Question:

So, this is a part of a function I’m trying to make.

I don’t want the code to be too complicated.

I have a list of words, e.g.

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

The idea of the word chain sequence is for the next word to begin with the letter that the last word ended in.

(Edit: Each word cannot be used more than once. Other than that there are no other constraints.)

I want the output to give the longest word chain sequence, which in this case is:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

I’m not really sure how to do it, I had different attempts at trying this. One of them…

This code finds the word chain correctly if we start with a specific word from the list, e.g. words[0] (so ‘giraffe’):

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []

word_chain.append(words[0])

for word in words:
    for char in word[0]:

       if char == word_chain[-1][-1]:
            word_chain.append(word)

print(word_chain)

Output:

['giraffe', 'elephant', 'tiger', 'racoon']

BUT, I want to find the longest possible chain of words (explained above).

My method: So, I tried to use the above working code that I wrote and loop through, using each word from the list as the starting point and finding the word chain for each word[0], word[1], word[2] etc. Then I tried to find the longest word chain by using an if statement and compare the length to the previous longest chain, but I can’t get it done properly and I don’t really know where this is going.

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []
max_length = 0
for starting_word_index in range(len(words) - 1):

    word_chain.append(words[starting_word_index])

    for word in words:
        for char in word[0]:

            if char == word_chain[-1][-1]:
                word_chain.append(word)

    # Not sure

    if len(word_chain) > max_length:
        final_word_chain = word_chain
        longest = len(word_chain)
        word_chain.clear()

print(final_word_chain)

This is my nth attempt, I think this one prints an empty list, I had different attempts before this that failed to clear the word_chain list properly and ended up repeating words over again.

Any help much appreciated. Hopefully I didn’t make this too teedious or confusing… Thanks!

Asked By: Mandingo

||

Answers:

You can use recursion to explore every “branch” that emerges when every possible letter containing the proper initial character is added to a running list:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
def get_results(_start, _current, _seen):
  if all(c in _seen for c in words if c[0] == _start[-1]):
    yield _current
  else:
      for i in words:
        if i[0] == _start[-1]:
          yield from get_results(i, _current+[i], _seen+[i])


new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

Output:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

This solution works similar to the breadth-first search, as the function get_resuls will continue to iterate over the entire list as long as the current value has not been called on before. Values that have been seen by the function are added to the _seen list, ultimately ceasing the stream of recursive calls.

This solution will also ignore results with duplicates:

words = ['giraffe', 'elephant', 'ant', 'ning', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse',]
new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

Output:

['ant', 'tiger', 'racoon', 'ning', 'giraffe', 'elephant']
Answered By: Ajax1234

Hopefully, a more intuitive way of doing it without recursion. Iterate through the list and let Python’s sort and list comprehension do the work for you:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chain_longest(pivot, words):
    new_words = []
    new_words.append(pivot)
    for word in words:
        potential_words = [i for i in words if i.startswith(pivot[-1]) and i not in new_words]
        if potential_words:
            next_word = sorted(potential_words, key = lambda x: len)[0]
            new_words.append(next_word)
            pivot = next_word
        else:
            pass
    return new_words

max([chain_longest(i, words) for i in words], key = len)
>>
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

Set a pivot and check for potential_words if they start with your pivot word and do not appear in your new list of words. If found then just sort them by length and take the first element.

The list comprehension goes through every word as a pivot and returns you the longest chain.

Answered By: BernardL

This function creates a type of iterator called a generator (see: What does the "yield" keyword do?). It recursively creates further instances of the same generator to explore all possible tail sequences:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chains(words, previous_word=None):
    # Consider an empty sequence to be valid (as a "tail" or on its own):
    yield []
    # Remove the previous word, if any, from consideration, both here and in any subcalls:
    words = [word for word in words if word != previous_word]
    # Take each remaining word...
    for each_word in words:
        # ...provided it obeys the chaining rule
        if not previous_word or each_word.startswith(previous_word[-1]):
            # and recurse to consider all possible tail sequences that can follow this particular word:
            for tail in chains(words, previous_word=each_word):
                # Concatenate the word we're considering with each possible tail:
                yield [each_word] + tail  

all_legal_sequences = list(chains(words))  # convert the output (an iterator) to a list
all_legal_sequences.sort(key=len) # sort the list of chains in increasing order of chain length
for seq in all_legal_sequences: print(seq)
# The last line (and hence longest chain) prints as follows:
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

Or, to get straight to the longest chain more efficiently:

print(max(chains(words), key=len)

Finally, here is an alternative version that allows repeated words in the input (i.e. if you include a word N times, you may use it up to N times in the chain):

def chains(words, previous_word_index=None):
    yield []
    if previous_word_index is not None:
        previous_letter = words[previous_word_index][-1]
        words = words[:previous_word_index] + words[previous_word_index + 1:]
    for i, each_word in enumerate( words ):
        if previous_word_index is None or each_word.startswith(previous_letter):
            for tail in chains(words, previous_word_index=i):
                yield [each_word] + tail  
Answered By: jez

Here is a working recursive brute-force approach:

def brute_force(pool, last=None, so_far=None):
    so_far = so_far or []
    if not pool:
        return so_far
    candidates = []
    for w in pool:
        if not last or w.startswith(last):
            c_so_far, c_pool = list(so_far) + [w], set(pool) - set([w])
            candidates.append(brute_force(c_pool, w[-1], c_so_far))
    return max(candidates, key=len, default=so_far)

>>> brute_force(words)
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

At every recursive call, this tries to continue the chain with every eligible word form the remaining pool. It then chooses the longest such continuation.

Answered By: user2390182

Another answer using a recursive approach:

def word_list(w_list, remaining_list):
    max_result_len=0
    res = w_list
    for word_index in range(len(remaining_list)):
        # if the last letter of the word list is equal to the first letter of the word
        if w_list[-1][-1] == remaining_list[word_index][0]:
            # make copies of the lists to not alter it in the caller function
            w_list_copy = w_list.copy()
            remaining_list_copy = remaining_list.copy()
            # removes the used word from the remaining list
            remaining_list_copy.pop(word_index)
            # append the matching word to the new word list
            w_list_copy.append(remaining_list[word_index])
            res_aux = word_list(w_list_copy, remaining_list_copy)
            # Keep only the longest list
            res = res_aux if len(res_aux) > max_result_len else res 
    return res

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
word_list(['dog'], words)

output:

['dog', 'giraffe', 'elephant', 'tiger', 'racoon']
Answered By: Pedro Borges

In the spirit of brute force solutions, you can check all permutations of the words list and choose the best continuous starting sequence:

from itertools import permutations

def continuous_starting_sequence(words):
    chain = [words[0]]
    for i in range(1, len(words)):
        if not words[i].startswith(words[i - 1][-1]):
            break
        chain.append(words[i])
    return chain

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
best = max((continuous_starting_sequence(seq) for seq in permutations(words)), key=len)

print(best)
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

Since we’re considering all permutations, we know that there must be a permutation that starts with the largest word chain.

This, of course, has O(n n!) time complexity 😀

Answered By: slider

I have a tree-based approach for this question which might be faster. I am still working on implementation of the code but here is what I would do:

    1. Form a tree with the root node as first word. 
    2. Form the branches if there is any word or words that starts 
with the alphabet with which this current word ends.
    3. Exhaust the entire given list based on the ending alphabet
 of current word and form the entire tree.
    4. Now just find the longest path of this tree and store it.
    5. Repeat steps 1 to 4 for each of the words given in the list
 and print the longest path among the longest paths we got above.

I hope this might give a better solution in case there is a large list of words given. I will update this with the actual code implementation.

Answered By: CodeHunter

I have a new idea, as the figure shows:

enter image description here

We can construct a directed graph by word[0] == word[-1], then the problem is converted to find the maximum length path.

Answered By: TimeSeam

As mentioned by others, the problem is to find the longest path in a directed acyclic graph.

For anything graph related in Python, networkx is your friend.

You just need to initialize the graph, add the nodes, add the edges and launch dag_longest_path:

import networkx as nx
import matplotlib.pyplot as plt

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat',
         'hedgehog', 'mouse']

G = nx.DiGraph()
G.add_nodes_from(words)

for word1 in words:
    for word2 in words:
        if word1 != word2 and word1[-1] == word2[0]:
            G.add_edge(word1, word2)
nx.draw_networkx(G)
plt.show()
print(nx.algorithms.dag.dag_longest_path(G))

enter image description here

It outputs:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

Note : this algorithm only works if there are no cycles (loops) in the graph. It means it will fail with ['ab', 'ba'] because there would be a path of infinite length: ['ab', 'ba', 'ab', 'ba', 'ab', 'ba', ...]

Answered By: Eric Duminil