Return a random set of words from the system dictionary that meet specific criteria

Question:

I’d like to help my young child broaden his vocabulary. The plan is to parse the system dictionary (in this case, the one on macOS), append the words to a list, and add the items that meet specific criteria to a second list… perhaps a little messier than it needs to be!

I’d like to print just five randomly chosen words. I’ve managed most of it already, but I get an error when trying to pick a random item to show…

IndexError: list index out of range

And the code thus far…

import random


word_file = "/usr/share/dict/words"
WORDS = open(word_file).read().splitlines()


for x in WORDS:
    myRawList = []
    myRawListWithCount = []

    # filters out words that don't start with "a"
    if x.startswith("a"):
        myRawList.append(x)

    # word len. cannot exceed 5
    for y in myRawList:
        if (len(y)) <= 5:
            myRawListWithCount.append(y)

# the line that causes an error. Simpler/shorter lists for names etc seem to work OK with the command.
print(random.choice(myRawListWithCount))
Asked By: William Lombard


Answers:

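Try defining your lists before the loop rather than inside it. As written, both lists are recreated empty on every iteration, so once the loop finishes they only reflect the last word in the file. If that word fails the filter, the list is empty and random.choice raises the IndexError.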

myRawList = []
myRawListWithCount = []

for x in WORDS:
    # filters out words that don't start with "a"
    if x.startswith("a"):
        myRawList.append(x)

# word len. cannot exceed 5; runs once, after the first loop has finished
for y in myRawList:
    if len(y) <= 5:
        myRawListWithCount.append(y)

# with both lists defined before the loop, this no longer raises IndexError
print(random.choice(myRawListWithCount))
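
A quick way to see the cause is the minimal sketch below. It uses a hypothetical three-word list in place of the real dictionary file, and the variable names are made up for illustration. It compares a list that is recreated inside the loop with one that is created once before it.

```python
import random

# Hypothetical stand-in for the system word list (not the real file).
words = ["apple", "acorn", "zebra"]

# Buggy pattern: the list is recreated on every pass, so only the
# result for the last word ("zebra") survives the loop.
for w in words:
    reset_each_time = []
    if w.startswith("a") and len(w) <= 5:
        reset_each_time.append(w)

# Fixed pattern: the list is created once, before the loop.
kept = []
for w in words:
    if w.startswith("a") and len(w) <= 5:
        kept.append(w)

print(reset_each_time)  # → []
print(kept)             # → ['apple', 'acorn']

# Choosing from the empty list is what raises the IndexError.
try:
    random.choice(reset_each_time)
except IndexError as exc:
    print("IndexError:", exc)
```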
Answered By: JRose

You set up a new empty list at each step in the loop.

Also, the way you build the second list is extremely inefficient: for each new word, you loop again over every word found so far.

myRawList = []
myRawListWithCount = []

for x in WORDS:

    # filters out words that don't start with "a"
    if x.startswith("a"):
        myRawList.append(x)

        # word len. cannot exceed 5
        if (len(x)) <= 5:
            myRawListWithCount.append(x)

print(random.choice(myRawListWithCount))

Example output: amino

Another possible optimization: since the dictionary file is sorted, you could break out of the loop as soon as you have passed the last word starting with "a" (you would then need a separate loop to create the second list).
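
A rough sketch of that early-exit idea follows. It assumes the file is sorted case-insensitively, so capitalized entries such as "A" are interleaved with lowercase ones. For that reason it compares the lowercased first letter instead of stopping at the first word that fails startswith("a"). The sample list and the helper name short_a_words are hypothetical, and the length check is folded into the same loop here rather than a separate one.

```python
import random

# Hypothetical stand-in for the sorted system word list (assumption:
# sorted case-insensitively, so "A" and "a" entries are interleaved).
WORDS = ["A", "a", "aa", "Aaron", "abbey", "about", "azure",
         "B", "b", "baa", "zebra"]

def short_a_words(words, max_len=5):
    """Collect lowercase-initial "a" words of at most max_len letters,
    stopping at the first word whose first letter sorts after "a"."""
    found = []
    for word in words:
        if word[:1].lower() > "a":
            break  # sorted input: no more "a" words can follow
        if word.startswith("a") and len(word) <= max_len:
            found.append(word)
    return found

candidates = short_a_words(WORDS)
print(random.choice(candidates))
```

If the file turned out not to be sorted this way, the early break would silently skip words, so the plain full-file loop is the safer default.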

Answered By: mozway

I may have misunderstood your goal. If you just need five words chosen at random from the file, why not use a list comprehension to build up a list of all words from the file that meet your criteria, then use random.sample to pull out a sample of five words?

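import random

word_file = "/usr/share/dict/words"

# Collect every word that starts with "a" and is at most five letters long.
with open(word_file, "r") as f:
    candidates = [word for word in map(str.strip, f)
                  if word.startswith("a") and len(word) <= 5]

# Print five distinct words chosen at random.
print(random.sample(candidates, 5))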
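
One caveat: random.sample raises ValueError when asked for more items than the list holds. That could happen if stricter criteria leave fewer than five matching words. A small guard, sketched here with a hypothetical helper named pick_words, caps the sample size at the number of words available.

```python
import random

def pick_words(candidates, k=5):
    """Return up to k distinct words, or fewer if the list is short."""
    return random.sample(candidates, min(k, len(candidates)))

# Only three words are available here, so three are returned.
print(pick_words(["able", "acorn", "apple"]))
```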
Answered By: Chris