find all words in a certain alphabet with multi character letters

Question:

I want to find out what words can be formed using the names of musical notes.

This question is very similar: Python code that will find words made out of specific letters. Any subset of the letters could be used
But my alphabet also contains "fis","cis" and so on.

letters = ["c","d","e","f","g","a","h","c","fis","cis","dis"]

I have a really long word list with one word per line and want to use

with open(...) as f:
    for line in f:
        if

to check if each word is part of that "language" and then save it to another file.

My problem is how to alter

>>> import re
>>> m = re.compile('^[abilrstu]+$')
>>> m.match('australia') is not None
True
>>> m.match('dummy') is not None
False
>>> m.match('australian') is not None
False

so it also matches with "fis","cis" and so on.

e.g. "fish" is a match but "ifsh" is not a match.

Asked By: Nivatius

||

Answers:

This function works and doesn’t use any external libraries:

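def func(word, letters):
    # Remove the longest letters first, so that e.g. "fis" is removed before "f".
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    # An empty string is falsy, so this returns True when nothing is left over.
    return not word

It works because if word == "" at the end, the word has been decomposed into your letters.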


Update:

It seems that my explanation wasn’t clear. WORD.replace(LETTER, "") replaces every occurrence of the note/LETTER in WORD with nothing. Here is an example:

func("banana", {'na'})

It will replace every 'na' in "banana" with nothing ('').

The result after this is "ba", which is not a note.

not "" evaluates to True and not "ba" evaluates to False, because an empty string is falsy in Python.

Here is another example:

func("banana", {'na', 'chicken', 'b', 'ba'})

It will replace every 'chicken' in "banana" with nothing ('').

The result after this is "banana".

It will replace every 'ba' in "banana" with nothing ('').

The result after this is "nana".

It will replace every 'na' in "nana" with nothing ('').

The result after this is "".

It will replace every 'b' in "" with nothing ('').

The result after this is "".

not "" is True ==> HURRAY IT IS A MELODY !

Note: The letters are sorted by length because otherwise the second example would not have worked. Deleting "b" first would leave "anana", and deleting "na" would then leave "a", which can’t be decomposed into notes.
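The effect of the sort order can be checked directly. The following is only a minimal sketch: strip_units and its longest_first flag are hypothetical names introduced here for illustration, and a list is used instead of a set so that the unsorted order is predictable.

```python
def strip_units(word, units, longest_first=True):
    """Delete every occurrence of each unit from word, one unit at a time.

    Returns True when nothing is left over.
    """
    if longest_first:
        order = sorted(units, key=len, reverse=True)
    else:
        order = list(units)  # keep the caller's order as given
    for unit in order:
        word = word.replace(unit, "")
    return not word


units = ["b", "na", "ba", "chicken"]
print(strip_units("banana", units))                       # True
print(strip_units("banana", units, longest_first=False))  # False
```

Be aware that deleting a unit can glue the remaining pieces together, so this kind of check may accept a word that is not a true concatenation of units. For instance, strip_units("fcisis", ["cis", "fis"]) returns True, although "fcisis" cannot be split into those units.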

Answered By: Benoît P

I believe ^(fis|cis|dis|[abcfhg])+$ will do the job.

Some deconstruction of what’s going on here:

  • | works like an OR between alternatives
  • [...] denotes “any single character from inside the brackets”
  • ^ and $ stand for the beginning and end of the line, respectively
  • + stands for “one or more times”
  • ( ... ) stands for grouping, which is needed to apply the +/*/{} modifiers to a whole sub-expression. Without grouping, such a modifier applies only to the closest expression on its left

Altogether, this “reads” as “the whole string is one or more repetitions of fis/cis/dis or one of abcfhg”.
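A pattern of this shape can also be built from a list instead of being typed by hand. The sketch below is an illustration under assumptions, not the answer's own code: it uses the letters list from the question (so it accepts d and e but not b, unlike the character class above), and build_pattern and is_note_word are hypothetical helper names.

```python
import re

# The letters list from the question; the duplicate "c" is dropped.
letters = ["c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"]


def build_pattern(units):
    """Compile a pattern that matches one or more of the given units."""
    # Longest units first; not strictly required, because the regex engine
    # backtracks, but it keeps the generated pattern easy to read.
    alternatives = sorted(set(units), key=lambda u: (-len(u), u))
    body = "|".join(re.escape(u) for u in alternatives)
    return re.compile("(?:" + body + ")+")


pattern = build_pattern(letters)


def is_note_word(word):
    # fullmatch requires the whole string to match, like ^...$ with match.
    return pattern.fullmatch(word) is not None


print(is_note_word("fish"))  # True
print(is_note_word("ifsh"))  # False
```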

Answered By: Slam

You can count how many characters of the word are covered by the units (the names of musical notes) that occur in it, and compare that number to the length of the word.

from collections import Counter

units = {"c","d","e","f","g","a","h", "fis","cis","dis"}

def func(word, units=units):
    letters_count = Counter()
    for unit in units:
        num_of_units = word.count(unit)
        letters_count[unit] += num_of_units * len(unit) 
        if len(unit) == 1:
            continue
        # if the unit consists of more than 1 letter (e.g. dis),
        # check whether its letters are also one-letter units;
        # if so, subtract the occurrences that were counted twice
        for letter in unit:
            if letter in units:
                letters_count[letter] -= num_of_units
    return len(word) == sum(letters_count.values())

print(func('disc'))   # True
print(func('disco'))  # False

Answered By: Mykola Zotko

A solution that opens a tkinter file dialog to choose the word list:

import re
from tkinter import filedialog as fd

m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[abcfhg])+$')
matches = list()
filename = fd.askopenfilename()


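with open(filename) as f:
    for line in f:
        # strip() removes the trailing newline without cutting off a letter
        # when the last line of the file has no newline
        word = line.strip()
        if m.match(word.lower()) is not None:
            matches.append(word)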


print(matches)
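The question also asked for the matching words to be saved to another file, which the code above does not do. A minimal sketch of that step, without the file dialog, might look as follows; filter_words, NOTE_WORD and both paths are hypothetical names chosen for this example, and the pattern is copied from the code above.

```python
import re

# Same alternatives as in the solution above; fullmatch replaces ^ and $.
NOTE_WORD = re.compile(r"(?:fis|ges|gis|as|ais|cis|des|es|dis|[abcfhg])+")


def filter_words(in_path, out_path):
    """Write every note word from in_path to out_path and return how many were kept."""
    kept = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            word = line.strip()  # one word per line, newline removed
            if NOTE_WORD.fullmatch(word.lower()):
                dst.write(word + "\n")
                kept += 1
    return kept
```

It could be called as filter_words("words.txt", "note_words.txt"), where both file names are placeholders.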

This answer was posted as an edit to the question find all words in a certain alphabet with multi character letters by the OP Nivatius under CC BY-SA 4.0.

Answered By: vvvvv