find all words in a certain alphabet with multi character letters
Question:
I want to find out what words can be formed using the names of musical notes.
This question is very similar: Python code that will find words made out of specific letters. Any subset of the letters could be used
But my alphabet also contains "fis","cis" and so on.
letters = ["c","d","e","f","g","a","h","c","fis","cis","dis"]
I have a really long word list with one word per line and want to use
with open(...) as f:
    for line in f:
        if
to check whether each word is part of that "language" and then save it to another file.
My problem is how to alter
>>> import re
>>> m = re.compile('^[abilrstu]+$')
>>> m.match('australia') is not None
True
>>> m.match('dummy') is not None
False
>>> m.match('australian') is not None
False
so that it also matches "fis", "cis" and so on.
For example, "fish" is a match but "ifsh" is not.
Answers:
This function works and doesn’t use any external libraries:
def func(word, letters):
    # Remove the longest letters first, so "fis" is removed before "f"
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    return not word
It works because if word == "" at the end, the word has been decomposed into your letters.
Update:
It seems that my explanation wasn’t clear. WORD.replace(LETTER, "") replaces every occurrence of the note LETTER in WORD with nothing. Here is an example:
func("banana", {'na'})
It replaces every 'na' in "banana" with nothing (''). The result is "ba", which is not a note.
not "" is True and not "ba" is False, because only the empty string is falsy.
Here is another example:
func("banana", {'na', 'chicken', 'b', 'ba'})
It replaces every 'chicken' in "banana" with nothing (''). The result is "banana".
It replaces every 'ba' in "banana" with nothing (''). The result is "nana".
It replaces every 'na' in "nana" with nothing (''). The result is "".
It replaces every 'b' in "" with nothing (''). The result is "".
not "" is True
==> HURRAY IT IS A MELODY !
Note: the letters are sorted by length because otherwise the second example would not have worked. Deleting "b" first would leave "anana", and removing 'na' from that leaves "a", which can’t be decomposed into notes.
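Removing substrings one after another can glue together characters that were not adjacent in the original word. With the list from the question, "dfisis" turns into "dis" once "fis" is removed and then disappears completely, although "is" is not a note. A minimal sketch of an alternative that checks the split position by position is shown here; the name can_spell and the reachable table are my own and not part of the answer above.

```python
def can_spell(word, letters):
    """Return True if word is a concatenation of items from letters."""
    units = {u for u in letters if u}  # ignore empty strings
    # reachable[i] is True when word[:i] can be split into units
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for unit in units:
            start = end - len(unit)
            if start >= 0 and reachable[start] and word[start:end] == unit:
                reachable[end] = True
                break
    # An empty word is not counted as a match
    return len(word) > 0 and reachable[len(word)]


notes = ["c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"]
print(can_spell("fish", notes))    # True
print(can_spell("ifsh", notes))    # False
print(can_spell("dfisis", notes))  # False
```

The table only ever extends a prefix that is already known to be valid, so letters from different parts of the word are never combined.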
I believe ^(fis|cis|dis|[acdefgh])+$ will do the job.
Some deconstruction of what’s going on here:
| works like an OR conjunction
[...] denotes “any symbol from what’s inside the brackets”
^ and $ stand for the beginning and end of the line, respectively
+ stands for “1 or more times”
( ... ) stands for grouping, needed to apply the + / * / {} modifiers. Without grouping, such modifiers apply only to the closest expression on the left
Altogether this “reads” as “the whole string is one or more repetitions of fis/cis/dis or one of acdefgh”
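If the alphabet changes often, the same kind of pattern can be generated from the list instead of being typed by hand. This is only a sketch: the helper name is_note_word is made up for illustration, and the list is the one from the question.

```python
import re

notes = ["c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"]

# Escape each name and join them with |, longest names first for readability
alternatives = sorted(set(notes), key=len, reverse=True)
pattern = re.compile("(?:" + "|".join(re.escape(n) for n in alternatives) + ")+")


def is_note_word(word):
    # fullmatch only succeeds if the whole string matches, so ^ and $ are not needed
    return pattern.fullmatch(word) is not None


print(is_note_word("fish"))  # True
print(is_note_word("ifsh"))  # False
```

re.escape is not strictly needed for plain letters, but it keeps the pattern safe if a name ever contains a character with a special meaning in regular expressions.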
You can calculate the number of letters of all units (names of musical notes), which are in the word, and compare this number to the length of the word.
from collections import Counter
units = {"c","d","e","f","g","a","h", "fis","cis","dis"}
def func(word, units=units):
    letters_count = Counter()
    for unit in units:
        num_of_units = word.count(unit)
        letters_count[unit] += num_of_units * len(unit)
        if len(unit) == 1:
            continue
        # if the unit consists of more than 1 letter (e.g. dis)
        # check if these letters are in one letter units
        # if yes, subtract the number of repeating letters
        for letter in unit:
            if letter in units:
                letters_count[letter] -= num_of_units
    return len(word) == sum(letters_count.values())
print(func('disc'))   # True
print(func('disco'))  # False
A solution that opens a tkinter window to choose the file:
import re
from tkinter import filedialog as fd
# single-letter notes a through h plus the multi-letter names
m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[a-h])+$')
matches = list()
filename = fd.askopenfilename()
with open(filename) as f:
    for line in f:
        word = line.strip()  # drop the trailing newline, if any
        if m.match(word.lower()) is not None:
            matches.append(word)
print(matches)
This answer was posted as an edit to the question find all words in a certain alphabet with multi character letters by the OP Nivatius under CC BY-SA 4.0.
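The question also asks for the matching words to be saved to another file. A minimal sketch of that step without the file dialog could look like the following; filter_words, NOTE_WORD and the file names in the usage comment are hypothetical, and the pattern assumes the note names listed in the question.

```python
import re

# Assumed alphabet: the note names listed in the question
NOTE_WORD = re.compile(r"(?:fis|cis|dis|[acdefgh])+")


def filter_words(src_path, dst_path, pattern=NOTE_WORD):
    """Copy every matching word from src_path to dst_path and return the count."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            word = line.strip().lower()
            # Skip blank lines; fullmatch requires the whole word to match
            if word and pattern.fullmatch(word):
                dst.write(word + "\n")
                kept += 1
    return kept


# Usage with hypothetical file names:
# filter_words("words.txt", "note_words.txt")
```

Writing each word as soon as it matches avoids holding the whole word list in memory, which may matter for a really long input file.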