How do I let Python know that these two words are the same?
Question:
I have a .csv file. Here is an example:
Table format:
A | AA
BB | B
C | CC
D D | D DD
Text format:
A,AA
BB,B
C,CC
D D,D DD
I want Python to know that A is equal to AA, BB is equal to B, and C is equal to CC. Also, the fourth example has spaces.
It can also be reversed, such as checking whether AA is equal to A.
What I’m working on is a search engine. A word may be written in two ways, so I need to do this.
For example, I have a boolean variable that checks the search result: if I search for A and it returns AA, then it is True. Of course, returning A is also True.
My code:
query = …  # "AA"
result_list = …  # ["A"]
sorted_list = [element for element in result_list if element.find(query) != -1]
Feel free to leave a comment if you need more information.
Answers:
Maybe it helps to check for the hex code. Something like this:
text = "A AA"  # renamed from `input` to avoid shadowing the built-in
hex_code = ":".join("{:02x}".format(ord(x)) for x in text)
if "41" in hex_code:  # 0x41 is the code point of "A"
    print("True")
You can solve that by comparing a simplified version of the strings on the left and right sides. For example, A is simplified to A, AA to A, D D to D, etc. Try this:
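import pandas as pd

def simplify(s):
    # Drop spaces, then keep each distinct character once, in sorted order.
    # A bare set has no guaranteed iteration order, so join(set(s)) is unreliable.
    return "".join(sorted(set(s.replace(" ", ""))))

# data.csv
# A,AA
# BB,B
# C,CC
# D D,D DD
df = pd.read_csv('data.csv', header=None)
rows = df.values.tolist()
result = []
for row in rows:
    result.append(simplify(row[0]) == simplify(row[1]))
print(result)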
The output should look like this:
[True, True, True, True]
Turn both strings into sets, add a space to one, and check whether it contains every character of the other. Wrap that into a function and use it as you like:
def isequal_with_spaces(s1, s2):
    set1 = set(s1 + " ")
    return set1.issuperset(s2)

assert isequal_with_spaces('A', 'AA')
assert not isequal_with_spaces('A', 'BA')
assert isequal_with_spaces('A', 'A A')
assert isequal_with_spaces('AA', 'AAA A ')
assert not isequal_with_spaces('B', 'A A')
This doesn’t take case into account and might not work quite as you need for strings like 'AB' == 'A A', but that wasn’t specified in the question =)
Moreover, if a "word" can be defined as "a capital letter followed by any number of spaces or repetitions of the same letter", and that’s guaranteed, you can simplify isequal_with_spaces even further:
def isequal_with_spaces(s1, s2):
    return s1[0] == s2[0]
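The case caveat mentioned above could be handled by normalizing both strings before comparing them. This is only a sketch, assuming that a case-insensitive and symmetric comparison is what is wanted (the question does not say so); normalize and isequal_ignoring_case are made-up names.

```python
def normalize(s):
    """Reduce a string to its distinct characters, ignoring spaces and case."""
    return frozenset(s.replace(" ", "").casefold())


def isequal_ignoring_case(s1, s2):
    """Symmetric check: both strings must use exactly the same letters."""
    return normalize(s1) == normalize(s2)


print(isequal_ignoring_case("a", "AA"))
print(isequal_ignoring_case("AB", "A A"))
```

Unlike the superset check, this treats 'AB' and 'A A' as different in both directions, because the two sets of letters have to be identical.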
After using the csv module to read the rows as lists, remove any spaces and compare the characters, e.g. all(set(x.replace(' ', '')) == set(A[0]) for x in A):
>>> A = ['A', 'AA']
>>> D = ['D', 'D DD']
>>> all(set(x.replace(' ', '')) == set(A[0]) for x in A)
True
>>> all(set(x.replace(' ', '')) == set(D[0]) for x in D)
True
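The row-reading step mentioned above could look roughly like this. It is a minimal sketch: the CSV text is inlined through io.StringIO so the snippet runs on its own, and same_word and check_rows are hypothetical helper names, not part of the csv module.

```python
import csv
import io

# Inlined copy of the CSV from the question, so the sketch is self-contained.
CSV_TEXT = "A,AA\nBB,B\nC,CC\nD D,D DD\n"


def same_word(left, right):
    """Hypothetical helper: compare the distinct non-space characters."""
    return set(left.replace(" ", "")) == set(right.replace(" ", ""))


def check_rows(csv_text):
    """Return one boolean per CSV row, True when both cells match."""
    reader = csv.reader(io.StringIO(csv_text))
    return [same_word(row[0], row[1]) for row in reader if len(row) >= 2]


print(check_rows(CSV_TEXT))  # [True, True, True, True]
```

With a real file, open('data.csv', newline='') would take the place of io.StringIO(csv_text).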
One way to handle this scenario is to use a dictionary that maps words to their equivalent words. For example:
word_map = {
    "A": "AA",
    "AA": "A",
    "BB": "B",
    "B": "BB",
    "C": "CC",
    "CC": "C",
    "D D": "D DD",
    "D DD": "D D",
}
Then, you can use this dictionary to check if the query is equivalent to any of the elements in the result_list:
equivalent_word = word_map.get(query)  # None if the query has no known equivalent
if equivalent_word in result_list:
    sorted_list.append(equivalent_word)
You can add all the equivalent words to the word_map dictionary, and then use this dictionary to check if a word is equivalent to another word.
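Rather than typing the dictionary by hand, the pairs could be built from the CSV rows. The following is a sketch under assumptions: build_word_map and matches are hypothetical names, the rows are hard-coded instead of being read from the file, and each word maps to a set of spellings so that a word also matches itself.

```python
def build_word_map(rows):
    """Map every word to the set of spellings treated as equal to it."""
    word_map = {}
    for left, right in rows:
        # Merge any groups already known for either word with the new pair.
        group = word_map.get(left, set()) | word_map.get(right, set()) | {left, right}
        for word in group:
            word_map[word] = group
    return word_map


def matches(query, result_list, word_map):
    """Keep the results that are the query itself or one of its equivalents."""
    equivalents = word_map.get(query, {query})
    return [element for element in result_list if element in equivalents]


# Rows as they would come out of the CSV in the question (hard-coded here).
rows = [("A", "AA"), ("BB", "B"), ("C", "CC"), ("D D", "D DD")]
word_map = build_word_map(rows)

print(matches("AA", ["A", "B", "AA"], word_map))  # ['A', 'AA']
```

Storing a set per word keeps the lookup working in both directions and leaves room for a word that has more than two spellings.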