Problems matching Chinese characters in a table
Question:
I am currently having difficulty with a matchlist task.
Objective 1:
Target words that are present – exact match – in the ‘match list’ should be sorted into one group.
Objective 2:
For target words that are not present in the match list, locate all entries in the match list that have the same first character as the target word. For instance, 带着 is not in the match list, so the code should search the match list for words whose first character is 带, which would find 带走.
- If successful, the target word and the located words in the match list should be sorted into a second group (e.g. a list or tuple).
- If no such entry exists, the target word should be saved in a third group.
I am able to tackle objective 1 with the following code in mind:
matchlist = table['Match list']
targetlist = table['Target word']
table['present'] = matchlist.isin(targetlist)
table['present'] = table['present'].map({True: 1, False: 0})
table.loc[table['present'] == 1]
I am not sure how to go about tackling objective 2. I have some disjointed lines of code.
Something like this –
list = table.loc[table['present'] == 0]
a = {}
Or like this –
for i in list:
    if matchlist.str.contains(targetword[i][0]):
But I am largely drawing a blank.
Answers:
You’re being asked to perform set operations. First, take the intersection of match and target:
match = set(...)
target = set(...)
shared = match & target
which yields the set of words present in both sets.
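As a minimal sketch of that first step, assuming the two table columns have already been pulled into plain Python lists: the names match_words and target_words and most of the sample words are made up for illustration (only 带着 and 带走 come from the question).

```python
# Hypothetical sample data; in practice these would come from the table columns.
match_words = ["带走", "学习", "朋友"]
target_words = ["带着", "学习", "猫咪"]

match = set(match_words)
target = set(target_words)

# Objective 1: target words that have an exact match in the match list.
shared = match & target
print(shared)
```

With this sample data, only 学习 appears in both lists, so it is the only member of shared.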
For the second objective, start with all targets that you know are not shared, then check each of those missing terms for entries in match with the same first character. If there are any, add them to your "first character only" result set; if not, the term goes in the "can’t be matched at all" list:
missing = target - shared
first_only = set()
unmatched = []
for term in missing:
    # match-list entries that start with the same character as this target
    found = {x for x in match if x[0] == term[0]}
    if found:
        first_only |= found
    else:
        unmatched.append(term)
Note that this is not the most efficient code; it is simply the implementation you end up with when each step described in the text is written out as its own operation. Coming up with efficient code will be up to you.
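One possible way to tighten this up, offered as a sketch rather than the definitive solution: index the match list by first character once, so each target needs only a dictionary lookup instead of a scan of the whole match set. Unlike the loop above, this version also keeps each target word together with the match-list words found for it, which is what the second objective asks for. The function name group_targets and the sample words are hypothetical.

```python
from collections import defaultdict

def group_targets(match_words, target_words):
    """Split targets into exact matches, first-character matches, and unmatched."""
    match = set(match_words)

    # Build the first-character index once, skipping empty strings.
    by_first = defaultdict(list)
    for word in match:
        if word:
            by_first[word[0]].append(word)

    exact, first_only, unmatched = [], {}, []
    for term in target_words:
        if term in match:
            exact.append(term)
        elif term and term[0] in by_first:
            # Keep the target together with the match-list words it found.
            first_only[term] = sorted(by_first[term[0]])
        else:
            unmatched.append(term)
    return exact, first_only, unmatched

exact, first_only, unmatched = group_targets(
    ["带走", "学习", "朋友"],  # hypothetical match list
    ["带着", "学习", "猫咪"],  # hypothetical target words
)
```

Here exact holds 学习, first_only maps 带着 to the list containing 带走, and unmatched holds 猫咪. Duplicate target words are kept as they appear; wrap target_words in set() first if that is not wanted.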