Problems matching Chinese characters in a table

Question:

I am currently having difficulty with a matchlist task.

Two columns of words

Objective 1:
Target words that are present – exact match – in the ‘match list’ should be sorted into one group.

Objective 2:
For target words that are not present in the match list, locate all entries in the match list that share the first character of the target word. For instance, 带着 is not in the match list, so the code should search the match list for any words whose first character is 带, which would find 带走.

  1. If successful, the target word and the located words in the match list should be sorted into a second group (e.g. a list or tuple).
  2. If no such entry exists, the target word should be saved in a third group.

I am able to tackle objective 1 with the following code in mind:

matchlist = table['Match list']
targetlist = table['Target word']
# Flag each target word that also appears somewhere in the match list
table['present'] = targetlist.isin(matchlist).astype(int)
table.loc[table['present'] == 1]

I am not sure how to go about tackling objective 2. I have some disjointed lines of code.

Something like this:

rest = table.loc[table['present'] == 0]
a = {}

Or like this:

for word in rest['Target word']:
    if matchlist.str.startswith(word[0]).any():
        pass  # not sure what goes here

But I am largely drawing a blank.

Asked By: Apples

Answers:

You’re being asked to perform set operations. Namely, first the intersection between match and target:

match = set(...)
target = set(...)
shared = match & target

which yields the set of terms present in both match and target.
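As a minimal sketch of that intersection, here it is with made-up sample words standing in for the two columns. Only 带着 and 带走 come from the question; the other words are assumptions for illustration.

```python
# Hypothetical sample data; in practice these would be built from the table columns.
match = {"带走", "学习", "老师"}
target = {"带着", "学习", "苹果"}

# Words that appear in both sets (objective 1).
shared = match & target
print(shared)  # {'学习'}
```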

For the second objective, start with all targets that you know are not shared, then iterate over those missing terms to find entries in match with the same first character. If there are any, add them to your "first character only" result set; if not, the term goes in the "this can't be matched at all" list:

missing = target - shared
first_only = set()
unmatched = list()

for term in missing:
    found = {x for x in match if x[0] == term[0]}
    if found:
        first_only |= found
    else:
        unmatched.append(term)

Note that this is not the most efficient code, but simply an implementation we end up with if we follow each of the steps as described in the text as its own thing. Coming up with efficient code will be up to you.
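If you want a starting point, here is one possible sketch, not a definitive version. It indexes the match words by first character once, so each missing term becomes a dictionary lookup instead of a full scan of match. The function name group_targets and the sample words are made up for illustration. It also records which match words belong to which target, since the question asks for them to be grouped together.

```python
from collections import defaultdict

def group_targets(match, target):
    """Split target words into exact, first-character-only, and unmatched groups."""
    # Build the first-character index once instead of rescanning match per term.
    by_first = defaultdict(set)
    for word in match:
        if word:  # skip empty strings, which have no first character
            by_first[word[0]].add(word)

    shared = set(target) & set(match)
    first_only = {}  # target word -> match words sharing its first character
    unmatched = []
    for term in sorted(set(target) - shared):
        found = by_first.get(term[0]) if term else None
        if found:
            first_only[term] = sorted(found)
        else:
            unmatched.append(term)
    return shared, first_only, unmatched

# Hypothetical sample data, not taken from the original table.
match = {"带走", "带来", "学习", "老师"}
target = {"带着", "学习", "苹果"}
shared, first_only, unmatched = group_targets(match, target)
```

With this sample data, shared holds 学习, first_only maps 带着 to 带来 and 带走, and unmatched holds 苹果.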
