Using the 'in' operator to check if a string contains a keyword, with and without whitespace

Question:

I wrote this code to build a new tag list by checking whether a description (a string) contains a specific keyword.

For example

tagslist=['LED_AuraSync', 'LED_ARGB', 'LED_RGB', 'LED_Blue...',]
description=('Arctic Freezer 50 Dual Tower ARGB Heatsink ...')

tagged=[]
for tags in tagslist:
    splitted=tags.split('_')[1]
    if (splitted) in description:
        tagged.append(splitted)

print(tagged)

This adds both ARGB and RGB to the ‘tagged’ list, which is wrong because the description actually contains only ARGB. However, if I add whitespace before and after the ‘splitted’ variable in the ‘in’ check, it works correctly:

if (' '+splitted+' ') in description:

But I don’t understand why it works. Could someone explain?

Asked By: CryptoTex

||

Answers:

TL;DR

tagslist = [
    "LED_AuraSync",
    "LED_ARGB",
    "LED_RGB",
    "LED_Blue...",
]
description = "Arctic Freezer 50 Dual Tower ARGB Heatsink ...".split()

tagged = [
    splitted for tag in tagslist if (splitted := tag.split("_")[1]) in description
]

print(tagged)

As a list comprehension.

Why doesn’t it work in the first place? (And why does it work later?)

The key point is that the in operator, when applied to a string, matches any substring; it does not match word by word.
So "RGB" in "ARGB" evaluates to True.

But if you split the description on whitespace (which turns it into a list of strings) and then use the in operator, it works, because in now compares each string in the list for equality with the given keyword instead of searching for a substring.
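
To make the difference concrete, here is a minimal sketch using the description from the question. The variable names substring_hit, word_hit and argb_hit are only illustrative and do not appear in the original code.

```python
description = "Arctic Freezer 50 Dual Tower ARGB Heatsink ..."

# On a string, `in` is a substring search: "RGB" is found inside "ARGB".
substring_hit = "RGB" in description

# On a list, `in` compares whole elements for equality.
words = description.split()
word_hit = "RGB" in words
argb_hit = "ARGB" in words

print(substring_hit)  # True
print(word_hit)       # False
print(argb_hit)       # True
```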

By using " " + splitted + " " in description, you are matching splitted with whitespace on both sides, so in the iteration for "RGB" it actually checks whether " RGB " is in the description. It is not, so "RGB" is not appended to the list.
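
One caveat with the padded check is that it misses a keyword sitting at the very start or end of the description, because there is no space on that side. The sketch below shows this and, as one possible alternative that is not part of the original question or answer, a regular-expression check. The function names has_word_padded and has_word_regex are hypothetical.

```python
import re


def has_word_padded(keyword, text):
    """The question's workaround: require a space on both sides."""
    return f" {keyword} " in text


def has_word_regex(keyword, text):
    """Alternative sketch: the keyword must not touch another word character."""
    pattern = rf"(?<!\w){re.escape(keyword)}(?!\w)"
    return re.search(pattern, text) is not None


print(has_word_padded("ARGB", "ARGB Heatsink"))  # False: no leading space
print(has_word_regex("ARGB", "ARGB Heatsink"))   # True
print(has_word_regex("RGB", "ARGB Heatsink"))    # False: preceded by "A"
```

Splitting the description, as in the TL;DR, also handles the start and end of the string, so the regular expression is only worth considering if punctuation next to a keyword (for example "RGB,") should still count as a match.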

A little more on the comprehension

I’m guessing that the OP does not have much experience with Python, so I will give a short explanation here 🙂

That list comprehension,

tagged = [
    splitted for tag in tagslist if (splitted := tag.split("_")[1]) in description
]

is essentially (not fully) equivalent to the following:

tagged = []
for tag in tagslist:
  splitted = tag.split("_")[1]
  if splitted in description:
    tagged.append(splitted)

where the walrus operator := (available since Python 3.8) assigns tag.split("_")[1] to splitted inside the comprehension so that it is computed only once.

An alternative way to write it would be

tagged = [
   tag.split("_")[1]
   for tag in tagslist
   if tag.split("_")[1] in description
]

but it computes tag.split("_")[1] twice for every tag that matches: once in the condition and once more to produce the result.
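
To see the difference in how often the split runs, here is a small sketch that counts the calls. keyword_of and the calls list are hypothetical helpers added only for this illustration.

```python
tagslist = ["LED_AuraSync", "LED_ARGB", "LED_RGB", "LED_Blue..."]
description = "Arctic Freezer 50 Dual Tower ARGB Heatsink ...".split()

calls = []  # one entry is recorded per call to keyword_of


def keyword_of(tag):
    """Hypothetical helper: return the part after the underscore."""
    calls.append(tag)
    return tag.split("_")[1]


# Walrus version: the helper runs once per tag.
calls.clear()
with_walrus = [kw for tag in tagslist if (kw := keyword_of(tag)) in description]
walrus_calls = len(calls)

# Plain version: once per tag in the condition, plus once per match.
calls.clear()
without_walrus = [keyword_of(tag) for tag in tagslist if keyword_of(tag) in description]
plain_calls = len(calls)

print(with_walrus, walrus_calls)     # ['ARGB'] 4
print(without_walrus, plain_calls)   # ['ARGB'] 5
```

For a cheap operation like split the extra calls hardly matter, so choosing between the two forms is mostly a question of readability.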

Answered By: Fed_Dragon