Extract words from a string, creating a variable according to their exact order in the string

Question

I would like to print the words from keywords that are contained in text.

I would like to print them in the exact order in which they appear in the text, so var1 will be Python, var2 will be Java, and var3 will be Rust. I need to be able to handle these variables individually and separately. Maybe I need split() rather than the approach below.

If I print x, I get Java, Python, Rust (they’re not in order).

I need Python, Java, Rust, and the order should be determined automatically.

How can I get this?

text     = "My favorite languages are Python and Java. I like less Rust"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "RUST", "JAVASCRIPT"] 

matches  = [x for x in keywords if x in text.upper()]

for x in matches:
    print("test x: ", x) #Java, Python, Rust
    var1= x
    var2= x
    var3= x

print(var1)
print(var2)
print(var3)

Asked By: aesh

Answer 1

I think regular expressions are the best way to go.

import re

text     = "My favorite languages are Python and Java. I like less Rust. Python meh"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT"]

regex = re.compile(r'|'.join(keywords), re.IGNORECASE)
print(regex.findall(text))

Output:

['Python', 'Java', 'Python']

This will print Python twice; you didn’t specify whether that is the desired behaviour.
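If duplicates are not wanted, here is a minimal sketch of one way to drop them while keeping the order of first appearance. It also escapes each keyword with re.escape and tries longer keywords first, so that "JAVASCRIPT" is not cut short by "JAVA". The names pattern, found and unique_in_order are illustrative and not part of the answer above.

```python
import re

text = "My favorite languages are Python and Java. I like less Rust. Python meh"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT"]

# Escape every keyword and put longer ones first, so "JAVASCRIPT" is tried before "JAVA".
pattern = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
regex = re.compile(pattern, re.IGNORECASE)

found = regex.findall(text)

# dict.fromkeys keeps insertion order, so this removes repeats without reordering.
unique_in_order = list(dict.fromkeys(m.upper() for m in found))

print(found)            # ['Python', 'Java', 'Python']
print(unique_in_order)  # ['PYTHON', 'JAVA']
```

Upper-casing before deduplicating treats "Python" and "python" as the same keyword; drop the .upper() call if the spelling found in the text should be kept.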

Answered By: Victor Savenije

Answer 2

So, I’ve done some testing and I think this will work most easily:

text = "My favorite languages are Python and Java. I like less Rust"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]

matches = [x for x in keywords if x in text.upper()]

for x in matches:
    print("test x: ", x) #Java, Python, Rust
    print(x)
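One caveat, offered as a sketch rather than a definitive fix: the list comprehension above walks through keywords, so the result follows the order of keywords, not the order of the words in text. Assuming plain substring matching is acceptable, sorting the hits by the position of their first occurrence gives the text order. The name upper_text is introduced here only for illustration.

```python
text = "My favorite languages are Python and Java. I like less Rust"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]

upper_text = text.upper()

# Keep the keywords that occur in the text, then sort them by the index
# of their first occurrence (str.find returns that index).
matches = sorted((k for k in keywords if k in upper_text), key=upper_text.find)

print(matches)  # ['PYTHON', 'JAVA', 'RUST']
```

Because this is substring matching, "JAVA" would also be reported for a text that only mentions "Javascript"; a word-based or regex-based approach avoids that.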

Hope this helps!

Answered By: PD6152

Answer 3

Trying to keep my solution as close as possible to yours, I think this is the best way (without using any other libraries, of course).

text     = "My favorite languages are Python and Java. I like less Rust."
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"] 

# Remove full stops from the text
text = text.replace(".", "")

# Upper-case the text and split it into words
text = text.upper().split()

# Iterate over the words of the text and append each one that is a keyword
matched = []
for w in text:
    for k in keywords:
        if w == k:
            matched.append(k)

var1, var2, var3 = matched
print(var1, var2, var3, sep="\n")

If you want to create "dynamic vars", I don’t think you can; you must declare them.
For this reason, it’s better to keep the matches in a list and access or iterate over the values when you need them.

I hope this helps.
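As a rough sketch of what working with the list can look like: the example below assumes matched already holds the keywords in text order, and the names first, second, head and rest are made up for illustration. The padding trick at the end is one possible way to keep three fixed names even when fewer than three keywords are found.

```python
matched = ["PYTHON", "JAVA", "RUST"]  # example result, already in text order

# Index into the list when one specific match is needed.
first, second = matched[0], matched[1]

# Star-unpacking takes the first match and keeps the rest as a list.
head, *rest = matched

# Pad with None and slice to three items, so unpacking into three names
# does not raise ValueError when fewer than three keywords were found.
var1, var2, var3 = (matched + [None] * 3)[:3]

print(first, second)     # PYTHON JAVA
print(head, rest)        # PYTHON ['JAVA', 'RUST']
print(var1, var2, var3)  # PYTHON JAVA RUST
```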

Edit:
It is possible to assign variables dynamically, but it is not good practice:

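# Build one assignment statement per match; numbering starts at 1 to match var1, var2, ...
assignments = [f"var{i} = {v!r}" for i, v in enumerate(matched, start=1)]
for stmt in assignments:
    exec(stmt)

print(var1)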

Answered By: X1foideo

Answer 4

Since you are looking for exact word matches, I suggest a solution without re.
That said, re is part of the standard library, so there is nothing wrong with using it.

class MyContainer:
    def __init__(self, keywords, ignore_case=True, remove_signs=True, return_key=False):
        self._keywords = keywords      # keep original input
        self.ignore_case = ignore_case
        self.remove_signs = remove_signs
        self.keywords = [self.prepare(x) for x in keywords]
        self.return_key = return_key
        self._keydict = {k: v for k, v in zip(self.keywords, self._keywords)}
    
    def __contains__(self, x):
        return self.prepare(x) in self.keywords
    
    def remove_sign(self, x):
        return x.rstrip("!.?;,- \t\"'")
    
    def prepare(self, x):
        if self.ignore_case:
            x = x.lower()
        if self.remove_signs:
            x = self.remove_sign(x)
        return x
    
    def findall(self, s):
        return [x if not self.return_key else self._keydict[self.prepare(x)] for x in s.split() if x in self]

# return what was found in text
kws = MyContainer(keywords, ignore_case=True, remove_signs=True, return_key=False)
kws.findall(text) 
## ['Python', 'Java.', 'Python']


# return as given in keywords
kws = MyContainer(keywords, ignore_case=True, remove_signs=True, return_key=True)
kws.findall(text) 
## ['PYTHON', 'JAVA', 'PYTHON']

In the __init__ method, the keywords list is prepared once, according to ignore_case and remove_signs.
The __contains__ dunder method allows easy use via the in operator.

The flag arguments ignore_case, remove_signs and return_key determine, respectively, whether case is ignored, whether punctuation at the right end of a word is removed, and whether each word is returned as found in the text or as specified in keywords.
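For comparison, the same idea can be sketched as a single function. This is a simplified stand-in, not the class above: the names find_keywords and TRAILING are hypothetical, case is always ignored, and each word is always returned as spelled in keywords.

```python
TRAILING = "!.?;,- \t\"'"  # characters stripped from the right end of each word

def find_keywords(text, keywords):
    """Return the keywords found in text, in the order the words appear."""
    # Map the lower-cased keyword to its original spelling.
    lookup = {k.lower(): k for k in keywords}
    found = []
    for token in text.split():
        key = token.rstrip(TRAILING).lower()
        if key in lookup:
            found.append(lookup[key])
    return found

text = "My favorite languages are Python and Java. I like less Rust. Python meh"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT"]
print(find_keywords(text, keywords))  # ['PYTHON', 'JAVA', 'PYTHON']
```

Only trailing punctuation is stripped, so a keyword such as "C#" still matches when it is followed by a comma or a full stop.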

Answered By: Gwang-Jin Kim

Answer 5

As for your var1, var2, etc. problem, use Python’s multiple assignment (unpacking):

var1, var2, var3 = matches

If you want to print them out:

for i, x in enumerate(matches, start=1):  # start at 1 so the labels match var1, var2, ...
    print(f"var{i} = {x}")

Answered By: Gwang-Jin Kim

Answer 6

@aesh

I skimmed the few answers above and I realized how quick people are to bring in an elephant to kill a mouse when a simple swatter is enough.

Below is a small, concise script that still keeps things dynamic.


import re

text     = "My favorite languages are Python and Java. I like less Rust and Javascript and my Python is a little rusty"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]


j = '|'.join(keywords)

r = re.compile(rf'\b({j})\b', re.IGNORECASE)
matches = r.findall(text)

d = dict()

for i, m in enumerate(matches):
    # if m not in d.values(): # To avoid duplicates, uncomment this line and indent the next one
    d["var_" + str(i)] = m   

for k, v in d.items():
    print(f'{k} = {v}')

Prints out

var_0 = Python
var_1 = Java
var_2 = Rust
var_3 = Javascript
var_4 = Python

Answered By: VenomCA
