Extract words from a string, creating a variable according to their exact order in the string
Question:
I would like to print one or more of the words from keywords that appear in text, in the exact order in which they are written. So var1 will be Python, var2 will be Java, and var3 will be Rust. I need to be able to handle these variables individually and separately. Maybe I need split(), but not like this.
If I try to print x, I get Java, Python, Rust (they’re not in order). I need Python, Java, Rust, and the exact order should be set automatically.
How can I get this?
text = "My favorite languages are Python and Java. I like less Rust"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "RUST", "JAVASCRIPT"]
matches = [x for x in keywords if x in text.upper()]
for x in matches:
    print("test x: ", x)  # Java, Python, Rust
    var1 = x
    var2 = x
    var3 = x
print(var1)
print(var2)
print(var3)
Answers:
I think regular expressions are the best way to go.
import re
text = "My favorite languages are Python and Java. I like less Rust. Python meh"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT"]
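# re.escape keeps keywords with special characters from being read as regex syntax
regex = re.compile('|'.join(re.escape(k) for k in keywords), re.IGNORECASE)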
print(regex.findall(text))
Output:
['Python', 'Java', 'Python']
This will print Python twice; you didn’t specify whether that is the desired behaviour.
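If the repeated match is not wanted, one possible approach is to drop duplicates while keeping the order of first appearance. The sketch below assumes the same text and keywords as above; the name unique_in_order is only illustrative. Note that the comparison is case-sensitive, so "Python" and "python" would count as two different matches.

```python
import re

text = "My favorite languages are Python and Java. I like less Rust. Python meh"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT"]

# re.escape guards against keywords that contain regex metacharacters.
regex = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
found = regex.findall(text)

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
unique_in_order = list(dict.fromkeys(found))
print(unique_in_order)  # ['Python', 'Java']
```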
So, I’ve done some testing and I think this will work most easily:
text = "My favorite languages are Python and Java. I like less Rust"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]
matches = [x for x in keywords if x in text.upper()]
for x in matches:
    print("test x: ", x)  # JAVA, PYTHON, RUST
    print(x)
Hope this helps!
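A list comprehension over keywords yields the matches in keyword-list order, not text order. If the comprehension is to be kept, one option (a sketch, not necessarily what this answer intended) is to sort the matches by the position where each keyword first appears in the text. The names upper_text and matches_in_text_order are illustrative. Because this is a substring test, JAVA would also be found inside a word such as "Javascript".

```python
text = "My favorite languages are Python and Java. I like less Rust"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]

upper_text = text.upper()
matches = [k for k in keywords if k in upper_text]

# str.find returns the index of the first occurrence, so sorting by it
# puts the keywords in the order in which they appear in the text.
matches_in_text_order = sorted(matches, key=upper_text.find)
print(matches_in_text_order)  # ['PYTHON', 'JAVA', 'RUST']
```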
Trying to keep my solution as similar as possible to yours, I think this is the best way (of course, without using other libraries).
text = "My favorite languages are Python and Java. I like less Rust."
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]
# Normalize the text by removing full stops
text = text.replace(".", "")
# Upper-case the text and split it into words
text = text.upper().split()
# Walk the text words in order; append each one that is also a keyword
matched = []
for w in text:
    for k in keywords:
        if w == k:
            matched.append(k)
var1, var2, var3 = matched
print(var1, var2, var3, sep="\n")
If you want to create "dynamic vars", I don’t think you can; you must declare them.
For this reason, it’s better to keep a list and access or iterate over the values you need, when you need them.
I hope this helps.
Edit:
It is possible to assign variables dynamically, but it is not good practice:
# start=1 so that the first match becomes var1
assignments = [f"var{i} = {v!r}" for i, v in enumerate(matched, start=1)]
for e in assignments:
    exec(e)
print(var1)
Since an exact match is being searched for, I suggest a solution without re. That said, re is part of the standard library, so one may as well use it.
class MyContainer:
    def __init__(self, keywords, ignore_case=True, remove_signs=True, return_key=False):
        self._keywords = keywords  # keep original input
        self.ignore_case = ignore_case
        self.remove_signs = remove_signs
        self.keywords = [self.prepare(x) for x in keywords]
        self.return_key = return_key
        self._keydict = {k: v for k, v in zip(self.keywords, self._keywords)}

    def __contains__(self, x):
        return self.prepare(x) in self.keywords

    def remove_sign(self, x):
        # strip trailing punctuation, whitespace and quotes
        return x.rstrip("!.?;,- \t\"'")

    def prepare(self, x):
        if self.ignore_case:
            x = x.lower()
        if self.remove_signs:
            x = self.remove_sign(x)
        return x

    def findall(self, s):
        return [x if not self.return_key else self._keydict[self.prepare(x)] for x in s.split() if x in self]

# return what was found in text
kws = MyContainer(keywords, ignore_case=True, remove_signs=True, return_key=False)
kws.findall(text)
## ['Python', 'Java.', 'Python']
# return as given in keywords
kws = MyContainer(keywords, ignore_case=True, remove_signs=True, return_key=True)
kws.findall(text)
## ['PYTHON', 'JAVA', 'PYTHON']
Using the __init__ method, you can pre-prepare your keywords list depending on ignore_case and remove_signs.
The __contains__ dunder method is for easy use via the in operator.
With the flag arguments ignore_case, remove_signs, and return_key, you can determine whether case should be ignored, whether signs at the right end should be removed, and whether the word should be given as found in text or as specified in keywords, respectively.
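To make the role of __contains__ more concrete, here is a trimmed-down sketch. KeywordSet is a hypothetical stand-in for the class above, not the same class: it always ignores case and always strips a few trailing punctuation characters.

```python
class KeywordSet:
    """Hypothetical, simplified stand-in for the container class above."""

    def __init__(self, keywords):
        # Normalise the keywords once, so each membership test is a set lookup.
        self._normalised = {k.lower() for k in keywords}

    def __contains__(self, word):
        # Python calls this method when evaluating `word in instance`.
        return word.lower().rstrip("!.?;,-") in self._normalised


kws = KeywordSet(["JAVA", "PYTHON", "RUST"])
print("Java." in kws)       # True: case and the trailing "." are ignored
print("Javascript" in kws)  # False: only whole words match
```

Storing the normalised keywords in a set rather than a list is a design choice of this sketch; it keeps each lookup cheap when the keyword list grows.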
For your var1, var2, etc. problem, use multiple assignment in Python via destructuring:
var1, var2, var3 = matches
If you want to print them out:
for i, x in enumerate(matches, start=1):  # start=1 so the first match is var1
    print(f"var{i} = {x}")
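One caveat with destructuring into a fixed number of names is that it raises ValueError when the number of matches differs. A possible way around that, sketched below with an assumed four-element matches list, is star-unpacking, which collects any remaining items into a list.

```python
# Assumed example data: four matches instead of the expected three.
matches = ["Python", "Java", "Rust", "Python"]

# Star-unpacking: the first two names are bound, the rest goes into a list.
first, second, *rest = matches
print(first, second, rest)  # Python Java ['Rust', 'Python']

# Unpacking into exactly three names fails when the count differs.
try:
    var1, var2, var3 = matches
except ValueError as exc:
    print("unpacking failed:", exc)
```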
@aesh
I skimmed the few answers above and realized how quick people are to bring in an elephant to kill a mouse when a simple swatter is enough.
Below is a small, concise and foldable script that tries to preserve a dynamic side.
import re
text = "My favorite languages are Python and Java. I like less Rust and Javascript and my Python is a little rusty"
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]
j = '|'.join(keywords)
# \b is a word boundary, so "rusty" does not count as a match for RUST
r = re.compile(rf'\b({j})\b', re.IGNORECASE)
matches = r.findall(text)
d = dict()
for i, m in enumerate(matches):
    # if m not in d.values():  # To avoid duplicates, uncomment this line and indent the next one
    d["var_" + str(i)] = m
for k, v in d.items():
    print(f'{k} = {v}')
Prints out
var_0 = Python
var_1 = Java
var_2 = Rust
var_3 = Javascript
var_4 = Python
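One limitation of \b is that it only marks a boundary next to a word character, so a keyword ending in a symbol, such as C#, is not matched when it is followed by a space. A possible variant, sketched here with a hypothetical sample text, uses lookarounds instead and escapes each keyword with re.escape.

```python
import re

text = "I use C# and JavaScript, not Java."  # hypothetical sample text
keywords = ["C#", "JAVA", "PHP", "PYTHON", "JAVASCRIPT", "RUST"]

# Escape each keyword and try longer ones first (JAVASCRIPT before JAVA).
alternatives = "|".join(
    re.escape(k) for k in sorted(keywords, key=len, reverse=True)
)
# (?<!\w) and (?!\w) require that no word character touches the match,
# which also works for keywords that end in a non-word character like "#".
pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)

matches = pattern.findall(text)
print(matches)  # ['C#', 'JavaScript', 'Java']
```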