How to remove specific substrings from a set of strings in Python?
Question:
I have a set of strings and all the strings have one of two specific substrings which I want to remove:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
I want the ".good
" and ".bad
" substrings removed from all the strings. I tried this:
for x in set1:
x.replace('.good', '')
x.replace('.bad', '')
but it doesn’t seem to work, set1
stays exactly the same. I tried using for x in list(set1)
instead but that doesn’t change anything.
Answers:
>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'
.replace
doesn’t change the string, it returns a copy of the string with the replacement. You can’t change the string directly because strings are immutable.
You need to take the return values from x.replace
and put them in a new set.
Strings are immutable. str.replace
creates a new string. This is stated in the documentation:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. […]
This means you have to re-allocate the set or re-populate it (re-allocating is easier with a set comprehension):
new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
P.S. if you want to change the prefix or suffix of a string and you’re using Python 3.9 or newer, use str.removeprefix()
or str.removesuffix()
instead:
new_set = {x.removesuffix('.good').removesuffix('.bad') for x in set1}
You could do this:
import re
import string
set1={'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}
for x in set1:
x.replace('.good',' ')
x.replace('.bad',' ')
x = re.sub('.good$', '', x)
x = re.sub('.bad$', '', x)
print(x)
I did the test (but it is not your example) and the data does not return them orderly or complete
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}
I proved that this works:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']
or
>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
... newind.append(x.replace('p',''))
>>> newind
['5', '1', '8', '4', '2', '8']
If list
I was doing something for a list which is a set of strings and you want to remove all lines that have a certain substring you can do this
import re
def RemoveInList(sub,LinSplitUnOr):
indices = [i for i, x in enumerate(LinSplitUnOr) if re.search(sub, x)]
A = [i for j, i in enumerate(LinSplitUnOr) if j not in indices]
return A
where sub
is a patter that you do not wish to have in a list of lines LinSplitUnOr
for example
A=['Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad']
sub = 'good'
A=RemoveInList(sub,A)
Then A
will be
All you need is a bit of black magic!
>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']
When there are multiple substrings to remove, one simple and effective option is to use re.sub
with a compiled pattern that involves joining all the substrings-to-remove using the regex OR (|
) pipe.
import re
to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']
p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
In Python 3.9+ you could remove the suffix using str.removesuffix('mysuffix')
. From the docs:
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]
. Otherwise, return a copy of the original string
So you can either create a new empty set and add each element without the suffix to it:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
set2 = set()
for s in set1:
set2.add(s.removesuffix(".good").removesuffix(".bad"))
Or create the new set using a set comprehension:
set2 = {s.removesuffix(".good").removesuffix(".bad") for s in set1}
print(set2)
Output:
{'Orange', 'Pear', 'Apple', 'Banana', 'Potato'}
# practices 2
str = "Amin Is A Good Programmer"
new_set = str.replace('Good', '')
print(new_set)
print : Amin Is A Programmer
I have a set of strings and all the strings have one of two specific substrings which I want to remove:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
I want the ".good
" and ".bad
" substrings removed from all the strings. I tried this:
for x in set1:
x.replace('.good', '')
x.replace('.bad', '')
but it doesn’t seem to work, set1
stays exactly the same. I tried using for x in list(set1)
instead but that doesn’t change anything.
>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'
.replace
doesn’t change the string, it returns a copy of the string with the replacement. You can’t change the string directly because strings are immutable.
You need to take the return values from x.replace
and put them in a new set.
Strings are immutable. str.replace
creates a new string. This is stated in the documentation:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. […]
This means you have to re-allocate the set or re-populate it (re-allocating is easier with a set comprehension):
new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
P.S. if you want to change the prefix or suffix of a string and you’re using Python 3.9 or newer, use str.removeprefix()
or str.removesuffix()
instead:
new_set = {x.removesuffix('.good').removesuffix('.bad') for x in set1}
You could do this:
import re
import string
set1={'Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad'}
for x in set1:
x.replace('.good',' ')
x.replace('.bad',' ')
x = re.sub('.good$', '', x)
x = re.sub('.bad$', '', x)
print(x)
I did the test (but it is not your example) and the data does not return them orderly or complete
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}
I proved that this works:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']
or
>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
... newind.append(x.replace('p',''))
>>> newind
['5', '1', '8', '4', '2', '8']
If list
I was doing something for a list which is a set of strings and you want to remove all lines that have a certain substring you can do this
import re
def RemoveInList(sub,LinSplitUnOr):
indices = [i for i, x in enumerate(LinSplitUnOr) if re.search(sub, x)]
A = [i for j, i in enumerate(LinSplitUnOr) if j not in indices]
return A
where sub
is a patter that you do not wish to have in a list of lines LinSplitUnOr
for example
A=['Apple.good','Orange.good','Pear.bad','Pear.good','Banana.bad','Potato.bad']
sub = 'good'
A=RemoveInList(sub,A)
Then A
will be
All you need is a bit of black magic!
>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']
When there are multiple substrings to remove, one simple and effective option is to use re.sub
with a compiled pattern that involves joining all the substrings-to-remove using the regex OR (|
) pipe.
import re
to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']
p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
In Python 3.9+ you could remove the suffix using str.removesuffix('mysuffix')
. From the docs:
If the string ends with the suffix string and that suffix is not empty, return
string[:-len(suffix)]
. Otherwise, return a copy of the original string
So you can either create a new empty set and add each element without the suffix to it:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
set2 = set()
for s in set1:
set2.add(s.removesuffix(".good").removesuffix(".bad"))
Or create the new set using a set comprehension:
set2 = {s.removesuffix(".good").removesuffix(".bad") for s in set1}
print(set2)
Output:
{'Orange', 'Pear', 'Apple', 'Banana', 'Potato'}
# practices 2
str = "Amin Is A Good Programmer"
new_set = str.replace('Good', '')
print(new_set)
print : Amin Is A Programmer