Removing substring from string in Python
Question:
I want to write a function that takes 2 inputs: a string and a substring, then the function will remove that part of the substring from the string.
def remove_substring(s, substr):
"""(str, str) -> NoneType
Returns string without string substr
remove_substring_from_string("Im in cs", "cs")
Im in
"""
other_s = ''
for substr in s:
if substr in s:
continue
How do I continue on from here? Assuming my logic is sound.
Answers:
Avoiding the use of Python functions.
Method 1
def remove_substring_from_string(s, substr):
'''
find start index in s of substring
remove it by skipping over it
'''
i = 0
while i < len(s) - len(substr) + 1:
# Check if substring starts at i
if s[i:i+len(substr)] == substr:
break
i += 1
else:
# break not hit, so substr not found
return s
# break hit
return s[:i] + s[i+len(substr):]
Method 2
If the range function can be used, the above can be written more compactly as follows.
def remove_substring_from_string(s, substr):
'''
find start index in s of substring
remove it by skipping over it
'''
for i in range(len(s) - len(substr) + 1):
if s[i:i+len(substr)] == substr:
break
else:
# break not hit, so substr not found
return s
return s[:i] + s[i+len(substr):]
Test
print(remove_substring_from_string("I have nothing to declare except my genuis", " except my genuis"))
# Output: I have nothing to declare'
This approach is based on the KMP algorithm:
def KMP(s):
n = len(s)
pi = [0 for _ in range(n)]
for i in range(1, n):
j = pi[i - 1]
while j > 0 and s[i] != s[j]:
j = pi[j - 1]
if s[i] == s[j]:
j += 1
pi[i] = j
return pi
# Removes all occurences of t in s
def remove_substring_from_string(s, t):
n = len(s)
m = len(t)
# Calculate the prefix function using KMP
pi = KMP(t + 'x00' + s)[m + 1:]
r = ""
i = 0
while i + m - 1 < n: # Before the remaining string is smaller than the substring
if pi[i + m - 1] == m: # If the substring is here, skip it
i += m
else: # Otherwise, add the current character and move to the next
r += s[i]
i += 1
# Add the remaining string
r += s[i:]
return r
It runs in O(|s| + |t|), but it has a few downsides:
- The code is long and unintuitive.
- It requires that there is no null (
x00
) in the input strings.
- Its constant factors are pretty bad for short s and t.
- It doesn’t handle overlapping strings how you might want it:
remove_substring_from_string("aaa", "aa")
will return "a"
. The only guarantee made is that t in remove_substring_from_string(s, t)
is False
for any two strings s and t.
A C++ example and further explanation for the KMP algorithm can be found here. The remove_substring_from_string
function then only checks if the entire substring is matched at each position and if so, skips over the substring.
I would do this using re
.
import re
def remove_substring(s, substr):
# type: (str, str) -> str
return re.subn(substr, '', s)[0]
remove_substring('I am in cs', 'cs')
# 'I am in '
remove_substring('This also removes multiple substr that are found. Even if that substr is repeated like substrsubstrsubstr', 'substr')
# 'This also removes multiple that are found. Even if that is repeated like '
def remove_substring(s, substr):
while s != "":
if substr in s:
s = s.replace(substr, "")
else:
return s
if s == "":
return "Empty String"
The idea here is that we replace all occurrences of substr within s, by replacing the first instance of substr and then looping until we are done.
I want to write a function that takes 2 inputs: a string and a substring, then the function will remove that part of the substring from the string.
def remove_substring(s, substr):
"""(str, str) -> NoneType
Returns string without string substr
remove_substring_from_string("Im in cs", "cs")
Im in
"""
other_s = ''
for substr in s:
if substr in s:
continue
How do I continue on from here? Assuming my logic is sound.
Avoiding the use of Python functions.
Method 1
def remove_substring_from_string(s, substr):
'''
find start index in s of substring
remove it by skipping over it
'''
i = 0
while i < len(s) - len(substr) + 1:
# Check if substring starts at i
if s[i:i+len(substr)] == substr:
break
i += 1
else:
# break not hit, so substr not found
return s
# break hit
return s[:i] + s[i+len(substr):]
Method 2
If the range function can be used, the above can be written more compactly as follows.
def remove_substring_from_string(s, substr):
'''
find start index in s of substring
remove it by skipping over it
'''
for i in range(len(s) - len(substr) + 1):
if s[i:i+len(substr)] == substr:
break
else:
# break not hit, so substr not found
return s
return s[:i] + s[i+len(substr):]
Test
print(remove_substring_from_string("I have nothing to declare except my genuis", " except my genuis"))
# Output: I have nothing to declare'
This approach is based on the KMP algorithm:
def KMP(s):
n = len(s)
pi = [0 for _ in range(n)]
for i in range(1, n):
j = pi[i - 1]
while j > 0 and s[i] != s[j]:
j = pi[j - 1]
if s[i] == s[j]:
j += 1
pi[i] = j
return pi
# Removes all occurences of t in s
def remove_substring_from_string(s, t):
n = len(s)
m = len(t)
# Calculate the prefix function using KMP
pi = KMP(t + 'x00' + s)[m + 1:]
r = ""
i = 0
while i + m - 1 < n: # Before the remaining string is smaller than the substring
if pi[i + m - 1] == m: # If the substring is here, skip it
i += m
else: # Otherwise, add the current character and move to the next
r += s[i]
i += 1
# Add the remaining string
r += s[i:]
return r
It runs in O(|s| + |t|), but it has a few downsides:
- The code is long and unintuitive.
- It requires that there is no null (
x00
) in the input strings. - Its constant factors are pretty bad for short s and t.
- It doesn’t handle overlapping strings how you might want it:
remove_substring_from_string("aaa", "aa")
will return"a"
. The only guarantee made is thatt in remove_substring_from_string(s, t)
isFalse
for any two strings s and t.
A C++ example and further explanation for the KMP algorithm can be found here. The remove_substring_from_string
function then only checks if the entire substring is matched at each position and if so, skips over the substring.
I would do this using re
.
import re
def remove_substring(s, substr):
# type: (str, str) -> str
return re.subn(substr, '', s)[0]
remove_substring('I am in cs', 'cs')
# 'I am in '
remove_substring('This also removes multiple substr that are found. Even if that substr is repeated like substrsubstrsubstr', 'substr')
# 'This also removes multiple that are found. Even if that is repeated like '
def remove_substring(s, substr):
while s != "":
if substr in s:
s = s.replace(substr, "")
else:
return s
if s == "":
return "Empty String"
The idea here is that we replace all occurrences of substr within s, by replacing the first instance of substr and then looping until we are done.