How do I append the index of a string to a list using extend() in python?
Question:
I’m trying to look through a long string to find instances of a substring, then I want to create a list that has the index of each substring found and the substring found. But instead of the index in a readable form, I’m getting a reference to the object, such as [<built-in method index of str object at 0x000001687B01E930>, 'b']. I’d rather have [123, 'b'].
Here’s the code I’ve tried:
test_string = "abcdefg"
look_for = ["b","f"]
result = []
for each in test_string:
    if each in look_for:
        result.extend([each.index, each])
print(result)
I know I could do this with a list comprehension, but I plan to add a bunch of other code to this for loop later and am only asking about the index issue here.
I’ve tried str(each.index) and print(str(result)), but that doesn’t help. What am I missing?
Answers:
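each.index without parentheses is a reference to the string's index method, not a call to it, which is why the method object ends up in your list. You can use enumerate to get the position directly: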
test_string = "abcdefg"
look_for = ["b", "f"]
result = []
for index, each in enumerate(test_string):
    if each in look_for:
        result.extend([index, each])
print(result)
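To see why the original attempt stored a method object, here is a minimal sketch of the difference between naming a method and calling it. The variable names `method_reference`, `is_method` and `position` are only illustrative.

```python
test_string = "abcdefg"
each = "b"

# Without parentheses, this is the bound method object itself, not a number.
method_reference = each.index
is_method = callable(method_reference)

# Calling index() on the full string returns the position of the first match.
position = test_string.index(each)

print(is_method)  # True
print(position)   # 1
```

Note that str.index() has to be called on the string being searched and given the thing to look for; calling it on the single character would not help.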
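Here are two ways you can achieve it, using either enumerate() or .index():
test_string = "abcdefg"
look_for = ["b", "f"]

# Option 1: enumerate() yields (position, character) pairs
result = []
for i, each in enumerate(test_string):
    if each in look_for:
        result.append((i, each))
print(result)

# Option 2: str.index() returns the position of the first occurrence only
result = []
for sub in look_for:
    try:
        result.append((test_string.index(sub), sub))
    except ValueError:  # raised when sub is not in test_string
        pass
print(result)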
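One caveat with the .index() approach is that it reports only the first occurrence of each substring. If every occurrence is needed, a sketch along these lines should work; the helper name `find_all` is hypothetical, and it relies on str.find() accepting a start position and returning -1 when nothing is found.

```python
def find_all(haystack, needle):
    """Return the start index of every occurrence of needle (overlaps included)."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlapping hits are found.
        start = haystack.find(needle, start + 1)
    return positions


test_string = "abcabcab"
result = []
for sub in ["b", "ca"]:
    for pos in find_all(test_string, sub):
        result.extend([pos, sub])
print(result)  # [1, 'b', 4, 'b', 7, 'b', 2, 'ca', 5, 'ca']
```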
This is not an answer about using .extend().
Your question asks for all substrings, but your code only tests single characters. The following code handles longer substrings and finds them all in a single pass using regular expressions. This can be noticeably faster as the number of substrings grows. Regular expressions are our friends!
import re

def find_all_substrings(target, substrings):
    # Build one regex that matches any of the substrings in a single pass
    pattern = "|".join(map(re.escape, substrings))
    # Scan once for all strings
    matches = re.finditer(pattern, target)
    # Get start position and matched string
    results = [(match.start(), match.group()) for match in matches]
    return results

print(find_all_substrings("I have an apple, a banana, and a cherry in my bag. Also, another apple.", ['apple', 'banana', 'cherry']))
test_string = "abcdefg"
look_for = ["b", "f"]
print(find_all_substrings(test_string, look_for))
giving the following output:
[(10, 'apple'), (19, 'banana'), (33, 'cherry'), (65, 'apple')]
[(1, 'b'), (5, 'f')]
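One thing to watch with the joined pattern: Python's regex alternation tries alternatives from left to right, so a short substring listed first can shadow a longer one that starts with it. A possible workaround, sketched here with a hypothetical helper name `find_all_substrings_longest_first`, is to sort the substrings by length before joining them.

```python
import re


def find_all_substrings_longest_first(target, substrings):
    # Try longer alternatives first so "apple" is not shadowed by "app".
    ordered = sorted(substrings, key=len, reverse=True)
    pattern = "|".join(map(re.escape, ordered))
    return [(m.start(), m.group()) for m in re.finditer(pattern, target)]


text = "an app and an apple"
naive = [(m.start(), m.group()) for m in re.finditer("app|apple", text)]
longest_first = find_all_substrings_longest_first(text, ["app", "apple"])
print(naive)          # [(3, 'app'), (14, 'app')]
print(longest_first)  # [(3, 'app'), (14, 'apple')]
```

Also keep in mind that re.finditer() returns non-overlapping matches, so substrings that overlap each other in the target will not all be reported.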
If you are determined to use a call to .extend(), possibly to allow modifying the results on the fly, you could use:
import re

def find_all_substrings_generator(target, substrings):
    # Build one regex that matches any of the substrings in a single pass
    pattern = "|".join(map(re.escape, substrings))
    matches = re.finditer(pattern, target)
    # Yield start position and matched string, one match at a time
    for match in matches:
        yield (match.start(), match.group())

results = []
test_string = "abcdefg"
look_for = ["b", "f"]
for i, result in enumerate(find_all_substrings_generator(test_string, look_for)):
    print(f"Result {i}: {result}")
    results.extend([result])
print(results)
With the output:
Result 0: (1, 'b')
Result 1: (5, 'f')
[(1, 'b'), (5, 'f')]
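Since the answers above mix .extend() and .append(), here is a small sketch of how the two differ for an (index, substring) pair. The list names are only illustrative.

```python
pair = (1, "b")

flat = []
flat.extend(pair)             # adds each element of the tuple separately

nested_extend = []
nested_extend.extend([pair])  # adds the tuple as a single item

nested_append = []
nested_append.append(pair)    # same result as extend([pair])

print(flat)           # [1, 'b']
print(nested_extend)  # [(1, 'b')]
print(nested_append)  # [(1, 'b')]
```

So extend([index, each]) produces the flat list from the question, while append((index, each)) keeps each index paired with its substring.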