Replacing characters of a string by a given string
Question:
Given the string 'www__ww_www_', I need to replace all the '_' characters with characters from the string '1234'. The result should be 'www12ww3www4'.
TEXT = 'aio__oo_ecc_'
INSERT = '1234'
insert = list(INSERT)
ret = ''
for char in TEXT:
    if char == '_':
        ret += insert[0]
        insert.pop(0)
    else:
        ret += char
print(ret)
>> aio12oo3ecc4
What is the right way to do this? The approach above seems very inefficient.
Answers:
Consider splitting the pattern string by the underscore and zipping it with the string of inserts:
TEXT = 'aio__oo_ecc_a' # '_a' added to illustrate the need for zip_longest
from itertools import zip_longest, chain
''.join(chain.from_iterable(zip_longest(TEXT.split('_'), INSERT, fillvalue='')))
#'aio12oo3ecc4a'
zip_longest is used instead of the “normal” zip to make sure the last fragment of the pattern, if any, is not lost.
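To make the difference concrete, here is a minimal sketch that runs both variants on the same input. The names with_zip and with_zip_longest are only illustrative and do not come from the answer.

```python
from itertools import chain, zip_longest

TEXT = 'aio__oo_ecc_a'
INSERT = '1234'
pieces = TEXT.split('_')  # 5 fragments, but only 4 insert characters

# zip stops at the shorter input, so the trailing 'a' fragment is dropped
with_zip = ''.join(chain.from_iterable(zip(pieces, INSERT)))

# zip_longest pads the shorter input with '', so the trailing 'a' survives
with_zip_longest = ''.join(
    chain.from_iterable(zip_longest(pieces, INSERT, fillvalue=''))
)

print(with_zip)          # aio12oo3ecc4
print(with_zip_longest)  # aio12oo3ecc4a
```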
A step-by-step exploration:
pieces = TEXT.split('_')
# ['aio', '', 'oo', 'ecc', 'a']
mix = zip_longest(pieces, INSERT, fillvalue='')
# [('aio', '1'), ('', '2'), ('oo', '3'), ('ecc', '4'), ('a', '')]
flat_mix = chain.from_iterable(mix)
# ['aio', '1', '', '2', 'oo', '3', 'ecc', '4', 'a', '']
result = ''.join(flat_mix)
Speed comparison:
- This solution: 1.32 µs ± 9.08 ns per loop
- Iterator + ternary + list comprehension: 1.77 µs ± 20.8 ns per loop
- Original solution: 2 µs ± 13.2 ns per loop
- The loop + regex solution: 3.66 µs ± 103 ns per loop
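The figures above depend on the machine and Python version. A rough way to reproduce the comparison with timeit might look like the sketch below; the function names and the number of repetitions are assumptions chosen for illustration, not the setup that produced the numbers above.

```python
import re
import timeit
from itertools import chain, zip_longest

TEXT = 'aio__oo_ecc_'
INSERT = '1234'

def split_zip():
    # Split on '_' and interleave the fragments with the insert characters
    return ''.join(chain.from_iterable(
        zip_longest(TEXT.split('_'), INSERT, fillvalue='')))

def comprehension():
    # Iterator + ternary + list comprehension
    it = iter(INSERT)
    return ''.join([next(it) if x == '_' else x for x in TEXT])

def original_loop():
    # The approach from the question
    insert = list(INSERT)
    ret = ''
    for char in TEXT:
        if char == '_':
            ret += insert.pop(0)
        else:
            ret += char
    return ret

def regex_loop():
    # One re.sub call per insert character
    text = TEXT
    for c in INSERT:
        text = re.sub('_', c, text, count=1)
    return text

REPEATS = 100_000  # arbitrary; raise it for more stable numbers
for func in (split_zip, comprehension, original_loop, regex_loop):
    # The approaches must agree before the timings mean anything
    assert func() == 'aio12oo3ecc4'
    seconds = timeit.timeit(func, number=REPEATS)
    print(f'{func.__name__}: {seconds / REPEATS * 1e6:.2f} microseconds per call')
```

With inputs this short the differences are tiny in absolute terms, so readability may reasonably decide the choice.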
As pointed out in the comments, you can use str.replace directly:
for c in INSERT:
    TEXT = TEXT.replace('_', c, 1)
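One thing worth knowing about this loop is that every pass searches the string again from the start. The sketch below wraps it in a hypothetical helper, fill_with_replace (a name introduced here, not part of the answer), and shows an edge case where that matters: when the insert string itself contains '_'.

```python
def fill_with_replace(text, insert):
    """Replace successive '_' in text with the characters of insert.

    Hypothetical helper wrapping the str.replace loop shown above.
    """
    for c in insert:
        # The third argument limits the replacement to the first match
        text = text.replace('_', c, 1)
    return text

print(fill_with_replace('aio__oo_ecc_', '1234'))  # aio12oo3ecc4

# Edge case: an inserted '_' is found again by the next pass, so the
# result is 'axb_' rather than the 'a_bx' a single left-to-right pass gives
print(fill_with_replace('a_b_', '_x'))  # axb_
```

If the insert characters can never be '_', the loop behaves the same as the single-pass solutions.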
You can also use a regex replacement for that:
import re
for c in INSERT:
    TEXT = re.sub('_', c, TEXT, count=1)
You can loop over TEXT with a list comprehension that uses a ternary to take the next element from an iterator over INSERT when the current character is '_', and the current character of TEXT otherwise:
>>> TEXT = 'aio__oo_ecc_'
>>> INSERT = '1234'
>>> it = iter(INSERT)
>>> "".join([next(it) if x == "_" else x for x in TEXT])
'aio12oo3ecc4'
One benefit is avoiding the Shlemiel the Painter’s algorithm that ret += char can cause, since each concatenation may copy the string built so far. Also, pop(0) shifts every remaining element of the list forward, so each call is linear in the length of the list (reversing INSERT and using pop() would be better).
In response to some of the comments here, list comprehensions tend to be faster than generators when the whole iterable will be consumed on the spot.
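For readers who prefer to keep the explicit loop from the question, the two points above (collecting pieces in a list instead of using ret += char, and popping from the end of a reversed list instead of calling pop(0)) could be sketched like this. The variable name parts is an assumption made for this illustration.

```python
TEXT = 'aio__oo_ecc_'
INSERT = '1234'

# Reverse once so the next character to insert sits at the end of the list
insert = list(reversed(INSERT))

parts = []
for char in TEXT:
    # pop() from the end is O(1); pop(0) would shift every remaining element
    parts.append(insert.pop() if char == '_' else char)

# A single join builds the final string in one pass
result = ''.join(parts)
print(result)  # aio12oo3ecc4
```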
You can use an iterator in a replacement function for re.sub:
import re
TEXT = 'aio__oo_ecc_'
INSERT = '1234'
i = iter(INSERT)
print(re.sub('_', lambda _: next(i), TEXT))
This outputs:
aio12oo3ecc4
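If TEXT can contain more underscores than INSERT has characters, next(i) raises StopIteration once the iterator runs out. A possible variation, sketched here with a hypothetical fill_blanks helper and under the assumption that leftover underscores should simply stay in place, passes a default to next:

```python
import re

def fill_blanks(text, insert, placeholder='_'):
    """Fill each placeholder in text with the next character of insert.

    Hypothetical helper; leftover placeholders are kept unchanged.
    """
    chars = iter(insert)
    # next(chars, m.group()) falls back to the matched text once the
    # iterator is exhausted, so extra placeholders are left as they are
    return re.sub(re.escape(placeholder),
                  lambda m: next(chars, m.group()),
                  text)

print(fill_blanks('aio__oo_ecc_', '1234'))  # aio12oo3ecc4
print(fill_blanks('aio__oo_ecc_', '12'))    # aio12oo_ecc_
```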