What is the Big O notation for the str.replace function in Python?

Question:

What is the big O notation for the str.replace function in Python?

Is it always O(n)?

s = "this is string example"
print(s.replace("is", "was"))
# Output: thwas was string example
Asked By: Mohamad Zein


Answers:

Big O notation is usually quoted for the worst case, and in the worst case the Python source simply does "find the next position of the substring, replace it, and move on". Here n is the length of the string and m is the length of the substring.
One replacement does O(n) operations (copying the string).
One search, according to http://effbot.org/zone/stringlib.htm, does O(n*m) operations in the worst case.
Since there can be up to n/m replacements, the total should, surprisingly, be O(n*n).
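
The "find next position, replace, and go further" loop described above can be sketched in plain Python. This is only an illustration of that description, not CPython's actual implementation; the helper name naive_replace is hypothetical, and the sketch deliberately rejects an empty pattern.

```python
def naive_replace(text, old, new):
    """Replace every non-overlapping occurrence of old with new, left to right.

    Illustrative sketch only; this is not how CPython implements str.replace.
    """
    if not old:
        raise ValueError("this sketch does not handle an empty pattern")
    pieces = []
    start = 0
    while True:
        pos = text.find(old, start)      # search from the current position
        if pos == -1:
            break
        pieces.append(text[start:pos])   # unchanged text before the match
        pieces.append(new)               # the replacement itself
        start = pos + len(old)           # go further, past the match
    pieces.append(text[start:])          # unchanged tail after the last match
    return "".join(pieces)


print(naive_replace("this is string example", "is", "was"))
```

In this sketch each search resumes where the previous match ended, and the pieces are joined once at the end, so the copying is done in a single pass rather than once per replacement.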

Answered By: Nickolay Olshevsky

I coded up a test for what I believe is the worst-case scenario: a string repeated over and over, where we replace that string with another string. Because t/n levels off as n increases, the worst case seems, empirically, like it may be O(n). But I really can't argue with @NickolayOlshevsky's post.

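import time
from matplotlib import pyplot as plt

# Repeat counts growing geometrically by 1.5x until they reach 10**8.
# The largest strings take several hundred MB; lower the limit if memory is tight.
x = [10]
while x[-1] < 10**8:
    x.append(int(x[-1] * 1.5))

y = [0] * len(x)  # elapsed time for each repeat count

nst = 'abcd'  # unit repeated to build the test string
sst = 'abcd'  # substring to replace

for ix, i in enumerate(x):
    s = nst * i
    t = time.perf_counter()
    s = s.replace(sst, 'efgh')
    y[ix] = time.perf_counter() - t

# Convert repeat counts to string lengths n.
x = [a * len(nst) for a in x]

# In a Jupyter notebook, run "%matplotlib inline" before plotting.
fig, (ax1, ax2) = plt.subplots(2, sharex=True)
fig.set_size_inches(8, 6)
ax1.set_xscale('log')
ax1.set_yscale('log')
ax1.set_xlabel('n')
ax1.set_ylabel('t')
ax1.plot(x, y)
ax2.set_xscale('log')
ax2.set_yscale('log')
ax2.set_xlabel('n')
ax2.set_ylabel('t/n')
# y holds times and x holds lengths, so t/n is y/x.
ax2.plot(x, [t / n for n, t in zip(x, y)])
plt.show()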

[Figure: n vs t]

Answered By: Charlie Haley

Strictly speaking, Big O notation in mathematics can describe the best, average, or worst-case runtime of a function. In programming, though, it usually refers to the worst case or upper bound, because that is the greatest concern for scalability and because it is brief. This may be a loose use of the term, but it has become the industry's shorthand for the worst case.

If you look at the Python 3.11 source code (Python replace method), you can see that it has an average case of O(N) and a worst case of O(N+M), where N is the length of the string and M is the number of occurrences of the substring being replaced. The method was updated in v3.10 to use heuristics to choose the better of two search algorithms.
This post explains it in more detail. Note: it addresses str.replace() and several other string methods.
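
One way to check these bounds on your own interpreter is to time str.replace at a few sizes and look at the time per character. The sketch below is only a rough probe under stated assumptions: the helper name time_per_char, the chosen sizes, and the two inputs (one with many matches, one with long near-misses) are illustrative choices, not taken from the CPython source. If the time per character stays roughly flat as n grows, the behaviour on that input is consistent with linear time.

```python
import timeit


def time_per_char(text, old, new, repeats=3):
    """Return the best observed time of text.replace(old, new), divided by len(text)."""
    timings = timeit.repeat(lambda: text.replace(old, new), number=1, repeat=repeats)
    return min(timings) / len(text)


def main():
    for n in (10**4, 10**5, 10**6):
        many_matches = "abcd" * (n // 4)   # every 4 characters is a match
        near_misses = "a" * n              # pattern almost matches everywhere
        print(
            n,
            time_per_char(many_matches, "abcd", "efgh"),
            time_per_char(near_misses, "a" * 15 + "b", "x"),
        )


if __name__ == "__main__":
    main()
```

Timings vary with the machine and the Python version, so compare the trend across sizes rather than the absolute numbers.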

Answered By: skilbur