When is StringIO used, as opposed to joining a list of strings?

Question:

Using StringIO as a string buffer is slower than using a list as a buffer.

So when should StringIO be used?

from io import StringIO


def meth1(string):
    a = []
    for i in range(100):
        a.append(string)
    return ''.join(a)

def meth2(string):
    a = StringIO()
    for i in range(100):
        a.write(string)
    return a.getvalue()


if __name__ == '__main__':
    from timeit import Timer
    string = "This is test string"
    print(Timer("meth1(string)", "from __main__ import meth1, string").timeit())
    print(Timer("meth2(string)", "from __main__ import meth2, string").timeit())

Results:

meth1: 16.7872819901
meth2: 18.7160351276

Asked By: simha


Answers:

If you are measuring speed, you should use cStringIO (Python 2).

From the docs:

The module cStringIO provides an interface similar to that of the StringIO module. Heavy use of StringIO.StringIO objects can be made more efficient by using the function StringIO() from this module instead.

But the point of StringIO is to be a file-like object, for when something expects such and you don’t want to use actual files.
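
As a minimal sketch of that point (Python 3; the sample rows and the helper name rows_to_csv are made up for illustration): csv.writer expects a file-like object with a write() method, and io.StringIO can stand in for a real file.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text without touching the filesystem."""
    buffer = io.StringIO()        # in-memory stand-in for an open text file
    writer = csv.writer(buffer)   # csv.writer only needs an object with write()
    writer.writerows(rows)
    return buffer.getvalue()      # everything written so far, as one str


# Illustrative data, not from the original question.
text = rows_to_csv([["name", "score"], ["alice", 1], ["bob", 2]])
print(text)
```

The same function would work unchanged with a real file object, which is the reason to reach for StringIO here rather than a list of strings.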

Edit: I noticed you use from io import StringIO, so you are probably on Python 3, or at least 2.6. The separate StringIO and cStringIO modules are gone in Python 3. In CPython 3, io.StringIO is implemented in C, with a pure-Python fallback in the _pyio module. There is also io.BytesIO for bytes.

Answered By: plundra

The main advantage of StringIO is that it can be used where a file is expected. For example (Python 2):

import sys
import StringIO

out = StringIO.StringIO()
sys.stdout = out
print "hi, I'm going out"
sys.stdout = sys.__stdout__
print out.getvalue()
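
On Python 3 the same idea can be sketched with contextlib.redirect_stdout, which restores sys.stdout automatically. This is only one possible equivalent, and the variable name captured is chosen here for illustration:

```python
import contextlib
import io

out = io.StringIO()

# redirect_stdout temporarily points sys.stdout at `out`
# and restores the previous stream when the block exits.
with contextlib.redirect_stdout(out):
    print("hi, I'm going out")

captured = out.getvalue()
print(captured)
```
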
Answered By: TryPyPy

Well, I am not sure I would call that using it as a “buffer”. You are just multiplying a string 100 times, in two complicated ways. Here is an uncomplicated way:

def meth3(string):
    return string * 100

If we add that to your test:

if __name__ == '__main__':

    from timeit import Timer
    string = "This is test string"
    # Make sure it all does the same:
    assert meth1(string) == meth3(string)
    assert meth2(string) == meth3(string)
    print(Timer("meth1(string)", "from __main__ import meth1, string").timeit())
    print(Timer("meth2(string)", "from __main__ import meth2, string").timeit())
    print(Timer("meth3(string)", "from __main__ import meth3, string").timeit())

It turns out to be way faster as a bonus:

meth1: 21.0300650597
meth2: 22.4869811535
meth3: 0.811429977417

If you want to create a bunch of strings and then join them, meth1() is the correct way. There is no point in writing them to a StringIO, which is something completely different: a string with a file-like stream interface.
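
To illustrate what "file-like stream interface" means in practice, here is a small sketch (Python 3, with made-up sample text) that reads from a StringIO using the same calls you would use on an open file:

```python
import io

# Sample content is illustrative only.
stream = io.StringIO("first line\nsecond line\n")

first = stream.readline()   # reads up to and including the newline
rest = stream.read()        # reads from the current position to the end

stream.seek(0)              # rewind, just like a real file
lines = [line.rstrip("\n") for line in stream]  # iterate line by line

print(first, rest, lines)
```

A list of strings offers none of this position-based reading, which is why the two are not really interchangeable.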

Answered By: Lennart Regebro

Another approach, based on Lennart Regebro's answer.
It is faster than the list method (meth1):

def meth4(string):
    a = StringIO(string * 100)
    contents = a.getvalue()
    a.close()
    return contents

if __name__ == '__main__':
    from timeit import Timer
    string = "This is test string"
    print(Timer("meth1(string)", "from __main__ import meth1, string").timeit())
    print(Timer("meth2(string)", "from __main__ import meth2, string").timeit())
    print(Timer("meth3(string)", "from __main__ import meth3, string").timeit())
    print(Timer("meth4(string)", "from __main__ import meth4, string").timeit())

Results (seconds):

meth1 = 7.731315963647944
meth2 = 9.609279402186985
meth3 = 0.26534052061106195
meth4 = 2.915035489152274
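
The absolute numbers depend on the machine and the Python version, so timings from different answers are not directly comparable. A minimal sketch for rerunning the comparison yourself, assuming Python 3.6+; the function names and the number/repeat values are illustrative choices, not part of the original tests:

```python
import timeit
from io import StringIO


def join_list(s, n=100):
    parts = []
    for _ in range(n):
        parts.append(s)
    return "".join(parts)


def write_stringio(s, n=100):
    buf = StringIO()
    for _ in range(n):
        buf.write(s)
    return buf.getvalue()


def multiply(s, n=100):
    return s * n


def benchmark(funcs, s="This is test string", number=10_000, repeat=5):
    """Return the best total time in seconds (over `repeat` runs) per function."""
    results = {}
    for func in funcs:
        timer = timeit.Timer(lambda: func(s))
        # min() of the repeats is the least noisy estimate.
        results[func.__name__] = min(timer.repeat(repeat=repeat, number=number))
    return results


timings = benchmark([join_list, write_stringio, multiply])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Note that timeit's default of one million calls is what produces multi-second totals in the results above; the smaller number used here only shortens the run.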

Answered By: Jagadeesh Sali