Python: Indent all lines of a string except first while preserving linebreaks?
Question:
I want to indent all lines of a multi-line string except the first, without wrapping the text.
For example, I want to turn:
A very very very very very very very very very very very very very very very very
long multiline
string
into:
A very very very very very very very very very very very very very very very very
 long multiline
 string
I have tried
textwrap.fill(string, width=999999999999, subsequent_indent=' ',)
But this still puts all of the text on one line. Thoughts?
Answers:
Do you mean something like this?
In [21]: s = 'abc\ndef\nxyz'
In [22]: print(s)
abc
def
xyz
In [23]: print('\n '.join(s.split('\n')))
abc
 def
 xyz
Edit: Alternatively (HT @Steven Rumbalski):
In [24]: print(s.replace('\n', '\n '))
abc
 def
 xyz
You just need to replace the newline character '\n' with a newline character plus the indentation whitespace '\n ' and save the result to a variable (since replace won’t change your original string, but returns a new one with the replacements).
string = string.replace('\n', '\n ')
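As a minimal sketch, the replace approach can be wrapped in a small helper. The name indent_tail and its prefix parameter are hypothetical, and the sketch assumes lines are separated by '\n' only.

```python
def indent_tail(text, prefix=" "):
    """Indent every line after the first by inserting prefix after each newline."""
    # str.replace returns a new string; the original is left untouched.
    return text.replace("\n", "\n" + prefix)


s = "abc\ndef\nxyz"
print(indent_tail(s, "    "))
```

One thing to keep in mind: if the text ends with a newline, the prefix is appended after that final newline as well, so the result ends in trailing whitespace.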
The bare replace mentioned by @steven-rumbalski is going to be the most efficient way to accomplish this, but it’s not the only way.
Here’s another solution using list comprehensions. If the text has already been split into a list of lines, this will be much faster than running join(), replace() and splitlines().
text = """A very very very very very very very very very very very very very very very very
long multiline
string"""
lines = text.splitlines()
# Indent every line, then restore the untouched first line.
indented = [' ' + line for line in lines]
indented[0] = lines[0]
indented = '\n'.join(indented)
The list could be modified in place, but there’s a significant performance cost versus using a second variable. It’s also moderately faster to indent all the lines and then swap out the first line in another operation.
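For the case where the lines already exist as a list, the same idea might be packaged as below. This is only a sketch: the function name indent_except_first and the empty-list guard are assumptions added for illustration.

```python
def indent_except_first(lines, prefix=" "):
    """Return a new list in which every line but the first is prefixed."""
    # Indent everything first, then put the original first line back.
    indented = [prefix + line for line in lines]
    if indented:  # guard against an empty list
        indented[0] = lines[0]
    return indented


lines = "abc\ndef\nxyz".splitlines()
print("\n".join(indent_except_first(lines, "    ")))
```

Note that a round trip through splitlines() and '\n'.join() drops a trailing newline and normalizes other line endings such as '\r\n' to '\n'.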
There’s also the textwrap module. I disagree that using textwrap for indentation is unpythonic. If the lines are joined in a single string containing newlines, that string is inherently wrapped. Indentation is a logical extension of text wrapping, so textwrap makes sense to me.
Except that it’s slow. Really, really slow. Like 15x slower.
Python 3 added indent to textwrap, which makes indenting without re-wrapping very easy. There’s certainly a more elegant way of handling the lambda predicate, but this does exactly what the original question was asking for.
indented = textwrap.indent(text, ' ', lambda x: not text.splitlines()[0] in x )
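The lambda above re-splits the text for every line, and it also skips any later line that happens to contain the first line's text. One possible alternative, sketched here with the hypothetical name indent_after_first, is to split off the first line with str.partition and pass only the remainder to textwrap.indent.

```python
import textwrap


def indent_after_first(text, prefix=" "):
    """Leave the first line alone and indent the rest with textwrap.indent."""
    # partition splits at the first newline only; `newline` is '' if there is none.
    first, newline, rest = text.partition("\n")
    return first + newline + textwrap.indent(rest, prefix)


print(indent_after_first("abc\ndef\nxyz", "    "))
```

With no predicate supplied, textwrap.indent leaves whitespace-only lines unprefixed, so blank lines stay blank here.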
Here are some timeit results of the various methods. First, the bare replace:
>>> timeit.timeit(r"text.replace('\n', '\n ')", setup='text = """%s"""' % text)
0.5123521030182019
The two list comprehension solutions:
>>> timeit.timeit(r"indented = [' ' + i for i in lines]; indented[0] = lines[0]", setup='lines = """%s""".splitlines()' % text)
0.7037646849639714
>>> timeit.timeit(r"indented = [lines[0]] + [' ' + i for i in lines[1:]]", setup='lines = """%s""".splitlines()' % text)
1.0310905870283023
And here’s the unfortunate textwrap result:
>>> timeit.timeit(r"textwrap.indent(text, ' ', lambda x: not text.splitlines()[0] in x )", setup='import textwrap; text = """%s"""' % text)
7.7950868209591135
I thought some of that time could be the horribly inefficient predicate, but even with that removed, textwrap.indent is still more than 8 times slower than a bare replace.
>>> timeit.timeit(r"textwrap.indent(text, ' ')", setup='import textwrap; text = """%s"""' % text)
4.266149697010405
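To rerun this kind of comparison on another machine, a small harness along these lines might help. It is a sketch, not the setup used for the numbers above: the candidates dictionary, the use of callables instead of statement strings, and the repetition count are all assumptions, and absolute timings will differ between machines and Python versions.

```python
import textwrap
import timeit

text = ("A very very very very very very very very very very very very "
        "very very very very\nlong multiline\nstring")
lines = text.splitlines()

# Each candidate is a zero-argument callable, which timeit.timeit accepts directly.
candidates = {
    "replace": lambda: text.replace("\n", "\n "),
    "comprehension": lambda: [lines[0]] + [" " + line for line in lines[1:]],
    "textwrap.indent": lambda: textwrap.indent(text, " "),
}

timings = {name: timeit.timeit(fn, number=10_000) for name, fn in candidates.items()}

# Print the fastest candidate first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:>16}: {seconds:.4f} s")
```

Calling through a lambda adds a little overhead to every measurement, so the ratios between methods are more informative than the raw numbers.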
In Python 3.3 (which introduced textwrap.indent) and later, you can do this:
from textwrap import indent

def not_first():
    """Creates a function returning False only the first time."""
    _first_time_call = True

    def fn(_) -> bool:
        nonlocal _first_time_call
        res = not _first_time_call
        _first_time_call = False
        return res

    return fn

def indent_except_first_line(s: str, indent_string: str) -> str:
    return indent(s, indent_string, not_first())
Every time you call not_first, it creates a new function with a built-in flag that checks whether it’s being called for the first time. indent uses that generated function to decide whether to indent each line in the supplied string.
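Because the generated predicate carries state, it is effectively single-use. The sketch below repeats the closure so that it runs on its own (the variable names fresh, reused and again are illustrative) and shows what happens when one predicate is reused for a second call.

```python
from textwrap import indent


def not_first():
    """Create a predicate that returns False only on its first call."""
    first_time = True

    def fn(_line) -> bool:
        nonlocal first_time
        res = not first_time
        first_time = False
        return res

    return fn


text = "abc\ndef\nxyz"

# A fresh predicate per call: the first line is left alone.
fresh = indent(text, "  ", not_first())

# Reusing one predicate: its flag was already cleared by the first call,
# so the second call indents the first line as well.
reused = not_first()
indent(text, "  ", reused)
again = indent(text, "  ", reused)

print(fresh)
print(again)
```

Supplying any predicate also replaces textwrap.indent's default behaviour of skipping whitespace-only lines, so with this predicate blank lines after the first receive the prefix too.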