How to remove extra indentation of Python triple quoted multi-line strings?

Question:

I have a Python editor where the user enters a script, which is then placed into a main method behind the scenes, with every line indented. The problem is that if the user has a multi-line string, the indentation applied to the whole script also affects the string, inserting a tab at the start of every line. A problem script can be as simple as:

"""foo
bar
foo2"""

So when in the main method it would look like:

def main():
    """foo
    bar
    foo2"""

and the string would now have an extra tab at the beginning of every line.

Asked By: Mike

Answers:

The only way I see is to strip the first n tabs from each line, starting with the second, where n is the known indentation of the main method.

If that indentation is not known beforehand, you can add a trailing newline before inserting it and strip the number of tabs found on the last line…

The third solution is to parse the data, find the beginning of a multi-line quote, and not add your indentation to any following line until the quote is closed.

I think there is a better solution..
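The third option can be sketched with the standard tokenize module, which reports where each string literal starts and ends. This is only an illustration: the function name indent_outside_strings and the four-space prefix are my own assumptions, and f-strings are not handled because Python 3.12 and later tokenize them differently.

```python
import io
import tokenize

def indent_outside_strings(source, prefix="    "):
    """Indent each line of source, except continuation lines of multi-line strings."""
    protected = set()  # 1-based numbers of lines that continue a string literal
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.end[0] > tok.start[0]:
            # Every line after the one holding the opening quotes belongs to the string.
            protected.update(range(tok.start[0] + 1, tok.end[0] + 1))
    lines = source.splitlines(keepends=True)
    return "".join(
        line if lineno in protected else prefix + line
        for lineno, line in enumerate(lines, start=1)
    )

user_script = '"""foo\nbar\nfoo2"""\nx = 1\n'
wrapped = "def main():\n" + indent_outside_strings(user_script)
print(wrapped)
```

The wrapped code stays valid because Python does not treat the continuation lines of a string literal as indentation.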

Answered By: Mikhail Churbanov

What follows the first line of a multiline string is part of the string, and not treated as indentation by the parser. You may freely write:

def main():
    """foo
bar
foo2"""
    pass

and it will do the right thing.

On the other hand, that’s not readable, and Python knows it. So if a docstring contains whitespace in its second line, that amount of whitespace is stripped off when you use help() to view the docstring. Thus, help(main) and the below help(main2) produce the same help info.

def main2():
    """foo
    bar
    foo2"""
    pass
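One way to check this without reading help() output is inspect.getdoc, which returns the docstring after the same kind of indentation cleanup. A small sketch, repeating both definitions so that it runs on its own:

```python
import inspect

def main():
    """foo
bar
foo2"""
    pass

def main2():
    """foo
    bar
    foo2"""
    pass

# Both docstrings come out identical once the indentation is cleaned up.
print(repr(inspect.getdoc(main)))
print(repr(inspect.getdoc(main2)))
```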

So if I get it correctly, you take whatever the user inputs, indent it properly and add it to the rest of your program (and then run that whole program).

So after you put the user input into your program, you could run a regex, that basically takes that forced indentation back. Something like: Within three quotes, replace all “new line markers” followed by four spaces (or a tab) with only a “new line marker”.
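A rough sketch of that regex idea follows. The names TRIPLE_QUOTED and unindent_triple_quoted are made up for illustration, and the pattern is deliberately naive: it assumes the forced indentation is exactly four spaces and does not cope with escaped quotes or triple quotes inside comments.

```python
import re

# Matches a triple-quoted string of either quote style, including its newlines.
TRIPLE_QUOTED = re.compile(r'""".*?"""' r"|'''.*?'''", re.DOTALL)

def unindent_triple_quoted(source, indent="    "):
    """Remove the forced indent from lines inside triple-quoted strings."""
    def strip_indent(match):
        return match.group(0).replace("\n" + indent, "\n")
    return TRIPLE_QUOTED.sub(strip_indent, source)

wrapped = 'def main():\n    text = """foo\n    bar\n    foo2"""\n    return text\n'
fixed = unindent_triple_quoted(wrapped)
print(fixed)
```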

Answered By: FlorianH

textwrap.dedent from the standard library is there to automatically undo the wacky indentation.
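A minimal sketch of how this might look when the string is written inside an indented function. The backslash right after the opening quotes is my addition: it moves the first line of text onto its own indented row so that every line shares the same margin.

```python
import textwrap

def main():
    # dedent() removes the whitespace that is common to all lines.
    text = textwrap.dedent("""\
        foo
        bar
        foo2""")
    return text

print(repr(main()))  # 'foo\nbar\nfoo2'
```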

Answered By: thraxil

From what I see, a better answer here might be inspect.cleandoc, which does much of what textwrap.dedent does but also fixes the problems that textwrap.dedent has with the leading line.

The below example shows the differences:

>>> import textwrap
>>> import inspect
>>> x = """foo bar
...     baz
...     foobar
...     foobaz
...     """
>>> inspect.cleandoc(x)
'foo bar\nbaz\nfoobar\nfoobaz'
>>> textwrap.dedent(x)
'foo bar\n    baz\n    foobar\n    foobaz\n'
>>> y = """
...     foo
...     bar
... """
>>> inspect.cleandoc(y)
'foo\nbar'
>>> textwrap.dedent(y)
'\nfoo\nbar\n'
>>> z = """\tfoo
... bar\tbaz
... """
>>> inspect.cleandoc(z)
'foo\nbar     baz'
>>> textwrap.dedent(z)
'\tfoo\nbar\tbaz\n'

Note that inspect.cleandoc also expands internal tabs to spaces.
This may be inappropriate for one’s use case, but works fine for me.

Answered By: bbenne10

Showing the difference between textwrap.dedent and inspect.cleandoc with a little more clarity:

Behavior with the leading part not indented

import textwrap
import inspect

string1="""String
with
no indentation
       """
string2="""String
        with
        indentation
       """
print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='String\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='String\nwith\nno indentation\n'
string2 plain='String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='String\n        with\n        indentation\n'

Behavior with the leading part indented

string1="""
String
with
no indentation
       """
string2="""
        String
        with
        indentation
       """

print('string1 plain=' + repr(string1))
print('string1 inspect.cleandoc=' + repr(inspect.cleandoc(string1)))
print('string1 textwrap.dedent=' + repr(textwrap.dedent(string1)))
print('string2 plain=' + repr(string2))
print('string2 inspect.cleandoc=' + repr(inspect.cleandoc(string2)))
print('string2 textwrap.dedent=' + repr(textwrap.dedent(string2)))

Output

string1 plain='\nString\nwith\nno indentation\n       '
string1 inspect.cleandoc='String\nwith\nno indentation\n       '
string1 textwrap.dedent='\nString\nwith\nno indentation\n'
string2 plain='\n        String\n        with\n        indentation\n       '
string2 inspect.cleandoc='String\nwith\nindentation'
string2 textwrap.dedent='\nString\nwith\nindentation\n'
Answered By: codeforester

I wanted to preserve exactly what is between the triple-quote lines, removing common leading indent only. I found that textwrap.dedent and inspect.cleandoc didn’t do it quite right, so I wrote this one. It uses os.path.commonprefix.

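import re
from os.path import commonprefix

def ql(s, eol=True):
    lines = s.splitlines()
    # Set the first line aside so it takes no part in the common-prefix computation.
    l0 = None
    if lines:
        l0 = lines.pop(0) or None
    # Leading whitespace shared by all remaining lines.
    common = commonprefix(lines)
    indent = re.match(r'\s*', common)[0]
    n = len(indent)
    lines2 = [l[n:] for l in lines]
    # Drop the empty last line unless a trailing newline is wanted.
    if not eol and lines2 and not lines2[-1]:
        lines2.pop()
    if l0 is not None:
        lines2.insert(0, l0)
    s2 = "\n".join(lines2)
    return s2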

This can quote any string with any indent. I wanted it to include the trailing newline by default, but with an option to remove it so that it can quote any string neatly.

Example:

print(ql(r"""
     Hello
    |\---/|
    | o_o |
     \_^_/
    """))

print(ql(r"""
         World
        |\---/|
        | o_o |
         \_^_/
    """))

The second string keeps 4 spaces of indentation because the closing """ is indented less than the quoted text, so only the 4 spaces common to every line are removed:

 Hello
|\---/|
| o_o |
 \_^_/

     World
    |\---/|
    | o_o |
     \_^_/

I thought this was going to be simpler, otherwise I wouldn’t have bothered with it!

Answered By: Sam Watkins

I had a similar issue: I wanted my triple quoted string to be indented, but I didn’t want the string to have all those spaces at the beginning of each line. I used re to correct my issue:

        print(re.sub('\n *','\n', f"""Content-Type: multipart/mixed; boundary="===============9004758485092194316=="
            MIME-Version: 1.0
            Subject: Get the reader's attention here!
            To: [email protected]

            --===============9004758485092194316==
            Content-Type: text/html; charset="us-ascii"
            MIME-Version: 1.0
            Content-Transfer-Encoding: 7bit

            Very important message goes here - you can even use <b>HTML</b>.
            --===============9004758485092194316==--
        """))

Above, I was able to keep my code indented, but the string was left trimmed essentially. All spaces at the beginning of each line were deleted. This was important since any spaces in front of the SMTP or MIME specific lines would break the email message.

The tradeoff I made was that I left the Content-Type on the first line because the regex I was using didn’t remove the initial \n (which broke email). If it bothered me enough, I guess I could have added an lstrip like this:

print(re.sub('\n *','\n', f"""
    Content-Type: ...
""").lstrip())

After reading this 10 year old page, I decided to stick with re.sub since I didn’t truly understand all the nuances of textwrap and inspect.

Answered By: Mark

There is a much simpler way:

    foo = """first line\
             \nsecond line"""
Answered By: Kostia

This does the trick, if I understand the question correctly. lstrip() removes leading whitespace, so it will remove
tabs as well as spaces.

def dedent(message):
    # Join with "\n": os.linesep is "\r\n" on Windows and would double up when printed.
    return "\n".join(line.lstrip() for line in message.splitlines())

Example:

name='host'
config_file='/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml'
message = f"""Missing env var or configuration entry for 'host'. 
              Please add '{name}' entry to file
              {config_file}
              or export environment variable 'mqtt_{name}' before
              running the program.
           """

>>> print(message)
Missing env var or configuration entry for 'host'. 
              Please add 'host' entry to file
              /Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
              or export environment variable 'mqtt_host' before
              running the program.

>>> print(dedent(message))
Missing env var or configuration entry for 'host'. 
Please add 'host' entry to file
/Users/nmellor/code/cold_fusion/end-to-end/config/stage.toml
or export environment variable 'mqtt_host' before
running the program.

The above solution removes ALL indentation. If you want to remove only the indentation that is common to the whole multi-line string, use textwrap.dedent(). But take care that the first line of the string is indented as well; otherwise the common indentation is empty and dedent() removes nothing. A last line that contains only whitespace does not count.
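A short sketch of that caveat, with made-up sample strings. Lines that contain only whitespace are ignored when textwrap.dedent() computes the common margin, so it is the first line that matters here:

```python
import textwrap

# The first line starts right after the quotes and has no indentation,
# so the common margin is empty and the other lines keep theirs.
a = textwrap.dedent("""first
    second
    """)
print(repr(a))  # 'first\n    second\n'

# Starting with a newline puts every line of text at the same indent.
b = textwrap.dedent("""
    first
    second
  """)
print(repr(b))  # '\nfirst\nsecond\n'
```

In both cases the whitespace-only last line is reduced to an empty line.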

Answered By: Nick