How do I remove leading whitespace in Python?
Question:
I have a text string that starts with a number of spaces, varying between 2 & 4.
What is the simplest way to remove the leading whitespace (i.e., remove everything before a certain character)?
" Example" -> "Example"
" Example " -> "Example "
" Example" -> "Example"
Answers:
The lstrip() method removes whitespace, including newlines and tabs, from the beginning of a string:
>>> ' hello world!'.lstrip()
'hello world!'
Edit
As balpha pointed out in the comments, to remove only spaces from the beginning of the string, use lstrip(' '):
>>> '  \thello world with 2 spaces and a tab!'.lstrip(' ')
'\thello world with 2 spaces and a tab!'
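To make the difference concrete, here is a small sketch comparing the two calls. The sample strings are made up for illustration. Note that the argument to lstrip is treated as a set of characters to remove, not as a literal prefix.

```python
# Hypothetical sample: two spaces, a tab, then text.
sample = "  \thello"

# No argument: all leading whitespace (spaces, tabs, newlines) is removed.
all_ws = sample.lstrip()

# Argument ' ': only leading spaces are removed, so the tab survives.
spaces_only = sample.lstrip(" ")

# The argument is a set of characters: any leading mix of them is stripped.
mixed = "\t \t hello".lstrip(" \t")

print(repr(all_ws))       # 'hello'
print(repr(spaces_only))  # '\thello'
print(repr(mixed))        # 'hello'
```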
The function strip will remove whitespace from the beginning and end of a string.
my_str = "   text   "
my_str = my_str.strip()
will set my_str to "text".
To remove everything before a certain character, use a regular expression:
import re
re.sub(r'^[^a]*', '', my_str)
removes everything up to the first ‘a’ in my_str. [^a] can be replaced with any character class you like, such as word characters.
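As a rough sketch of the idea above, the pattern can be wrapped in a helper. The names drop_before and drop_before_find are hypothetical, not from the original answer. One caveat worth seeing in action: if the character never occurs, the regex version matches the whole string and returns an empty string, whereas a str.find version can leave the text unchanged.

```python
import re

def drop_before(text, char):
    """Hypothetical helper: remove everything before the first `char` via regex."""
    # re.escape keeps characters such as ']' or '^' from being read as regex syntax.
    pattern = r"^[^" + re.escape(char) + r"]*"
    return re.sub(pattern, "", text)

def drop_before_find(text, char):
    """Hypothetical alternative without regex: slice from the first `char`."""
    idx = text.find(char)
    # find() returns -1 when char is absent; leave the text untouched in that case.
    return text[idx:] if idx != -1 else text

print(repr(drop_before("   Example", "E")))       # 'Example'
print(repr(drop_before("xyz", "a")))              # ''
print(repr(drop_before_find("   Example", "E")))  # 'Example'
print(repr(drop_before_find("xyz", "a")))         # 'xyz'
```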
If you want to remove the whitespace before and after the text but keep the whitespace in the middle, you could use:
word = ' Hello World '
stripped = word.strip()
print(stripped)
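Since the question asks to keep trailing whitespace, it may help to see strip next to its one-sided variants. This is a minimal sketch with a made-up sample string.

```python
word = "  Hello World  "

both = word.strip()    # removes whitespace on both ends
left = word.lstrip()   # removes leading whitespace only
right = word.rstrip()  # removes trailing whitespace only

print(repr(both))   # 'Hello World'
print(repr(left))   # 'Hello World  '
print(repr(right))  # '  Hello World'
```

Of the three, lstrip matches the question's second example, where the trailing spaces are meant to survive.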
The question doesn’t address multiline strings, but here is how you would strip leading whitespace from a multiline string using the textwrap module from Python’s standard library. If we had a string like:
s = """
    line 1 has 4 leading spaces
    line 2 has 4 leading spaces
    line 3 has 4 leading spaces
"""
If we print(s), we would get output like:
>>> print(s)
    line 1 has 4 leading spaces
    line 2 has 4 leading spaces
    line 3 has 4 leading spaces
And if we used textwrap.dedent:
>>> import textwrap
>>> print(textwrap.dedent(s))
line 1 has 4 leading spaces
line 2 has 4 leading spaces
line 3 has 4 leading spaces
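One detail the example above does not show: textwrap.dedent removes only the indentation common to every line. The sketch below uses a made-up string with uneven indentation and contrasts dedent with stripping each line separately.

```python
import textwrap

# Hypothetical input: lines indented by 4, 2, and 6 spaces.
text = "    four spaces\n  two spaces\n      six spaces\n"

# dedent strips only the shared margin (two spaces here).
dedented = textwrap.dedent(text)

# Stripping each line on its own removes all leading whitespace.
stripped = "\n".join(line.lstrip() for line in text.splitlines())

print(repr(dedented))  # '  four spaces\ntwo spaces\n    six spaces\n'
print(repr(stripped))  # 'four spaces\ntwo spaces\nsix spaces'
```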
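A regular expression can also be used to remove leading whitespace when cleaning text:
import re

def removing_leading_whitespaces(text):
    # ^ anchors at the start of the string; \s+ matches the run of whitespace there
    return re.sub(r"^\s+", "", text)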
Applying the above function:
removing_leading_whitespaces("  Example")
"  Example" -> "Example"
removing_leading_whitespaces("  Example  ")
"  Example  " -> "Example  "
removing_leading_whitespaces("    Example")
"    Example" -> "Example"
My personal favorite for any string handling is strip, split and join (in that order):
>>> ' '.join("     this is my    badly spaced     string   !    ".strip().split())
'this is my badly spaced string !'
In general it can be good to apply this for all string handling.
This does the following:
- First it strips: this removes leading and trailing whitespace.
- Then it splits: by default this splits on any run of whitespace (so it handles tabs and newlines too) and returns a list.
- Finally, join concatenates the list items with a single space between each.
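The three steps above can be sketched as a small helper. The name normalize_spaces and the sample string are hypothetical. Keep in mind that, unlike lstrip, this also removes trailing whitespace and collapses whitespace inside the string, so it does more than the question asks for.

```python
def normalize_spaces(text):
    """Hypothetical helper: trim both ends and collapse inner whitespace runs."""
    # split() with no argument already ignores leading and trailing whitespace;
    # the explicit strip() is kept only to mirror the steps described above.
    return " ".join(text.strip().split())

messy = "   this is\tmy   badly\nspaced string !  "
print(normalize_spaces(messy))              # this is my badly spaced string !
print(repr(normalize_spaces("  Example  ")))  # 'Example'
```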