What is the pythonic way to count the leading spaces in a string?
Question:
I know I can count the leading spaces in a string with this:
>>> a = "   foo bar baz qua   \n"
>>> print "Leading spaces", len(a) - len(a.lstrip())
Leading spaces 3
>>>
But is there a more pythonic way?
Answers:
That looks… great to me. Usually I answer “Is X Pythonic?” questions with some functional magic, but I don’t feel that approach is appropriate for string manipulation.
If there were a built-in that returned only the leading spaces, so you could take the len() of that, I’d say go for it, but AFAIK there isn’t, and re and other solutions are absolutely overkill.
You could use itertools.takewhile:
sum(1 for _ in itertools.takewhile(str.isspace, a))
And demonstrating that it gives the same result as your code:
>>> import itertools
>>> a = "    leading spaces"
>>> print sum( 1 for _ in itertools.takewhile(str.isspace,a) )
4
>>> print "Leading spaces", len(a) - len(a.lstrip())
Leading spaces 4
I’m not sure whether this code is actually better than your original solution. It has the advantage that it doesn’t create more temporary strings, but that’s pretty minor (unless the strings are really big). I don’t find either version immediately clear about what that line of code does, so I would definitely wrap it in a nicely named function if you plan on using it more than once (with appropriate comments in either case).
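To illustrate the "nicely named function" suggestion, here is a minimal sketch. The function names are my own and purely illustrative; both versions count any leading whitespace, not just spaces.

```python
from itertools import takewhile


def count_leading_whitespace(s):
    """Return how many whitespace characters start the string s."""
    # takewhile stops at the first character for which str.isspace is false,
    # so nothing past the leading run is examined and no new string is built.
    return sum(1 for _ in takewhile(str.isspace, s))


def count_leading_whitespace_lstrip(s):
    """Same count, computed from the length that lstrip() removes."""
    return len(s) - len(s.lstrip())
```

Either way, the call site then reads as count_leading_whitespace(a), which says what it does without a comment.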
Your way is pythonic but incorrect: it will also count other whitespace chars. To count only spaces, be explicit and use a.lstrip(' '). Compare
a = "   \r\t\n\tfoo bar baz qua   \n"
print("Leading spaces", len(a) - len(a.lstrip()))
>>> Leading spaces 7
and
print("Leading spaces", len(a) - len(a.lstrip(' ')))
>>> Leading spaces 3
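If you need both behaviours, one option is to make the stripped character set a parameter. This is only a sketch; count_leading and its chars argument are hypothetical names, and the sample string is made up for the demonstration.

```python
def count_leading(s, chars=" "):
    """Count leading characters of s that belong to chars.

    chars is passed straight to str.lstrip, so the default " " counts
    spaces only, and None falls back to lstrip's "any whitespace" rule.
    """
    return len(s) - len(s.lstrip(chars))


sample = "  \t  x"                    # two spaces, a tab, two spaces, then text
print(count_leading(sample))          # spaces only
print(count_leading(sample, " \t"))   # spaces and tabs
print(count_leading(sample, None))    # any whitespace
```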
Using next and enumerate:
next((i for i, c in enumerate(a) if c != ' '), len(a))
For any whitespace:
next((i for i, c in enumerate(a) if not c.isspace()), len(a))
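The second argument to next matters here: it is the value returned when the generator is exhausted, which happens when the string is empty or consists only of spaces. A small sketch wrapping the spaces-only variant (the function name is just for illustration):

```python
def first_non_space_index(s):
    """Index of the first character that is not a space, or len(s) if none."""
    # The generator yields indexes of non-space characters; next() takes the
    # first one, and falls back to len(s) when there are no such characters.
    return next((i for i, c in enumerate(s) if c != " "), len(s))
```

Without the len(s) default, an all-space string would raise StopIteration instead of returning its length.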
Just for variety, you could theoretically use regex. It’s a little shorter, and looks nicer than the double call to len().
>>> import re
>>> a = "   foo bar baz qua   \n"
>>> re.search(r'\S', a).start() # index of the first non-whitespace char
3
Or alternatively:
>>> re.search('[^ ]', a).start() # index of the first non-space char
3
But I don’t recommend this; according to a quick test I did, it’s much less efficient than len(a) - len(a.lstrip()).
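If you want to check the speed difference on your own machine, a rough timeit sketch along these lines should do. The helper names and the call count are arbitrary choices of mine, and the timings will vary by Python version and input, so no numbers are shown.

```python
import re
import timeit

a = "   foo bar baz qua   \n"


def with_lstrip():
    # Length removed by lstrip() equals the number of leading whitespace chars.
    return len(a) - len(a.lstrip())


def with_regex():
    # Position of the first non-whitespace character.
    return re.search(r"\S", a).start()


# Sanity check: both approaches agree on this input.
assert with_lstrip() == with_regex() == 3

for func in (with_lstrip, with_regex):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.4f} s for 100,000 calls")
```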
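I recently had a similar task of counting indents, where I wanted to count a tab as four spaces:
def indent(string: str) -> int:
    # Slice off the leading whitespace; computing the prefix length
    # explicitly also works when the string is all whitespace.
    leading = string[:len(string) - len(string.lstrip())]
    return sum(4 if char == '\t' else 1 for char in leading)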
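A variation on this, not the same rule: if you want tabs to advance to the next tab stop (as most editors display them) rather than always counting as four, str.expandtabs can do the arithmetic. A sketch under that assumption, with an illustrative function name and a tabsize parameter of my own:

```python
def indent_width(line, tabsize=4):
    """Visual width of the leading spaces and tabs in line."""
    # Keep only the leading run of spaces and tabs.
    leading = line[:len(line) - len(line.lstrip(" \t"))]
    # expandtabs replaces each tab with enough spaces to reach the next
    # multiple of tabsize, so "  \t" has width 4 rather than 6.
    return len(leading.expandtabs(tabsize))
```

For lines indented purely with tabs or purely with spaces, this gives the same answer as counting each tab as four.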
You can use a regular expression:
import re

def count_leading_space(s):
    # ^\s* matches the (possibly empty) run of whitespace at the start
    match = re.search(r"^\s*", s)
    return 0 if not match else match.end()
In [17]: count_leading_space("    asd fjk gl")
Out[17]: 4
In [18]: count_leading_space(" asd fjk gl")
Out[18]: 1
In [19]: count_leading_space("asd fjk gl")
Out[19]: 0