Cutting string after x chars at whitespace in python
Question:
I want to cut a long text after x characters, but I don’t want to cut a word in the middle; I want to cut at the last whitespace before x characters:
'This is a sample text'[:20]
gives me
'This is a sample tex'
but I want
'This is a sample'
Another example:
'And another sample sentence'[:15]
gives me
'And another sam'
but I want
'And another'
What is the easiest way to do this?
Answers:
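import textwrap
text = 'This is a sample text'
lines = textwrap.wrap(text, 20)  # ['This is a sample', 'text']
# then use either the first line only
lines[0]
# or all lines joined with newlines
'\n'.join(lines)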
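One caveat: `textwrap.wrap()` returns an empty list for an empty string, so `lines[0]` raises `IndexError` there. A minimal sketch of a guarded helper; `first_line` is a hypothetical name, not part of `textwrap`:

```python
import textwrap

def first_line(text, width):
    """Return the first line textwrap.wrap() produces, or '' if there is none."""
    # wrap() returns [] for empty input, so guard before indexing
    lines = textwrap.wrap(text, width)
    return lines[0] if lines else ''

print(first_line('This is a sample text', 20))
print(first_line('And another sample sentence', 15))
print(repr(first_line('', 20)))
```

Note that `wrap()` breaks words longer than `width` by default; passing `break_long_words=False` keeps them intact, at the cost of lines that may be longer than `width`.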
You can use str.rpartition() or str.rsplit() to remove everything after the last space of the remainder:
example = 'This is a sample text'
example[:20].rpartition(' ')[0]  # 'This is a sample'
example[:20].rsplit(' ', 1)[0]   # 'This is a sample'
The second argument to str.rsplit() limits the split to the first space from the right, and the [0] index takes everything before that space.
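Two edge cases are worth guarding against with the slice-then-split approach: if the text already fits, `'A short text'[:20].rsplit(' ', 1)[0]` still drops `'text'`, and if the cut lands exactly on a space, the last complete word is dropped as well. A minimal sketch that checks both first; `truncate_at_space` is a hypothetical name:

```python
def truncate_at_space(text, width):
    """Cut text to at most `width` characters without splitting a word."""
    # Already short enough: return it whole (a bare rsplit would drop the last word)
    if len(text) <= width:
        return text
    # The character right after the cut is a space, so the slice ends on a word boundary
    if text[width] == ' ':
        return text[:width]
    # Otherwise drop the partial word after the last space in the slice
    return text[:width].rsplit(' ', 1)[0]

print(truncate_at_space('This is a sample text', 20))
print(truncate_at_space('A short text', 20))
print(truncate_at_space('This is a sample text', 16))
```

If there is no space within the first `width` characters, `rsplit()` returns the whole slice, so a single long word is still cut mid-word.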
str.rpartition() is a bit faster and always returns three strings; if there is no space, the first string returned is empty. If that’s a possibility, you may want to stick with str.rsplit(), which returns a list with a single string in that case, so you end up with the sliced string unchanged.
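The difference in the no-space case can be checked directly; a small sketch using a slice that contains no space:

```python
s = 'abcdefghijklmnopqrstuvwxyz'[:20]  # the slice contains no space

head, sep, tail = s.rpartition(' ')
# rpartition: separator not found -> ('', '', s), so [0] is empty
print(repr(head))                  # ''
# rsplit: separator not found -> [s], so [0] is the whole slice
print(repr(s.rsplit(' ', 1)[0]))   # 'abcdefghijklmnopqrst'
```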
I upvoted the two other answers, but just for fun, here is a regex version:
import re
# up to 20 chars, not ending in a space, followed by a space, the end, or at the very start
r = re.compile(r'.{,20}(?<! )(?= |\Z|\A)')
for s in ('This is a sample text',
          'abcdefghijklmnopqrstuvwxyz ',
          'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
          'This is 1 first sample text ',
          'This is 1 again sample text',
          'A great blank          here',
          'Another blank     here',
          'A short text',
          ' little indent',
          '                      great indent',
          'ocean',
          '!',
          ''):
    print('-----------------------\n'
          " ....5...10...15...20\n"
          '%r\n%r'
          % (s, r.match(s).group()))
Result:
-----------------------
 ....5...10...15...20
'This is a sample text'
'This is a sample'
-----------------------
 ....5...10...15...20
'abcdefghijklmnopqrstuvwxyz '
''
-----------------------
 ....5...10...15...20
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
''
-----------------------
 ....5...10...15...20
'This is 1 first sample text '
'This is 1 first'
-----------------------
 ....5...10...15...20
'This is 1 again sample text'
'This is 1 again'
-----------------------
 ....5...10...15...20
'A great blank          here'
'A great blank'
-----------------------
 ....5...10...15...20
'Another blank     here'
'Another blank'
-----------------------
 ....5...10...15...20
'A short text'
'A short text'
-----------------------
 ....5...10...15...20
' little indent'
' little indent'
-----------------------
 ....5...10...15...20
'                      great indent'
''
-----------------------
 ....5...10...15...20
'ocean'
'ocean'
-----------------------
 ....5...10...15...20
'!'
'!'
-----------------------
 ....5...10...15...20
''
''
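The pattern above hard-codes 20. A minimal sketch that makes the width a parameter, assuming the same rules as above (only a literal space counts as a break point, and `.` does not match newlines); `regex_truncate` is a hypothetical name:

```python
import re

def regex_truncate(text, width):
    """Longest prefix of at most `width` chars that does not split a word."""
    # .{0,N}       up to N characters, greedy, so the longest candidate is tried first
    # (?<! )       the cut must not come right after a space
    # (?= |\Z|\A)  and must be followed by a space, the end, or sit at the very start
    pattern = re.compile(r'.{0,%d}(?<! )(?= |\Z|\A)' % width)
    # The empty match at position 0 always satisfies \A, so match() never returns None
    return pattern.match(text).group()

print(regex_truncate('And another sample sentence', 15))
print(regex_truncate('This is a sample text', 16))
print(repr(regex_truncate('ocean', 3)))
```

Because the cut must be followed by a space or the end of the text, a cut that lands exactly on a word boundary keeps the last full word, and a first word longer than `width` yields `''` instead of a partial word.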
In 2022 I discovered that the standard library's textwrap module, which Steve thankfully recommended, has a function made exactly for this need: textwrap.shorten().
>>> import textwrap
>>> textwrap.shorten("This is a sample text", width=20, placeholder="")
'This is a sample'
It was added in Python 3.4, so it should be available in most projects now.
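`textwrap.shorten()` first collapses and strips whitespace, then drops whole words so that the result, including the placeholder, fits in `width`; the default placeholder is `' [...]'`. A few calls illustrating this, the first three taken from the `textwrap` documentation:

```python
import textwrap

# Runs of whitespace are collapsed before the length check
print(textwrap.shorten("Hello  world!", width=12))                   # Hello world!
# The default placeholder ' [...]' counts towards the width
print(textwrap.shorten("Hello  world!", width=11))                   # Hello [...]
print(textwrap.shorten("Hello world", width=10, placeholder="..."))  # Hello...
# The question's second example, with an empty placeholder
print(textwrap.shorten("And another sample sentence", width=15, placeholder=""))  # And another
```

Unlike the slicing and regex approaches, this normalizes internal whitespace, so text with repeated spaces or newlines will not come back verbatim.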