Cutting a string after x chars at whitespace in Python

Question:

I want to cut a long text after x characters, but I don’t want to cut a word in the middle. Instead, I want to cut at the last whitespace before x chars:

'This is a sample text'[:20]

gives me

'This is a sample tex'

but I want

'This is a sample'

Another example:

'And another sample sentence'[:15]

gives me

'And another sam'

but I want

'And another'

What is the easiest way to do this?

Asked By: dynobo


Answers:

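import textwrap

text = 'This is a sample text'
lines = textwrap.wrap(text, 20)  # list of lines, each at most 20 chars wide
# then use either
lines[0]  # only the first line: 'This is a sample'
# or
'\n'.join(lines)  # all lines, joined with newlines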
Answered By: Steve Barnes

You can use str.rpartition() or str.rsplit() to remove everything after the last space of the remainder:

example[:20].rpartition(' ')[0]

example[:20].rsplit(' ', 1)[0]

The second argument to str.rsplit() limits the split to the first space from the right, and the [0] index takes whatever was split off before that space.

str.rpartition() is a bit faster and always returns three strings. If there is no space, the first string it returns is empty, so you may want to stick with str.rsplit() if that’s a possibility: that version returns a list with a single string in that case, so you end up with the sliced string unchanged.
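
To make the difference concrete, here is a small sketch of both calls on the question's sample and on a string without any space. The helper name cut_at_space is hypothetical (it is not part of this answer), and it assumes that a text which already fits within the limit should be returned unchanged; the bare one-liners above would drop its last word instead.

```python
def cut_at_space(text, width):
    """Hypothetical helper: cut text to at most `width` characters,
    backing up to the last space so that no word is split."""
    if len(text) <= width:
        # Assumption: a text that already fits needs no cutting.
        return text
    # rsplit keeps the whole slice when it contains no space at all.
    return text[:width].rsplit(' ', 1)[0]


example = 'This is a sample text'
no_space = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

print(repr(example[:20].rpartition(' ')[0]))    # 'This is a sample'
print(repr(no_space[:20].rpartition(' ')[0]))   # ''
print(repr(no_space[:20].rsplit(' ', 1)[0]))    # 'ABCDEFGHIJKLMNOPQRST'
print(repr(cut_at_space('A short text', 20)))   # 'A short text'
```

Whether an unbroken word longer than the limit should come back empty or hard-cut depends on the use case, which is the choice between rpartition() and rsplit() here.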

Answered By: Martijn Pieters

Upvoted the two other answers, but just for fun, here is a version with regex:

import re

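# Up to 20 characters, not ending in a space, followed by a space or the end
# of the string (\A lets the empty match at position 0 succeed as a fallback).
r = re.compile(r'.{,20}(?<! )(?= |\Z|\A)')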
for s in ('This is a sample text',
          'abcdefghijklmnopqrstuvwxyz  ',
          'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
          'This is 1 first sample text  ',
          'This is 1 again sample text',
          'A great blank          here',
          'Another blank     here',
          'A short text',
          '  little indent',
          '                      great indent',
          'ocean',
          '!',
          ''):
    print('-----------------------\n'
          ' ....5...10...15...20\n'
          '%r\n%r'
          % (s, r.match(s).group()))

Result:

-----------------------
 ....5...10...15...20
'This is a sample text'
'This is a sample'
-----------------------
 ....5...10...15...20
'abcdefghijklmnopqrstuvwxyz  '
''
-----------------------
 ....5...10...15...20
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
''
-----------------------
 ....5...10...15...20
'This is 1 first sample text  '
'This is 1 first'
-----------------------
 ....5...10...15...20
'This is 1 again sample text'
'This is 1 again'
-----------------------
 ....5...10...15...20
'A great blank          here'
'A great blank'
-----------------------
 ....5...10...15...20
'Another blank     here'
'Another blank'
-----------------------
 ....5...10...15...20
'A short text'
'A short text'
-----------------------
 ....5...10...15...20
'  little indent'
'  little indent'
-----------------------
 ....5...10...15...20
'                      great indent'
''
-----------------------
 ....5...10...15...20
'ocean'
'ocean'
-----------------------
 ....5...10...15...20
'!'
'!'
-----------------------
 ....5...10...15...20
''
''
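
For reuse with other limits, the same pattern can be wrapped in a function. This is only a sketch: the name cut_at_whitespace and the width parameter are hypothetical and not part of the answer, and the pattern is the one above with the \Z and \A anchors.

```python
import re


def cut_at_whitespace(text, width):
    """Hypothetical helper: longest prefix of at most `width` characters
    that does not end in a space and stops right before a space or at
    the end of the string."""
    pattern = re.compile(r'.{,%d}(?<! )(?= |\Z|\A)' % width)
    match = pattern.match(text)
    # The \A alternative allows an empty match at position 0, so a match
    # is expected for any input; the guard is purely defensive.
    return match.group() if match else ''


print(repr(cut_at_whitespace('This is a sample text', 20)))        # 'This is a sample'
print(repr(cut_at_whitespace('And another sample sentence', 15)))  # 'And another'
print(repr(cut_at_whitespace('ABCDEFGHIJKLMNOPQRSTUVWXYZ', 20)))   # ''
```

As in the output above, a text that has no space within the limit comes back as an empty string rather than being cut mid-word.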
Answered By: eyquem

Now in 2022, I discovered that the standard library's textwrap module, which Steve thankfully recommended, has a function for exactly this need: textwrap.shorten().

>>> import textwrap
>>> textwrap.shorten("This is a sample text", width=20, placeholder="")
'This is a sample'

It was added in Python 3.4, so it should be available in most projects now.
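
A few more calls, as a hedged sketch of behavior worth knowing before relying on it: textwrap.shorten() collapses runs of whitespace before applying the width, and its default placeholder counts toward the width. The variable names are only for illustration.

```python
import textwrap

# placeholder="" drops the default " [...]" marker that shorten() appends.
cut = textwrap.shorten("And another sample sentence", width=15, placeholder="")
print(repr(cut))        # 'And another'

# Runs of whitespace are collapsed first, so this text fits in
# 20 characters once normalized and is returned whole.
collapsed = textwrap.shorten("A great blank          here", width=20, placeholder="")
print(repr(collapsed))  # 'A great blank here'

# With the default placeholder, the marker takes up part of the width.
marked = textwrap.shorten("This is a sample text", width=20)
print(repr(marked))     # 'This is a [...]'
```

If the original spacing has to be preserved, the slicing or regex approaches from the other answers may be a better fit.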

Answered By: dynobo