Python splitting string by length without cutting words

Question:

I am trying to split a string into multiple lines. Here is an example of the string:

Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars because this part have to be in the second line and it also needs to be seperated such as the string part before and this has to be in the third line with the end of the string

It has to be split like this:

Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars
because this part have to be in the second line and it also needs to be seperated such as the string part before and 
this has to be in the third line with the end of the string

I’m trying to split the string at a length of 120 chars, but if a word is about to be cut, the line should end after the last whole word before that limit and the cut word should go on the next line. The rest has to be treated the same way if the text is longer than two lines.

Also, there is a part at the end of the string that has to stay on one line.

How do I do this dynamically? I tried some solutions like string[0:120], string.splitlines() and wrap. Maybe I could put the words in a list and loop through it, but how do I build this splitting logic?

Is there maybe a built-in solution for this?

Asked By: Ano Nymous


Answers:

Using textwrap.wrap:

>>> text = "Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars because this part have to be in the second line and it also needs to be seperated such as the string part before and this has to be in the third line with the end of the string"
>>> import textwrap
>>> print(*textwrap.wrap(text, 120), sep='\n')
Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars
because this part have to be in the second line and it also needs to be seperated such as the string part before and
this has to be in the third line with the end of the string
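
Two related options from the same module may be useful here. The following is a small sketch, not part of the original answer; the sample strings are made up for illustration. textwrap.fill returns the wrapped text as a single string, and break_long_words=False keeps a word that is longer than the width intact instead of cutting it.

```python
import textwrap

# Made-up sample text, only for illustration
text = "The quick brown fox jumps over the lazy dog near the riverbank"

# fill() is wrap() followed by "\n".join(...), so it returns one string
wrapped = textwrap.fill(text, width=20)
print(wrapped)

# By default a word longer than the width is cut into pieces;
# break_long_words=False leaves it whole on a line of its own
lines = textwrap.wrap("a supercalifragilistic word", width=10,
                      break_long_words=False)
print(lines)
```

With break_long_words=False a line can end up longer than the requested width, so this only fits if an over-long line is acceptable.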

If you were doing it from scratch, using split and iteratively adding words to a list of lines would be a good way to start:

>>> words = text.split(" ")
>>> lines = [words[0]]
>>> for word in words[1:]:
...     if len(lines[-1]) + len(word) < 120:
...         lines[-1] += (" " + word)
...     else:
...         lines.append(word)
...
>>> print(*lines, sep='\n')
Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars
because this part have to be in the second line and it also needs to be seperated such as the string part before and
this has to be in the third line with the end of the string
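
The question also mentions a part at the end that has to stay on one line. One way to handle that with the from-scratch approach is sketched below. It assumes the protected part is available as a separate string; the function name wrap_words and the tail parameter are hypothetical names chosen for this example.

```python
def wrap_words(text, width=120, tail=""):
    """Greedy word wrap; `tail` (hypothetical) is kept whole as the last line."""
    words = text.split()
    if not words:
        return [tail] if tail else []
    lines = [words[0]]
    for word in words[1:]:
        # The +1 accounts for the space that joins the word to the line
        if len(lines[-1]) + 1 + len(word) <= width:
            lines[-1] += " " + word
        else:
            lines.append(word)
    if tail:
        # Assumption: the protected part always goes on its own final line,
        # even if it is longer than `width`
        lines.append(tail)
    return lines


print(*wrap_words("one two three four five", width=9,
                  tail="keep this together"), sep="\n")
```

If the tail is only known as a suffix of the full string, it would have to be cut off first (for example with str.removesuffix on Python 3.9+) before passing the rest to the function.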
Answered By: Samwise

You might also use regular expressions, matching anywhere from 1 to 120 characters followed by a word boundary.

import re

re.findall(r'(.{1,120})(?=\b)', "Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars because this part have to be in the second line and it also needs to be seperated such as the string part before and this has to be in the third line with the end of the string")

Yields:

['Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars ', 
 'because this part have to be in the second line and it also needs to be seperated such as the string part before and ', 
 'this has to be in the third line with the end of the string']

Putting this into a function:

def wrap(length, text):
  # Raw f-string so that \b stays a regex word boundary, not a backspace character
  pat = re.compile(rf'(.{{1,{length}}})(?=\b)')
  return pat.findall(text)
 
wrap(120, "Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars because this part have to be in the second line and it also needs to be seperated such as the string part before and this has to be in the third line with the end of the string")
# ['Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars ', 
#  'because this part have to be in the second line and it also needs to be seperated such as the string part before and ',
#  'this has to be in the third line with the end of the string']

wrap(80, "Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars because this part have to be in the second line and it also needs to be seperated such as the string part before and this has to be in the third line with the end of the string")
# ['Here the line that have to be under 120 chars and cut at the point in the string', 
#  ' where the last word is under 120 chars because this part have to be in the ', 
#  'second line and it also needs to be seperated such as the string part before and', 
#  ' this has to be in the third line with the end of the string']

wrap(60, "Here the line that have to be under 120 chars and cut at the point in the string where the last word is under 120 chars because this part have to be in the second line and it also needs to be seperated such as the string part before and this has to be in the third line with the end of the string")
# ['Here the line that have to be under 120 chars and cut at the', 
#  ' point in the string where the last word is under 120 chars ', 
#  'because this part have to be in the second line and it also ', 
#  'needs to be seperated such as the string part before and ', 
#  'this has to be in the third line with the end of the string']

A further exercise would be stripping leading and trailing whitespace from each line.
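
A minimal sketch of that exercise, reusing the same pattern as the function above; the name wrap_stripped is made up for this example, and the short sample string is only for illustration.

```python
import re


def wrap_stripped(length, text):
    # Up to `length` characters, followed by a word boundary
    pat = re.compile(rf'(.{{1,{length}}})(?=\b)')
    # strip() removes the space left at either end of a chunk;
    # chunks that end up empty are dropped
    stripped = (chunk.strip() for chunk in pat.findall(text))
    return [line for line in stripped if line]


print(wrap_stripped(10, "aaa bbb ccc ddd eee"))
```

Because the whitespace is removed after matching, a line can come out a little shorter than it strictly needs to be.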

Answered By: Chris