Remove all line breaks from a long string of text

Question:

Basically, I’m asking the user to input a string of text into the console, but the string is very long and includes many line breaks. How would I take the user’s string and delete all line breaks to make it a single line of text? My method for acquiring the string is very simple.

string = raw_input("Please enter string: ")

Is there a different way I should be grabbing the string from the user? I’m running Python 2.7.4 on a Mac.

P.S. Clearly I’m a noob, so even if a solution isn’t the most efficient, the one that uses the most simple syntax would be appreciated.

Asked By: Ian Zane


Answers:

You can try using string replace:

string = string.replace('\r', '').replace('\n', '')
Answered By: Konstantin Dinev

How do you enter line breaks with raw_input? But, once you have a string with some characters in it you want to get rid of, just replace them.

>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>

In the example above, I replaced all spaces. The string '\n' represents a newline, and '\r' represents a carriage return (if you’re on Windows, you might be getting these, and a second replace will handle them for you!).

basically:

# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')

Note also, that it is a bad idea to call your variable string, as this shadows the module string. Another name I’d avoid but would love to use sometimes: file. For the same reason.

Answered By: Daren Thomas
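
As the session in the answer above suggests, raw_input returns as soon as Enter is pressed, so it never hands back more than one line. One possible way around that, sketched here under the assumption that the user finishes the input with an end-of-file keystroke (Ctrl-D on a Mac), is to read everything from standard input and collapse it afterwards. The name read_as_single_line is made up for this illustration, and the io.StringIO object only simulates typed input so the sketch runs without a terminal.

```python
import io
import sys


def read_as_single_line(stream=None):
    """Read all text from stream (default: sys.stdin) and join it into one line."""
    if stream is None:
        stream = sys.stdin
    text = stream.read()  # returns everything up to end-of-file
    # split() with no argument splits on any run of whitespace,
    # including \n and \r, and ignores leading/trailing whitespace.
    return " ".join(text.split())


# Simulated multi-line input (hypothetical sample text).
fake_input = io.StringIO(u"first line\nsecond line\r\nthird line\n")
print(read_as_single_line(fake_input))
```

Calling read_as_single_line() with no argument would read from the real console instead. Note that this also collapses runs of spaces and tabs, which may or may not be what you want.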

updated based on Xbello comment:

string = my_string.rstrip('\r\n')

read more here

Answered By: tokhi

You can split the string with no separator arg, which will treat consecutive whitespace as a single separator (including newlines and tabs). Then join using a space:

In : " ".join("\n\nsome    text \r\n with multiple whitespace".split())
Out: 'some text with multiple whitespace'

https://docs.python.org/2/library/stdtypes.html#str.split

Answered By: Sean

A method taking into consideration

  • additional whitespace characters at the beginning/end of the string
  • additional whitespace characters at the beginning/end of every line
  • various end-of-line characters

It takes a multi-line string, which may be messy, e.g.

test_str = '\nhej ho \n aaa\r\n   a\n '

and produces a nice one-line string:

>>> ' '.join([line.strip() for line in test_str.strip().splitlines()])
'hej ho aaa a'

UPDATE:
To keep multiple consecutive newline characters from producing redundant spaces:

' '.join([line.strip() for line in test_str.strip().splitlines() if line.strip()])

This works for the following too:
test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '

Answered By: Kamil Neczaj

Another option is regex:

>>> import re
>>> re.sub("\n|\r", "", "Foo\n\rbar\n\rbaz\n\r")
'Foobarbaz'
Answered By: Neil

The problem with rstrip() is that it does not work in all cases (I have run into a few myself), because it only strips characters from the end of the string. Instead you can use

text = text.replace("\n"," ")

This will replace every newline '\n' with a space.

Answered By: Ankit Dwivedi
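
To illustrate the rstrip() point in the answer above, here is a small comparison on a made-up sample string. It is only a sketch: the variable names are invented for the example, and it runs on both Python 2.7 and Python 3.

```python
text = "line one\nline two\r\nline three\n"

# rstrip() only removes the listed characters from the right-hand end,
# so the line breaks in the middle of the string survive.
trailing_only = text.rstrip("\r\n")

# replace() touches every occurrence, wherever it sits in the string.
everywhere = text.replace("\r", "").replace("\n", " ")

print(repr(trailing_only))
print(repr(everywhere))
```

The replace() version leaves a trailing space where the final newline used to be; calling .strip() on the result removes it if that matters.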

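If anybody decides to use replace and '\n' matches nothing, you should try r'\n' instead of '\n'. The raw string matches a literal backslash followed by n, which is what you have when the text contains escaped newlines rather than real ones.

mystring = mystring.replace(r'\n', ' ').replace(r'\r', '')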
Answered By: Anar Salimkhanov
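
The difference between '\n' and r'\n' in the answer above is easy to miss, so here is a minimal sketch on two made-up strings: one holding a real newline character, the other holding the two characters backslash and n. Variable names are invented for the example.

```python
real_newline = "Foo\nbar"      # contains one newline character
literal_text = r"Foo\nbar"     # contains a backslash followed by the letter n

# '\n' matches the newline character only.
a = real_newline.replace("\n", " ")
b = literal_text.replace("\n", " ")

# r'\n' matches the two-character sequence backslash + n only.
c = real_newline.replace(r"\n", " ")
d = literal_text.replace(r"\n", " ")

for label, value in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(label + " = " + repr(value))
```

Only a and d end up as 'Foo bar'; b and c come back unchanged. Which form you need depends on whether your input holds real line breaks or escaped ones.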

The canonical answer, in Python, would be:

s = ''.join(s.splitlines())

It splits the string into lines (letting Python do it according to its own best practices). Then you join the pieces back together. There are two possibilities here:

  • replace each newline with a space (' '.join())
  • or join without any separator (''.join())
Answered By: fralau
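
A short sketch of the two join variants from the answer above, using a made-up string that mixes Unix (\n), Windows (\r\n), and old Mac (\r) line endings. splitlines() recognizes all three, so none of them need to be listed by hand.

```python
s = "one\ntwo\r\nthree\rfour"

# splitlines() drops the line-ending characters and returns the pieces.
lines = s.splitlines()

joined_tight = "".join(lines)    # no separator between the pieces
joined_spaced = " ".join(lines)  # one space where each line break was

print(lines)
print(joined_tight)
print(joined_spaced)
```

The spaced variant is usually the one you want for prose, since the tight variant glues the last word of one line to the first word of the next.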

Regular expressions are the fastest way to do this:

import re

s='''some kind   of
string with a bunch\r of

  
 extra spaces in   it'''

re.sub(r'\s(?=\s)','',re.sub(r'\s',' ',s))

result:

'some kind of string with a bunch of extra spaces in it'
Answered By: Quin
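
For comparison with the answer above, the same collapsing can also be written as a single substitution that replaces each run of whitespace with one space. This is an alternative sketch, not the answerer's code, and it makes no claim about relative speed; the sample string mirrors the one in the answer.

```python
import re

s = "some kind   of\nstring with a bunch\r of\n\n  \n extra spaces in   it"

# Two passes: turn every whitespace character into a space,
# then delete each space that is followed by another one.
two_pass = re.sub(r"\s(?=\s)", "", re.sub(r"\s", " ", s))

# One pass: replace each run of one or more whitespace characters
# with a single space.
one_pass = re.sub(r"\s+", " ", s)

print(two_pass)
print(one_pass)
```

Neither version trims whitespace at the very start or end of the string; add .strip() if the input may begin or end with a line break.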

You don’t really need to remove ALL the line-break characters: LF, CR, CRLF.

# Pythonic:
r'\n', r'\r', r'\r\n'

Some texts must have breaks, but you probably need to join broken lines to keep particular sentences together.

So it is natural for a line break to follow a period, semicolon, or colon, but not a comma.

My code takes the above conditions into account. It works well with text copied from PDFs.
Enjoy!:

import re

def unbreak_pdf_text(raw_text):
    r""" the newline careful sign removal tool

    Args:
        raw_text (str): string containing unwanted newline signs: \n or \r or \r\n
        e.g. imported from OCR or copied from a pdf document.

    Returns:
        str: the text with every line break that directly follows a comma,
        a space, or a word character replaced by a single space.
    """
    # Try \r\n first so a Windows line ending is consumed as one unit;
    # otherwise the \r alone would match and leave the \n behind.
    pat = re.compile(r"([, \w])(?:\r\n|\r|\n)")

    # Keep the character before the break and put a space after it.
    return pat.sub(r"\1 ", raw_text)
Answered By: pythamator