How do I split a multi-line string into multiple lines?
Question:
I have a multi-line string and want to perform an operation on each line, like so:
inputString = """Line 1
Line 2
Line 3"""
I want to iterate on each line:
for line in inputString:
    doStuff()
Answers:
inputString.splitlines()
This gives you a list with one item per line; the splitlines() method is designed to split each line into a list element.
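As a minimal sketch in Python 3 of the loop the question asks for: iterating over the string directly yields single characters, so the string is split first. `do_stuff` is a hypothetical stand-in for whatever per-line operation is needed.

```python
input_string = """Line 1
Line 2
Line 3"""


def do_stuff(line):
    # Hypothetical per-line operation: here it just upper-cases the line.
    return line.upper()


# splitlines() returns a list of lines without their line endings.
lines = input_string.splitlines()
results = [do_stuff(line) for line in lines]
print(results)
```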
inputString.split('\n') # --> ['Line 1', 'Line 2', 'Line 3']
The following is identical to the above, but the string module's functions are deprecated (and were removed in Python 3) and should be avoided:
import string
string.split(inputString, '\n') # --> ['Line 1', 'Line 2', 'Line 3']
Alternatively, if you want each line to include the break sequence (CR, LF, CRLF), use the splitlines method with a True argument:
inputString.splitlines(True) # --> ['Line 1\n', 'Line 2\n', 'Line 3']
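A short Python 3 sketch of what keeping the line endings buys you: since no characters are discarded, the pieces can be joined back into the original text. The sample string with mixed endings is made up for illustration.

```python
text = "Line 1\nLine 2\r\nLine 3"

# keepends=True leaves each line's own terminator attached.
with_ends = text.splitlines(keepends=True)
print(with_ends)

# Nothing was discarded, so joining the pieces restores the input.
assert "".join(with_ends) == text
```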
Might be overkill in this particular case, but another option involves using StringIO to create a file-like object:
import StringIO  # Python 2 module; Python 3 provides io.StringIO instead
for line in StringIO.StringIO(inputString):
    doStuff()
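For readers on Python 3, a roughly equivalent sketch uses `io.StringIO`. Unlike `splitlines()`, iterating a file-like object keeps the trailing newline on each line, so it is stripped here by hand.

```python
import io

input_string = "Line 1\nLine 2\nLine 3"

# io.StringIO wraps the string in a file-like object; iterating it
# yields one line at a time, with the trailing "\n" still attached.
collected = []
for line in io.StringIO(input_string):
    collected.append(line.rstrip("\n"))
print(collected)
```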
Use inputString.splitlines().
Why splitlines is better
splitlines handles newlines properly, unlike split.
It can also optionally return the newline character in the split result when called with a True argument, which is useful in some specific scenarios.
Why you should NOT use split("\n")
Using split creates very confusing bugs when sharing files across operating systems.
\n in Python represents a Unix line-break (ASCII decimal code 10), independently of the OS where you run it. However, the line-break representation used in files is OS-dependent.
On Windows, a line break is two characters, CR and LF (ASCII decimal codes 13 and 10, \r and \n), while on modern Unix (Mac OS X, Linux, Android), it's the single character LF.
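The character codes mentioned above can be checked directly. This is just a small Python 3 sketch; the names `CR`, `LF`, and `windows_break` are chosen here for illustration.

```python
# The two control characters involved in line endings.
CR = "\r"
LF = "\n"
print(ord(CR), ord(LF))

# A Windows-style line break is the two-character sequence CR LF.
windows_break = CR + LF
print(len(windows_break))
```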
print works correctly even if you have a string with line endings that don't match your platform:
>>> print " a \n b \r\n c "
a
b
c
However, explicitly splitting on "\n" has OS-dependent behaviour:
>>> " a \n b \r\n c ".split("\n")
[' a ', ' b \r', ' c ']
Even if you use os.linesep, it will only split according to the newline separator on your platform, and will fail if you're processing text created on other platforms, or text with a bare \n (output shown for Windows, where os.linesep is "\r\n"):
>>> import os
>>> " a \n b \r\n c ".split(os.linesep)
[' a \n b ', ' c ']
splitlines solves all these problems:
>>> " a \n b \r\n c ".splitlines()
[' a ', ' b ', ' c ']
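The comparison can be put side by side as a runnable Python 3 sketch. The manual `rstrip("\r")` cleanup is shown only to illustrate the extra work that `split("\n")` leaves behind; it is not part of the original answer.

```python
sample = " a \n b \r\n c "

# Splitting on "\n" alone leaves the "\r" of the CRLF pair in place.
by_newline = sample.split("\n")
print(by_newline)

# One manual workaround: strip the stray carriage returns afterwards.
cleaned = [piece.rstrip("\r") for piece in by_newline]
print(cleaned)

# splitlines() recognises both endings in a single call.
by_splitlines = sample.splitlines()
print(by_splitlines)
```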
Reading files in text mode partially mitigates the newline representation problem, as it converts between Python's \n and the platform's newline representation.
However, in Python 2, text mode only exists on Windows. On Unix systems, all files are opened in binary mode, so using split('\n') on a Unix system with a Windows file will lead to undesired behavior. This can also happen when transferring files over the network.
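A sketch of the same issue under Python 3, where `open()` in text mode translates line endings on every platform by default (`newline=None`), while binary mode returns the bytes untouched. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Write a file with Windows (CRLF) line endings, byte for byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\r\nb\r\nc")
    path = tmp.name

try:
    # Binary mode: bytes come back untouched, so "\r" survives split("\n").
    with open(path, "rb") as fh:
        raw_lines = fh.read().decode("ascii").split("\n")

    # Text mode in Python 3 (default newline=None) turns "\r\n" into "\n".
    with open(path, "r", encoding="ascii") as fh:
        text_lines = fh.read().split("\n")

    # splitlines() gives clean lines even for the untranslated data.
    with open(path, "rb") as fh:
        safe_lines = fh.read().decode("ascii").splitlines()
finally:
    os.remove(path)

print(raw_lines, text_lines, safe_lines)
```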
I would like to augment @1_CR's answer, which led me to the following technique. It uses cStringIO if available. Note that cStringIO and StringIO are not the same: cStringIO is a built-in and cannot be subclassed, but for basic operations the syntax is identical, so you can do this:
# Python 2 only: prefer the faster C implementation when it is available.
try:
    import cStringIO
    StringIO = cStringIO
except ImportError:
    import StringIO
for line in StringIO.StringIO(variable_with_multiline_string):
    print line.strip()
The original post asked for code that prints some rows (those that satisfy a condition) plus the row following each of them.
My implementation would be this:
text = """1 sfasdf
asdfasdf
2 sfasdf
asdfgadfg
1 asfasdf
sdfasdgf
"""
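lines = text.splitlines()
rows_to_print = set()  # must be a set; {} would create an empty dict
for i, line in enumerate(lines):
    if line.startswith('1'):
        # keep the matching row and the row that follows it
        rows_to_print |= {i, i + 1}
for i in sorted(rows_to_print):
    if i < len(lines):  # a match on the last row has no following row
        print(lines[i])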