How to remove empty lines with or without whitespace
Question:
I have a large string which I split by newlines.
How can I remove all lines that are empty or contain only whitespace?
Pseudocode:
for stuff in largestring:
    remove stuff that is blank
Answers:
Try a list comprehension and str.strip():
>>> mystr = "L1\nL2\n\nL3\nL4\n \n\nL5"
>>> mystr.split('\n')
['L1', 'L2', '', 'L3', 'L4', ' ', '', 'L5']
>>> [line for line in mystr.split('\n') if line.strip()]
['L1', 'L2', 'L3', 'L4', 'L5']
lines = bigstring.split('\n')
lines = [line for line in lines if line.strip()]
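One caveat with split('\n'): on text with Windows-style "\r\n" line endings, every surviving line keeps a trailing "\r". str.splitlines() treats "\r\n" (and "\r" and a few other characters) as line boundaries, so it avoids that. A minimal sketch; the function name remove_blank_lines and the sample text are illustrative, not from the answers above.

```python
def remove_blank_lines(text):
    """Return the non-blank lines of text, split with str.splitlines()."""
    # splitlines() treats "\n", "\r\n" and "\r" (among others) as line boundaries
    return [line for line in text.splitlines() if line.strip()]


crlf_text = "L1\r\n\r\nL2\r\n  \r\nL3"

# split('\n') leaves a stray '\r' on the surviving lines
print([line for line in crlf_text.split('\n') if line.strip()])  # ['L1\r', 'L2\r', 'L3']

# splitlines() does not
print(remove_blank_lines(crlf_text))  # ['L1', 'L2', 'L3']
```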
I also tried regex and list-based solutions, and the list-based one is faster.
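Timings like this depend heavily on the input and the Python version, so it is worth measuring on your own data. The sketch below is a rough benchmark harness, not the measurement behind the claim above: the synthetic input, the function names, and the choice of regex (the `\n\s*\n` pattern from a later answer) are all assumptions, and no numbers are claimed.

```python
import re
import timeit

# Synthetic input (an assumption): 10,000 lines where every third line is
# empty or whitespace-only; the first and last lines are non-blank
text = "\n".join(
    f"line {i}" if i % 3 else ("" if i % 2 else "   ") for i in range(1, 10_001)
)

BLANK_RUN = re.compile(r'\n\s*\n')


def with_list(s):
    # keep only the lines that still contain something after strip()
    return "\n".join(line for line in s.splitlines() if line.strip())


def with_regex(s):
    # collapse each newline / whitespace / newline run into a single newline
    return BLANK_RUN.sub('\n', s)


for func in (with_list, with_regex):
    seconds = timeit.timeit(lambda: func(text), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```

Checking that both functions return the same string on your data is a useful sanity check before comparing their speed.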
Here is my solution (based on the previous answers):
text = "\n".join([ll.rstrip() for ll in original_text.splitlines() if ll.strip()])
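If you would rather keep each surviving line exactly as it was (its original line ending and any trailing whitespace included), str.splitlines(keepends=True) lets you filter the lines and glue them back with "".join. A minimal sketch with made-up sample text:

```python
original_text = "first\r\n   \r\nsecond  \n\n\tthird\n"

# keepends=True leaves each line's own terminator attached, so "".join
# reassembles the text with the original "\r\n" / "\n" endings intact
text = "".join(
    line for line in original_text.splitlines(keepends=True) if line.strip()
)
print(repr(text))  # 'first\r\nsecond  \n\tthird\n'
```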
If you are not willing to try regex (which you should), you can use this:
s = s.replace('\n\n', '\n')
Repeat this several times to make sure no blank lines are left, or chain the calls:
s = s.replace('\n\n', '\n').replace('\n\n', '\n')
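Instead of guessing how many times to repeat the call, a loop can keep replacing until no '\n\n' is left. This only targets lines that are completely empty; a line holding a single space is left alone. A minimal sketch; the function name is illustrative:

```python
def collapse_empty_lines(s):
    """Remove completely empty lines by repeating str.replace until none remain."""
    # str.replace returns a new string, so the result has to be reassigned
    while '\n\n' in s:
        s = s.replace('\n\n', '\n')
    return s


print(repr(collapse_empty_lines("a\n\n\n\n\nb")))  # 'a\nb'
print(repr(collapse_empty_lines("a\n \nb")))      # 'a\n \nb' (the space keeps the line)
```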
Just to encourage you to use regex, here are two introductory videos that I find intuitive:
• Regular Expressions (Regex) Tutorial
• Python Tutorial: re Module
My version:
while '' in all_lines:
    all_lines.pop(all_lines.index(''))
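Each iteration of that loop scans the list twice (`'' in all_lines`, then `index('')`), and `pop()` shifts the remaining elements, so it can get slow on long lists. Below is a single-pass sketch that still modifies the list in place. It uses `line.strip()`, so whitespace-only entries are dropped too, which the loop above does not do. The sample list is made up.

```python
all_lines = ['a', '', 'b', '   ', '', 'c']
same_list = all_lines  # a second reference to the same list object

# One pass over the list; slice assignment replaces the contents in place,
# so every reference (same_list included) sees the filtered result
all_lines[:] = [line for line in all_lines if line.strip()]
print(same_list)  # ['a', 'b', 'c']
```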
Same as what @NullUserException said; this is how I write it:
import re
removedWhitespace = re.sub(r'^\s*$', '', line)
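Applied to one line at a time, `re.sub(r'^\s*$', '', line)` turns a whitespace-only line into `''` but does not remove it, so a filter such as `if line` is still needed afterwards. Applied to the whole string with `re.MULTILINE`, a slightly extended pattern can delete blank lines together with their newlines in a single call. A sketch; the pattern extension `\n?` and the sample text are assumptions, not part of the answer:

```python
import re

# With re.MULTILINE, ^ and $ match at every line boundary;
# the optional \n? also deletes the newline that ends each blank line
BLANK_LINE = re.compile(r'^\s*$\n?', re.MULTILINE)

text = "a\n  \n\nb\n\t\nc"
cleaned = BLANK_LINE.sub('', text)
print(repr(cleaned))  # 'a\nb\nc'
```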
I'm surprised a multiline re.sub has not been suggested (oh, because you’ve already split your string… but why?):
>>> import re
>>> a = "Foo\n \nBar\nBaz\n\n Garply\n \n"
>>> print(a)
Foo
 
Bar
Baz

 Garply
 

>>> print(re.sub(r'\n\s*\n', '\n', a, flags=re.MULTILINE))
Foo
Bar
Baz
 Garply

>>>
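A pitfall with calls like this: the signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a fourth positional argument is `count`, not `flags`. `re.MULTILINE` has the integer value 8, so writing `re.sub(pattern, repl, s, re.MULTILINE)` silently limits the substitution to 8 replacements (Python 3.13 also deprecates passing `count` and `flags` positionally). The sketch below shows the effect on made-up input. For the `\n\s*\n` pattern specifically, the flag is not needed at all, since the pattern contains no `^` or `$`.

```python
import re

text = "x" * 10

# Equivalent to passing re.MULTILINE as the fourth positional argument:
# it becomes count=8, so only the first 8 matches are replaced
capped = re.sub("x", "y", text, count=int(re.MULTILINE))
print(capped)  # yyyyyyyyxx

# Passing the flag by keyword leaves count at its default (replace all)
fixed = re.sub("x", "y", text, flags=re.MULTILINE)
print(fixed)  # yyyyyyyyyy
```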
I use this solution to delete empty lines and join everything together as one line:
import re
match_p = re.sub(r'\s{2}', '', my_txt)  # my_txt is the text above
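As written, `\s{2}` deletes whitespace two characters at a time, so the result depends on how long each run is: a single newline survives, and words on either side of a removed pair can end up glued together. The sketch below (sample text made up) shows that, followed by a swapped-in alternative, not the answer's method, that reliably produces one line by splitting on any whitespace run and rejoining with single spaces. Note that the alternative also collapses repeated spaces inside lines.

```python
import re

my_txt = "first line\n\nsecond line\n   \nthird line"

# Non-overlapping pairs of whitespace are removed; leftovers survive
print(repr(re.sub(r'\s{2}', '', my_txt)))  # 'first linesecond line\nthird line'

# str.split() with no argument splits on any run of whitespace
one_line = " ".join(my_txt.split())
print(one_line)  # first line second line third line
```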
You can simply use rstrip:
for stuff in largestring:
    print(stuff.rstrip("\n"))
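Two things to keep in mind here: iterating directly over a `str` yields single characters rather than lines, and `rstrip("\n")` on its own does not skip blank lines. Assuming the text comes from a file-like object (simulated here with `io.StringIO`; the names `source` and `kept` are illustrative), a sketch that strips the newline and skips blank lines could look like this:

```python
import io

# io.StringIO stands in for an open file; iterating it yields lines
# that still end with "\n"
source = io.StringIO("alpha\n\n   \nbeta\n\t\ngamma\n")

kept = []
for stuff in source:
    stuff = stuff.rstrip("\n")
    if stuff.strip():  # skip empty and whitespace-only lines
        kept.append(stuff)
        print(stuff)
```

For an in-memory string, `largestring.splitlines()` gives the same kind of line-by-line iteration.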
Use a positive lookbehind regex:
import re
re.sub(r'(?<=\n)\s+', '', s, flags=re.MULTILINE)
When you input:
foo
<tab> <tab>
bar
The output will be:
foo
bar
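One side effect to be aware of: `\s+` also matches the spaces and tabs at the start of a non-blank line, so any indentation that follows a newline is removed too. The sketch below uses a made-up input to show this, plus a swapped-in alternative pattern (not the answer's) that only touches blank lines. The alternative leaves a blank first line, or a whitespace-only last line with no trailing newline, in place.

```python
import re

s = "foo\n\t \t\nbar\n    indented"

# The lookbehind pattern removes every whitespace run that follows a newline,
# including the indentation of non-blank lines
lookbehind = re.sub(r'(?<=\n)\s+', '', s)
print(repr(lookbehind))  # 'foo\nbar\nindented'

# Alternative: remove a newline plus spaces/tabs only when another newline
# follows, i.e. only when the line in between is blank
blank_only = re.sub(r'\n[ \t]*(?=\n)', '', s)
print(repr(blank_only))  # 'foo\nbar\n    indented'
```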
str_with_space = """
example line 1
example line 2
example line 3
example line 4"""
new_str = '\n'.join(el.strip() for el in str_with_space.split('\n') if el.strip())
print(new_str)
Output:
example line 1
example line 2
example line 3
example line 4
You can combine map and strip to remove spaces, and use filter(None, iterable) to remove empty elements:
string = "a\n \n\nb"
list_of_str = string.split("\n")
list_of_str = filter(None, map(str.strip, list_of_str))
list(list_of_str)
Returns: ['a', 'b']
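In Python 3, `map()` and `filter()` return lazy iterators, which is why the result has to be wrapped in `list()`. If you want to drop blank lines but keep the remaining lines unmodified (preserving indentation, for example), `str.strip` can itself serve as the filter predicate, because it returns an empty (falsy) string for blank lines. A minimal sketch with a made-up list:

```python
lines = ["a", "  ", "", "  indented", "\t"]

# filter keeps each original element for which str.strip(element) is non-empty
non_blank = list(filter(str.strip, lines))
print(non_blank)  # ['a', '  indented']
```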
html_content = [l for l in html_content.splitlines() if l.rstrip()]
html_content = "\n".join(html_content)