What is the easiest way to get all strings that do not start with a character?
Question:
I am trying to parse about 20 million lines from a text file and am looking for a way to do some further manipulations on lines that do not start with question marks. I would like a solution that does not use regex matching. What I would like to do is something like this:
for line in x:
    header = line.startswith('?')
    if line.startswith() != header:
        DO SOME STUFF HERE
I realize the startswith method requires an argument, but is there any simple solution to get all lines from a file that DO NOT start with a question mark?
Answers:
Use a generator expression; I think it is the best way.
for line in (line for line in x if not line.startswith('?')):
    DO_STUFF
Or your way:
for line in x:
    if line.startswith("?"):
        continue
    DO_STUFF
Or:
for line in x:
    if not line.startswith("?"):
        DO_STUFF
It really comes down to your programming style. I prefer the first one, though the second one may seem simpler. I don’t really like the third one because of the extra indentation.
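As a self-contained sketch of the first approach, the snippet below wraps the generator expression in a function and feeds it a small in-memory list in place of the real file. The name `non_question_lines` and the sample data are illustrative assumptions, not part of the original answer.

```python
def non_question_lines(lines):
    """Lazily yield the lines that do not start with '?'."""
    return (line for line in lines if not line.startswith('?'))


# Stand-in for the real input; an open file object could be passed instead.
sample = ['?header one\n', 'data 1\n', '?header two\n', 'data 2\n']

kept = list(non_question_lines(sample))
print(kept)
```

Because the generator is lazy, passing an open file object should process one line at a time rather than loading all 20 million lines into memory.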
Something like this is probably what you’re after:
with open('myfile.txt') as fh:
    for line in fh:
        if line[0] == '?':  # strings can be indexed like lists - they're immutable sequences.
            continue  # skip lines that start with a question mark
        # All of the processing here when lines don't start with question marks.
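One caveat with indexing: line[0] raises IndexError on an empty string, while startswith simply returns False. Lines read straight from a file always contain at least a newline, so this only matters once the lines have been stripped or come from another source. A small sketch with made-up sample data (the helper name `starts_with_question_mark` is hypothetical):

```python
def starts_with_question_mark(line):
    # str.startswith is safe on empty strings; line[0] would raise IndexError.
    return line.startswith('?')


# Hypothetical already-stripped lines, including an empty one.
stripped = ['?skip me', '', 'keep me']
kept = [s for s in stripped if not starts_with_question_mark(s)]

# Demonstrate that indexing an empty string fails.
try:
    ''[0]
    index_failed = False
except IndexError:
    index_failed = True

print(kept, index_failed)
```

If you prefer indexing, the slice line[:1] == '?' is also safe on empty strings.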
Similar to utdemir’s answer:
from itertools import ifilterfalse # just "filterfalse" if using Python 3
for line in ifilterfalse(lambda s: s.startswith('?'), lines):
    # DO STUFF
http://docs.python.org/library/itertools.html#itertools.ifilterfalse
http://docs.python.org/dev/py3k/library/itertools.html#itertools.filterfalse
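For Python 3, where the function is named filterfalse, a runnable version might look like the following. The contents of `lines` are sample data assumed for illustration.

```python
from itertools import filterfalse

# Sample input; in practice this could be an open file object.
lines = ['?comment', 'alpha', '?another', 'beta']

# filterfalse keeps the items for which the predicate returns False.
kept = list(filterfalse(lambda s: s.startswith('?'), lines))
print(kept)
```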
Here is a nice one-liner, which is very close to natural language.
String definition:
StringList = [ '__one', '__two', 'three', 'four' ]
Code which performs the deed:
BetterStringList = [ p for p in StringList if not(p.startswith('__'))]
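If more than one prefix needs to be excluded, str.startswith also accepts a tuple of prefixes. A brief sketch along the same lines, with sample data and snake_case names chosen here for illustration:

```python
string_list = ['__one', '__two', '?three', 'four']

# A tuple of prefixes: a string is dropped if it starts with any of them.
skip_prefixes = ('__', '?')

better_string_list = [p for p in string_list if not p.startswith(skip_prefixes)]
print(better_string_list)
```

For a very large input, swapping the square brackets for parentheses turns the list comprehension into a generator expression, which avoids building the whole list in memory.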