difference between readlines() and split() [python]
Question:
Imagine we have a file: file = open("filetext.txt", 'r')
What is the difference between the split() method and the readlines() method?
It seems that both split each line and put it as a string in a list.
So what makes them different?
for line in file:
    values = line.split()  # break each line into a list
file.readlines()  # return a list of strings, each representing a single line in the file
Answers:
This is the main difference:
A file object has readlines but not split:
>>> hasattr(file, 'split')
False
>>> hasattr(file, 'readlines')
True
A str object has split but not readlines:
>>> hasattr("somestring", 'split')
True
>>> hasattr("somestring", 'readlines')
False
To answer your question: one method operates on a file object and the other on a string object.
They don't do the same thing: readlines() returns a list of the lines in a file, while split() returns a list of the pieces of a single string, split on whitespace by default or on a separator you pass in.
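A minimal Python 3 sketch of that distinction. It assumes io.StringIO as a stand-in for open("filetext.txt", 'r') so the example needs no file on disk: readlines() is called on the file-like object and yields lines, while split() is called on one string and yields its words.

```python
import io

# io.StringIO behaves like a text file opened for reading (stand-in for a real file).
text = "alpha beta\ngamma delta\n"

f = io.StringIO(text)
lines = f.readlines()        # file-object method: one string per line, '\n' kept
print(lines)                 # ['alpha beta\n', 'gamma delta\n']

first_line = lines[0]
words = first_line.split()   # str method: whitespace-separated pieces of ONE string
print(words)                 # ['alpha', 'beta']

# The two methods live on different types:
print(hasattr(f, "split"), hasattr(f, "readlines"))        # False True
print(hasattr(text, "split"), hasattr(text, "readlines"))  # True False
```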
readlines splits the entire file into lines and is nearly equivalent to file.read().split('\n'), except that the latter removes the newline characters, whereas readlines keeps them. (split('\n') also produces a trailing empty string when the file ends with a newline.)
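A small Python 3 sketch comparing the variants on the same text, again assuming io.StringIO in place of a real file. It also shows str.splitlines(), a standard string method that drops the newlines without leaving the trailing empty string (it additionally recognizes other line boundaries such as '\r\n').

```python
import io

text = "first line\nsecond line\n"   # note: the "file" ends with a newline

with_newlines = io.StringIO(text).readlines()
split_on_newline = io.StringIO(text).read().split("\n")
split_lines = io.StringIO(text).read().splitlines()

print(with_newlines)     # ['first line\n', 'second line\n']
print(split_on_newline)  # ['first line', 'second line', '']
print(split_lines)       # ['first line', 'second line']

# Stripping the newline from each readlines() entry gives the splitlines() result.
stripped = [line.rstrip("\n") for line in with_newlines]
print(stripped == split_lines)  # True
```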
Your example,
    for line in file:
        values = line.split()
splits each line on whitespace, building a list of the words in that line. values is overwritten on each iteration, so unless you save values somewhere, only part of the file is in memory at any one time.
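If you do want the words from every line, you have to store them yourself. A sketch assuming the goal is a list of per-line word lists; the helper name words_per_line is hypothetical, and io.StringIO again stands in for the open file. It still iterates lazily, so memory grows only with what you choose to keep.

```python
import io
from typing import Iterable, List

def words_per_line(lines: Iterable[str]) -> List[List[str]]:
    """Hypothetical helper: split every line into words and keep the results."""
    all_values = []
    for line in lines:
        values = line.split()      # rebound on every iteration, as in the question
        all_values.append(values)  # saving it here is what keeps it around
    return all_values

# io.StringIO stands in for open("filetext.txt", 'r'); any iterable of lines works.
result = words_per_line(io.StringIO("a b c\nd e\n\nf\n"))
print(result)  # [['a', 'b', 'c'], ['d', 'e'], [], ['f']]
```

With a real file the call is the same: pass the object returned by open() inside a with block.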
readlines does platform-agnostic line splitting (it splits only at line endings), while split does generic splitting on whitespace or on whatever separator you give it.
As an example:
In [1]: from StringIO import StringIO
In [2]: StringIO('test:test:test').readlines()
Out[2]: ['test:test:test']
In [3]: StringIO('test:test:test').read().split(':')
Out[3]: ['test', 'test', 'test']
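The example above is Python 2; in Python 3, StringIO lives in the io module. The "platform-agnostic" part refers to universal newlines: a file opened in text mode (default newline=None) translates '\r\n' and '\r' to '\n' while reading, so readlines() splits Windows-style files cleanly, whereas a hand-rolled split('\n') on the raw data does not. A sketch using a temporary file, assuming ASCII content:

```python
import os
import tempfile

# Write Windows-style line endings as raw bytes so nothing is translated on write.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(b"one\r\ntwo\r\n")

# Text mode with the default newline=None enables universal newlines on read.
with open(path, "r") as f:
    lines = f.readlines()

# Reading the bytes and splitting by hand does no such translation.
with open(path, "rb") as f:
    manual = f.read().decode("ascii").split("\n")

os.remove(path)

print(lines)   # ['one\n', 'two\n']
print(manual)  # ['one\r', 'two\r', '']
```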