Pythonic way to ignore for loop control variable
Question:
A Python program I’m writing is to read a set number of lines from the top of a file, and the program needs to preserve this header for future use. Currently, I’m doing something similar to the following:
header = ''
header_len = 4
for i in range(1, header_len):
header += file_handle.readline()
Pylint complains that I’m not using the variable i
. What would be a more pythonic way to do this?
Edit: The purpose of the program is to intelligently split the original file into smaller files, each of which contains the original header and a subset of the data. So, I need to read and preserve just the header before reading the rest of the file.
Answers:
May be this:
header_len = 4
header = open("file.txt").readlines()[:header_len]
But, it will be troublesome for long files.
I’m not sure what the Pylint rules are, but you could use the ‘_’ throwaway variable name.
header = ''
header_len = 4
for _ in range(1, header_len):
header += file_handle.readline()
import itertools
header_lines = list(itertools.islice(file_handle, header_len))
# or
header = "".join(itertools.islice(file_handle, header_len))
Note that with the first, the newline chars will still be present, to strip them:
header_lines = list(n.rstrip("n")
for n in itertools.islice(file_handle, header_len))
My best answer is as follows:
file test.dat:
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
Python script:
f = open('test.dat')
nlines = 4
header = "".join(f.readline() for _ in range(nlines))
Output:
>>> header
'This is line 1nThis is line 2nThis is line 3nThis is line 4n'
Notice that you don’t need to call any modules; also that you could use any dummy variable in place of _
(it works with i
, or j
, or ni
, or whatever) but I recomend you don’t (to avoid confusion). You could strip the newline characters (though I don’t recommend you do – this way you can distinguish among lines) or do anything that you can do with strings in Python.
Notice that I did not provide a mode for opening the file, so it defaults to “read only” – this is not Pythonic; in Python “explicit is better than implicit”. Finally, nice people close their files; in this case it is automatic (because the script ends) but it is best practice to close them using f.close()
.
Happy Pythoning.
Edit: As pointed out by Roger Pate the square brackets are unnecessary in the list comprehension, thereby reducing the line by two characters. The original script has been edited to reflect this.
s=""
f=open("file")
for n,line in enumerate(f):
if n<=3 : s=s+line
else:
# do something here to process the rest of the lines
print s
f.close()
I do not see any thing wrong with your solution, may be just replace i with _, I also do not like invoking itertools everywhere where simpler solution will work, it is like people using jQuery for trivial javascript tasks. anyway just to have itertools revenge here is my solution
as you want to read whole file anyway line by line, why not just first read header and after that do whatever you want to do
header = ''
header_len = 4
for i, line in enumerate(file_handle):
if i < header_len:
header += line
else:
# output chunks to separate files
pass
print header
What about:
header = []
for i,l in enumerate(file_handle):
if i <= 3:
header += l
continue
#proc rest of file here
f = open('fname')
header = [next(f) for _ in range(header_len)]
Since you’re going to write header back to the new files, you don’t need to do anything with it. To write it back to the new file:
open('new', 'w').writelines(header + list_of_lines)
if you know the number of lines in the old file, list_of_lines
would become:
list_of_lines = [next(f) for _ in range(chunk_len)]
One problem with using _ as a dummy variable is that it only solves the problem on one level, consider something like the following.
def f(n, m):
"""A function to run g() n times and run h() m times per g."""
for _ in range(n):
g()
for _ in range(m):
h()
return 0
This function works fine but the _ iterator over m runs is problematic as it may conflict with the upper _. In any case PyCharm is complaining about this kind of syntax.
So I would argue that _ is not as “throwaway” as was suggested before.
Perhaps you might like to just create a function to do it!
def run(f, n, *args):
"""Runs f with the arguments from the args tuple n times."""
for _ in range(n):
f(*args)
e.g. you could use it like this:
>>> def ft(x, L):
... L.append(x)
>>> a = 7
>>> nums = [4, 1]
>>> run(ft, 10, a, nums)
>>> nums
[4, 1, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]
A Python program I’m writing is to read a set number of lines from the top of a file, and the program needs to preserve this header for future use. Currently, I’m doing something similar to the following:
header = ''
header_len = 4
for i in range(1, header_len):
header += file_handle.readline()
Pylint complains that I’m not using the variable i
. What would be a more pythonic way to do this?
Edit: The purpose of the program is to intelligently split the original file into smaller files, each of which contains the original header and a subset of the data. So, I need to read and preserve just the header before reading the rest of the file.
May be this:
header_len = 4
header = open("file.txt").readlines()[:header_len]
But, it will be troublesome for long files.
I’m not sure what the Pylint rules are, but you could use the ‘_’ throwaway variable name.
header = ''
header_len = 4
for _ in range(1, header_len):
header += file_handle.readline()
import itertools
header_lines = list(itertools.islice(file_handle, header_len))
# or
header = "".join(itertools.islice(file_handle, header_len))
Note that with the first, the newline chars will still be present, to strip them:
header_lines = list(n.rstrip("n")
for n in itertools.islice(file_handle, header_len))
My best answer is as follows:
file test.dat:
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
Python script:
f = open('test.dat')
nlines = 4
header = "".join(f.readline() for _ in range(nlines))
Output:
>>> header
'This is line 1nThis is line 2nThis is line 3nThis is line 4n'
Notice that you don’t need to call any modules; also that you could use any dummy variable in place of _
(it works with i
, or j
, or ni
, or whatever) but I recomend you don’t (to avoid confusion). You could strip the newline characters (though I don’t recommend you do – this way you can distinguish among lines) or do anything that you can do with strings in Python.
Notice that I did not provide a mode for opening the file, so it defaults to “read only” – this is not Pythonic; in Python “explicit is better than implicit”. Finally, nice people close their files; in this case it is automatic (because the script ends) but it is best practice to close them using f.close()
.
Happy Pythoning.
Edit: As pointed out by Roger Pate the square brackets are unnecessary in the list comprehension, thereby reducing the line by two characters. The original script has been edited to reflect this.
s=""
f=open("file")
for n,line in enumerate(f):
if n<=3 : s=s+line
else:
# do something here to process the rest of the lines
print s
f.close()
I do not see any thing wrong with your solution, may be just replace i with _, I also do not like invoking itertools everywhere where simpler solution will work, it is like people using jQuery for trivial javascript tasks. anyway just to have itertools revenge here is my solution
as you want to read whole file anyway line by line, why not just first read header and after that do whatever you want to do
header = ''
header_len = 4
for i, line in enumerate(file_handle):
if i < header_len:
header += line
else:
# output chunks to separate files
pass
print header
What about:
header = []
for i,l in enumerate(file_handle):
if i <= 3:
header += l
continue
#proc rest of file here
f = open('fname')
header = [next(f) for _ in range(header_len)]
Since you’re going to write header back to the new files, you don’t need to do anything with it. To write it back to the new file:
open('new', 'w').writelines(header + list_of_lines)
if you know the number of lines in the old file, list_of_lines
would become:
list_of_lines = [next(f) for _ in range(chunk_len)]
One problem with using _ as a dummy variable is that it only solves the problem on one level, consider something like the following.
def f(n, m):
"""A function to run g() n times and run h() m times per g."""
for _ in range(n):
g()
for _ in range(m):
h()
return 0
This function works fine but the _ iterator over m runs is problematic as it may conflict with the upper _. In any case PyCharm is complaining about this kind of syntax.
So I would argue that _ is not as “throwaway” as was suggested before.
Perhaps you might like to just create a function to do it!
def run(f, n, *args):
"""Runs f with the arguments from the args tuple n times."""
for _ in range(n):
f(*args)
e.g. you could use it like this:
>>> def ft(x, L):
... L.append(x)
>>> a = 7
>>> nums = [4, 1]
>>> run(ft, 10, a, nums)
>>> nums
[4, 1, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7]