How to read first N lines of a file?

Question:

We have a large raw data file that we would like to trim to a specified size.

How would I go about getting the first N lines of a text file in python? Will the OS being used have any effect on the implementation?

Asked By: Russell

||

Answers:

Python 3:

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Python 2:

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Here’s another way (both Python 2 & 3):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
Answered By: John La Rooy

There is no specific method to read number of lines exposed by file object.

I guess the easiest way would be following:

lines =[]
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))
Answered By: artdanil

If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except and works on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)
Answered By: John Machin
N = 10
with open("file.txt", "a") as file:  # the a opens it in append mode
    for i in range(N):
        line = next(file).strip()
        print(line)
Answered By: ghostdog74

Based on gnibbler top voted answer (Nov 20 ’09 at 0:27): this class add head() and tail() method to file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)                            #Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):  
        self.seek(0, 2)                         #go to end of file
        bytes_in_file = self.tell()             
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find+1 > lines_found and
               bytes_in_file > total_bytes_scanned): 
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            self.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
Answered By: fdb

most convinient way on my own:

LINE_COUNT = 3
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

Solution based on List Comprehension
The function open() supports an iteration interface. The enumerate() covers open() and return tuples (index, item), then we check that we’re inside an accepted range (if i < LINE_COUNT) and then simply print the result.

Enjoy the Python. 😉

Answered By: Maxim Plaksin

Starting at Python 2.6, you can take advantage of more sophisticated functions in the IO base clase. So the top rated answer above can be rewritten as:

    with open("datafile") as myfile:
       head = myfile.readlines(N)
    print head

(You don’t have to worry about your file having less than N lines since no StopIteration exception is thrown.)

Answered By: Steve Bading

If you want to read the first lines quickly and you don’t care about performance you can use .readlines() which returns list object and then slice the list.

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile:
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

Note: the whole file is read so is not the best from the performance point of view but it
is easy to use, fast to write and easy to remember so if you want just perform
some one-time calculation is very convenient

print firstNlines

One advantage compared to the other answers is the possibility to select easily the range of lines e.g. skipping the first 10 lines [10:30] or the lasts 10 [:-10] or taking only even lines [::2].

Answered By: G M

If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. This is so much better in my experience:

def load_big_file(fname,maxrows):
'''only works for well-formed text file of space-separated doubles'''

rows = []  # unknown number of lines, so use list

with open(fname) as f:
    j=0        
    for line in f:
        if j==maxrows:
            break
        else:
            line = [float(s) for s in line.split()]
            rows.append(np.array(line, dtype = np.double))
            j+=1
return np.vstack(rows)  # convert list of vectors to array
Answered By: Alejandro D. Somoza

For first 5 lines, simply do:

N=5
with open("data_file", "r") as file:
    for i in range(N):
       print file.next()
Answered By: Surya

What I do is to call the N lines using pandas. I think the performance is not the best, but for example if N=1000:

import pandas as pd
yourfile = pd.read_csv('path/to/your/file.csv',nrows=1000)
Answered By: RRuiz
#!/usr/bin/python

import subprocess

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE)

output, err = p.communicate()

print  output

This Method Worked for me

Answered By: Mansur Ul Hasan

The two most intuitive ways of doing this would be:

  1. Iterate on the file line-by-line, and break after N lines.

  2. Iterate on the file line-by-line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code:

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is, as long as you don’t use readlines() or enumerateing the whole file into memory, you have plenty of options.

Answered By: FatihAkici

This worked for me

f = open("history_export.csv", "r")
line= 5
for x in range(line):
    a = f.readline()
    print(a)
Answered By: Sukanta

This works for Python 2 & 3:

from itertools import islice

with open('/tmp/filename.txt') as inf:
    for line in islice(inf, N, N+M):
        print(line)
Answered By: sandyp

fname = input("Enter file name: ")
num_lines = 0

with open(fname, 'r') as f: #lines count
    for line in f:
        num_lines += 1

num_lines_input = int (input("Enter line numbers: "))

if num_lines_input <= num_lines:
    f = open(fname, "r")
    for x in range(num_lines_input):
        a = f.readline()
        print(a)

else:
    f = open(fname, "r")
    for x in range(num_lines_input):
        a = f.readline()
        print(a)
        print("Don't have", num_lines_input, " lines print as much as you can")


print("Total lines in the text",num_lines)

Answered By: Shakirul

I would like to handle the file with less than n-lines by reading the whole file

def head(filename: str, n: int):
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

Credit go to John La Rooy and Ilian Iliev. Use the function for the best performance with exception handle

Revise 1: Thanks FrankM for the feedback, to handle file existence and read permission we can futher add

import errno
import os

def head(filename: str, n: int):
    if not os.path.isfile(filename):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), filename)  
    if not os.access(filename, os.R_OK):
        raise PermissionError(errno.EACCES, os.strerror(errno.EACCES), filename)     
   
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines

You can either go with second version or go with the first one and handle the file exception later. The check is quick and mostly free from performance standpoint

Answered By: Linh K Ha

Simply Convert your CSV file object to a list using list(file_data)

import csv;
with open('your_csv_file.csv') as file_obj:
    file_data = csv.reader(file_obj);
    file_list = list(file_data)
    for row in file_list[:4]:
        print(row)
Answered By: shivam singh

Here’s another decent solution with a list comprehension:

file = open('file.txt', 'r')

lines = [next(file) for x in range(3)]  # first 3 lines will be in this list

file.close()
Answered By: Oleksandr Novik

An easy way to get first 10 lines:

with open('fileName.txt', mode = 'r') as file:
    list = [line.rstrip('n') for line in file][:10]
    print(list)
Answered By: Gelzone
Categories: questions Tags:
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.