Keep x rows and delete all others from a CSV file

Question:

I want to specify how many rows to keep and delete the rest, while preserving the header.

I found some code that lets you delete the first 5 rows, but how can I make it do what I want?

with open('myfile.csv') as infile:
    data_in = infile.readlines()

with open('myfile.csv', 'w') as outfile:  # 'w', not 'wb': these are text lines
    outfile.write(data_in[1])
    outfile.writelines(data_in[5:])

For example, if I have this CSV:

6.5, 5.4, 0, 000
6.5, 5.4, 1, 610
1.2, 4.0, 0, 530
3.2, 5.4, 1, 330
4.2, 3.0, 0, 320
5.5, 2.3, 1, 780
1.3, 4.4, 0, 520
5.3, 1.0, 0, 420

I just want to pass a number to my script, say 2, and have it KEEP 2 rows and remove all the others.

The output would become:

6.5, 5.4, 0, 000
6.5, 5.4, 1, 610

Can I also make it save the result under a different name?

Asked By: Saffik


Answers:

If you first read your original CSV file into the variable data_in with

with open('my_original_file.csv') as inp:
    data_in = inp.readlines()

you may continue:

n = int(input("How many rows after the header do you want to write? "))

with open('myfile.csv', 'w') as outfile:
    outfile.writelines(data_in[:n+1])

This will write

  • the header row, data_in[0], and
  • the next n rows, data_in[1] through data_in[n].
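The same slice can be taken without loading the whole file into memory. Below is a sketch, assuming Python 3; keep_rows and its src/dst arguments are hypothetical names, and itertools.islice stands in for readlines() so that only the first n + 1 lines are ever read:

```python
from itertools import islice


def keep_rows(src, dst, n):
    """Copy the header plus the first n data rows of src into dst.

    keep_rows is a hypothetical helper. src and dst must be different
    paths: opening the same path with 'w' would empty it before it is read.
    """
    with open(src) as inp, open(dst, 'w') as out:
        # islice yields at most n + 1 lines (header + n rows) and
        # never reads past them
        out.writelines(islice(inp, n + 1))
```

For example, keep_rows('my_original_file.csv', 'myfile.csv', 2) writes the header and two data rows to a new file, which also covers saving the result under a different name.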
Answered By: MarianD

With pandas this is very easy; you can use head():

import pandas as pd

# reading the csv file (remove header=None if you have column names)
df = pd.read_csv('myfile.csv', header=None)

# selecting only the first 2 rows
df = df.head(2)

# saving the csv file (remove header=False if you have column names)
df.to_csv('output.csv', index=False, header=False)

Or simply:

df = pd.read_csv('myfile.csv',header=None)
df.head(2).to_csv('output.csv',index=False, header=False)
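If the file does have a header row, keep pandas' default header handling on both ends. A sketch using read_csv's nrows parameter, which stops parsing after n data rows; the column names a to d and the inline CSV text are made up for illustration (in practice you would pass the file path instead of io.StringIO):

```python
import io

import pandas as pd

# Hypothetical CSV text with a header row; normally this would be a file path
csv_text = "a,b,c,d\n6.5,5.4,0,000\n6.5,5.4,1,610\n1.2,4.0,0,530\n"

n = 2  # data rows to keep, not counting the header
# nrows makes read_csv stop after n data rows instead of parsing the whole file
df = pd.read_csv(io.StringIO(csv_text), nrows=n)

# header is written by default; index=False drops the row-number column.
# With no path argument, to_csv returns the CSV text as a string.
result = df.to_csv(index=False)
```

Passing a file name such as 'output.csv' as the first argument of to_csv writes to that file instead. As in the output shown below, pandas parses 000 as the integer 0.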

Output:

6.5,5.4,0,0
6.5,5.4,1,610
Answered By: Grayrigel

To keep the first n lines (header included) and remove everything else in place:

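filename = 'myfile.csv'  # file to shorten in place
n = 3                    # lines to keep, header included

with open(filename, 'r+') as f:
    for i in range(n):
        f.readline()      # read past each line we want to keep
    f.truncate(f.tell())  # cut the file off at the current position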
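In the snippet above, n counts the header as one of the kept lines. A variant, sketched under the assumption that n should count data rows only; truncate_in_place is a hypothetical name, and binary mode ('rb+') is used so that tell() returns a plain byte offset, which is what truncate() takes:

```python
def truncate_in_place(filename, n):
    """Keep the header plus the first n data rows of filename, in place.

    truncate_in_place is a hypothetical helper; the original file is
    overwritten, so copy it first if you still need the full data.
    """
    with open(filename, 'rb+') as f:
        # n + 1 readline() calls: one for the header, n for data rows
        for _ in range(n + 1):
            if not f.readline():  # stop early if the file is shorter
                break
        f.truncate(f.tell())  # drop everything after the current position
```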
Answered By: user3503711