keep x rows and delete all from csv file
Question:
I want to be able to specify how many rows to keep and delete the rest, while preserving the header.
I found some code which lets you delete the first 5 rows, but how can I make it do what I want?
with open('myfile.csv', 'wb') as outfile:
    outfile.writelines(data_in[1])
    outfile.writelines(data_in[5:])
For example, if I have this CSV:
6.5, 5.4, 0, 000
6.5, 5.4, 1, 610
1.2, 4.0, 0, 530
3.2, 5.4, 1, 330
4.2, 3.0, 0, 320
5.5, 2.3, 1, 780
1.3, 4.4, 0, 520
5.3, 1.0, 0, 420
I just want to pass a number to my script, let's say 2, and it will KEEP 2 rows and remove all others.
The output would become:
6.5, 5.4, 0, 000
6.5, 5.4, 1, 610
Can I also save it under a different name?
Answers:
If you first read your original CSV file into the variable data_in with
with open('my_original_file.csv') as inp:
    data_in = inp.readlines()
you may continue:
n = int(input("How many rows after the header do you want to write: "))
with open('myfile.csv', 'w') as outfile:
    outfile.writelines(data_in[:n+1])
This will write:
- the header row, data_in[0], and
- the subsequent n rows, data_in[1] to data_in[n].
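If the file is large, reading every line with readlines() is not needed. The following is a minimal sketch of the same idea that stops reading after the header and n rows. The function name keep_first_rows and its parameters are made up for illustration, and it assumes the first line is the header and that no quoted field contains a line break.

```python
from itertools import islice


def keep_first_rows(src_path, dst_path, n):
    """Copy the header plus the first n data rows from src_path to dst_path.

    Hypothetical helper: assumes line 1 is the header and that each
    CSV record occupies exactly one physical line.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        # islice stops after n + 1 lines, so the rest of the file is never read.
        dst.writelines(islice(src, n + 1))
```

Because the output goes to dst_path, this also covers saving the result under a different name. Passing the same path for both arguments would empty the file, so the two paths must differ.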
With pandas it is very easy to do; you can use head:
import pandas as pd

# read the CSV file (remove header=None if you have column names)
df = pd.read_csv('myfile.csv', header=None)
# select only the first 2 rows
df = df.head(2)
# save the CSV file (remove header=False if you have column names)
df.to_csv('output.csv', index=False, header=False)
Or simply:
df = pd.read_csv('myfile.csv', header=None)
df.head(2).to_csv('output.csv', index=False, header=False)
Output:
6.5,5.4,0,0
6.5,5.4,1,610
Keeping the first n lines and removing everything else (this modifies the file in place):
with open(filename, 'r+') as f:
    for i in range(n):
        f.readline()  # advance past each line to keep
    f.truncate(f.tell())  # cut the file off at the current position
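The truncation approach can be wrapped in a small function that also accounts for the header. This is only a sketch: the name truncate_after_rows and the has_header flag are assumptions added for illustration, and binary mode is used here so that tell() returns a plain byte offset.

```python
def truncate_after_rows(path, n, has_header=True):
    """Truncate the file at `path` in place, keeping the first n data rows.

    Hypothetical helper: when has_header is True, the first line is kept
    in addition to the n data rows. The original content is lost.
    """
    lines_to_keep = n + 1 if has_header else n
    with open(path, "rb+") as f:
        for _ in range(lines_to_keep):
            if not f.readline():  # reached end of file before n rows
                break
        # Everything after the current position is discarded.
        f.truncate(f.tell())
```

Since this overwrites the original, it may be worth copying the file first if the full data is still needed.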