Remove specific rows from a CSV file if they match the elements of a list – Python / Windows
Question:
I have a CSV file with one name and one URL per row (the name is in the first column).
I also have a list of names produced by a script.
I would like to remove the rows of the CSV file whose name appears in that list.
It sounds simple, but I have tried several options and none of them works.
The CSV format is:
John Doe, johndoe.blog.com
Jane Doe, janedoe.blog.com
Jim Foe, jimfoe.blog.com
The list:
not_ok_name = ["John Doe", "Jim Foe"]
The output CSV file should be:
Jane Doe, janedoe.blog.com
In my last attempt I tried the following:
count= 0
while count< len(not_ok_name):
    writer = csv.writer(open('corrected.csv'))
    for row in csv.reader('myfile.csv.csv'):
        if not row[0].startswith(not_ok_name[count]):
            writer.writerow(row)
    writer.close()
Since I am still a newbie, I would appreciate simple suggestions.
Thanks.
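The attempt above has several independent problems: `csv.reader('myfile.csv.csv')` is given a string rather than an open file, `open('corrected.csv')` opens the output in the default read mode, `count` is never incremented (so the `while` loop could never end), and csv writer objects have no `close()` method. The first and last points can be checked in isolation with a small Python 3 sketch; `io.StringIO` stands in for a real file and `"my.csv"` is just an example string.

```python
import csv
import io

# csv.reader() accepts any iterable of strings. A plain filename is such an
# iterable, so each *character* of the name is parsed as a one-field row.
rows_from_string = list(csv.reader("my.csv"))
print(rows_from_string)  # [['m'], ['y'], ['.'], ['c'], ['s'], ['v']]

# Passing an open file object (here an in-memory stand-in) gives the real rows.
sample_file = io.StringIO("John Doe, johndoe.blog.com\n")
print(list(csv.reader(sample_file)))  # [['John Doe', ' johndoe.blog.com']]

# csv.writer objects have no close() method; close the underlying file instead.
print(hasattr(csv.writer(io.StringIO()), "close"))  # False
```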
EDIT:
In case there are formatting issues with the original data, here is the result of:
print repr(open("myfile.csv", "rb").read())
John Doe ,johndoe.blog.com\r\nJane Doe , janedoe.blog.com
I hope this helps.
Thanks
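The repr() output shows why exact comparisons fail: the fields carry spaces around the commas, so the first column reads `'John Doe '` rather than `'John Doe'`. A minimal Python 3 check, loading the two rows from that output through `io.StringIO` in place of the real file:

```python
import csv
import io

# The two rows copied from the repr() output above.
raw = "John Doe ,johndoe.blog.com\r\nJane Doe , janedoe.blog.com"
not_ok_name = ["John Doe", "Jim Foe"]

rows = list(csv.reader(io.StringIO(raw, newline="")))
print(rows)  # [['John Doe ', 'johndoe.blog.com'], ['Jane Doe ', ' janedoe.blog.com']]

# The exact comparison fails because of the trailing space...
print(rows[0][0] in not_ok_name)          # False
# ...while stripping whitespace first gives the intended match.
print(rows[0][0].strip() in not_ok_name)  # True
```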
EDIT 2:
Here is some code that partially does the job: it removes only ONE name. Maybe it helps in writing a version that handles the entire list.
reader = csv.reader(open("myfile.csv", "rb"), delimiter=',')
with open('corrected.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
    for line in reader:
        #for item in Names:
        if not any ("Jim Foe" in x for x in line):
            writer.writerow(line)
            print line
Thanks again.
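One way to extend the EDIT 2 approach from one name to the whole list is to nest a second loop over `not_ok_name` inside `any()`. A Python 3 sketch, assuming the same substring test as EDIT 2 (so a longer name such as "John Doe Jr." would also be removed); `keep_row` is a hypothetical helper name and the sample rows come from the question:

```python
import csv
import io

not_ok_name = ["John Doe", "Jim Foe"]

def keep_row(row, names):
    """Hypothetical helper: True if no field of the row contains any of the names."""
    return not any(name in field for name in names for field in row)

sample = "John Doe, johndoe.blog.com\nJane Doe, janedoe.blog.com\nJim Foe, jimfoe.blog.com\n"
kept = [row for row in csv.reader(io.StringIO(sample)) if keep_row(row, not_ok_name)]
print(kept)  # [['Jane Doe', ' janedoe.blog.com']]
```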
Answers:
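Try this. It loops over the rows and writes only those whose name is not in the not_ok_name list.
import csv
with open("C:/path/a.csv", "rU") as f, open("C:/path/des.csv", "wb") as w:
    not_ok_name = ["John Doe", "Jim Foe"]
    reader = csv.reader(f)
    writer = csv.writer(w)  # w.write(row) would fail: row is a list, not a string
    for row in reader:
        name = row[0].strip()  # the data has spaces around the commas ("John Doe ,")
        if name not in not_ok_name:
            writer.writerow(row)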
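For Python 3, the same idea can be written with `open(..., newline="")`, which the `csv` module documentation recommends for files passed to a reader or writer, instead of the Python 2 modes `"rU"` and `"wb"` (the `"U"` mode was removed in Python 3.11). A sketch, assuming names should be compared after stripping surrounding whitespace; `filter_csv` is a hypothetical helper and the temporary paths are illustrative only:

```python
import csv
import os
import tempfile

def filter_csv(src_path, dst_path, names):
    """Hypothetical helper: copy src_path to dst_path, dropping rows whose
    first column (with surrounding whitespace stripped) is in names."""
    excluded = {n.strip() for n in names}  # set: average O(1) membership tests
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines, which csv.reader yields as []
            if row[0].strip() not in excluded:
                writer.writerow(row)  # row is written back unchanged

# Demo in a throwaway directory, using the question's data with its extra spaces.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myfile.csv")
    dst = os.path.join(tmp, "corrected.csv")
    with open(src, "w", newline="") as f:
        f.write("John Doe ,johndoe.blog.com\r\n"
                "Jane Doe , janedoe.blog.com\r\n"
                "Jim Foe, jimfoe.blog.com\r\n")
    filter_csv(src, dst, ["John Doe", "Jim Foe"])
    with open(dst, newline="") as f:
        result = f.read()

print(repr(result))  # 'Jane Doe , janedoe.blog.com\r\n'
```

Stripping only affects the comparison, so the kept row still carries its original spacing in the output file.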
import csv

not_ok_name = ["John", "Jim"]
not_ok_name = set(not_ok_name)  # sets give average O(1) lookup times
with open('myfile.csv') as infile, open('corrected.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    for name, url in csv.reader(infile):  # for each row in the input file
        fname = name.split(None, 1)[0]  # first word of the name, e.g. "John"
        if fname in not_ok_name:
            continue  # if the first name is in the list, ignore the row
        writer.writerow([name, url])
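Note that this answer compares only the first word of each name, so any other "John" in the file would be dropped as well. A small Python 3 comparison of first-name and full-name matching; the "John Smith" row is a hypothetical addition to the question's data and `kept_names` is a hypothetical helper:

```python
import csv
import io

# Question's data plus a hypothetical "John Smith" row.
data = ("John Doe, johndoe.blog.com\n"
        "John Smith, johnsmith.blog.com\n"
        "Jane Doe, janedoe.blog.com\n")

def first_name(name):
    return name.split(None, 1)[0]  # first word, as in the answer above

def full_name(name):
    return name.strip()  # whole name, ignoring surrounding spaces

def kept_names(text, excluded, key):
    """Hypothetical helper: first-column values of rows whose key(name) is not excluded."""
    return [row[0] for row in csv.reader(io.StringIO(text)) if key(row[0]) not in excluded]

print(kept_names(data, {"John", "Jim"}, first_name))         # ['Jane Doe']
print(kept_names(data, {"John Doe", "Jim Foe"}, full_name))  # ['John Smith', 'Jane Doe']
```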