Check whether string is in CSV

Question:

I want to search a CSV file and print either True or False, depending on whether or not I found the string. However, I get a false positive whenever the string is embedded in a larger string of text. For example, it returns True if the string is foo and the term foobar is in the CSV file. I need to return exact matches only.

username = input()

if username in open('Users.csv').read():
    print("True")
else:
    print("False")

I’ve looked at using mmap, re and csv module functions, but I haven’t got anywhere with them.

EDIT: Here is an alternative method:

import re
import csv

username = input()

with open('Users.csv', 'rt') as f:
     reader = csv.reader(f)
     for row in reader:
          re.search(r'\bNOTSUREHERE\b', username)

Asked By: jars121

Answers:

You should have a look at the csv module in Python.

import csv

is_in_file = False
with open('my_file.csv', newline='') as csvfile:
    my_content = csv.reader(csvfile, delimiter=',')
    for row in my_content:
        if username in row:  # exact match against each field of the row
            is_in_file = True
            break  # no need to read further once found
print(is_in_file)

It assumes that your delimiter is a comma (replace it with your own delimiter if it differs). Note that username must be defined beforehand. Also change the name of the file.
The code loops through the lines in the CSV file. row is a list of strings containing each element of the current line. For example, if you have this in your CSV file: Joe,Peter,Michel then row will be ['Joe', 'Peter', 'Michel']. You can then check whether your username is in that list.
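The same loop can be wrapped in a small reusable function. This is only a minimal sketch, assuming Python 3 and a comma-delimited file; the function name username_in_csv and its parameters are hypothetical and not part of the original answer.

```python
import csv

def username_in_csv(path, username, delimiter=","):
    """Return True if any field in the CSV file equals username exactly."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter=delimiter)
        # `username in row` compares whole fields, so "foo" does not match "foobar";
        # any() stops reading at the first matching row
        return any(username in row for row in reader)
```

Something like print(username_in_csv('Users.csv', input())) would then print True or False as the question asks.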

Answered By: Paco

When you read a CSV file using the csv module, it returns each row as a list of columns. So if you want to look up your string, you should modify your code as follows:

import csv

username = input()

with open('Users.csv', 'rt', newline='') as f:
    reader = csv.reader(f, delimiter=',')  # good point by @paco
    for row in reader:
        for field in row:
            if field == username:  # whole-field comparison, no partial matches
                print("is in file")

But as it is a CSV file, you might expect the username to be in a given column:

with open('Users.csv', 'rt', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if username == row[2]:  # if the username is in column 3 (-> index 2)
            print("is in file")
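If some rows may be blank or shorter than expected, row[2] raises an IndexError. A possible guard is sketched below, assuming Python 3; the helper name username_in_column and its column parameter are made up for illustration.

```python
import csv

def username_in_column(path, username, column=2):
    """Return True if the field at index `column` equals username in any row."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # skip blank or short rows instead of raising IndexError
            if len(row) > column and row[column] == username:
                return True
    return False
```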

Answered By: zmo

import csv

# read every row of the score file into a list
scoresList = []
with open("playerScores_v2.txt", newline="") as csvfile:
    scores = csv.reader(csvfile, delimiter=",")
    for row in scores:
        scoresList.append(row)

playername = input("Enter the player name you would like the score for:")
print("{0:40} {1:10} {2:10}".format("Name", "Level", "Score"))

# print only the rows whose first column matches the name exactly
for row in scoresList:
    if row and row[0] == playername:
        print("{0:40} {1:10} {2:10}".format(row[0], row[1], row[2]))

Answered By: OllieTaiani

#!/usr/bin/python
import csv

cnt = 0
with open('my.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        # count the lines that have a field equal to 'foo' ('foobar' does not count)
        if 'foo' in row:
            cnt += 1

print("No of foo entry Count :".ljust(20, '.'), cnt)

Answered By: akD

I used the top answer. It works and looks OK, but it was too slow for me.

I had an array of many strings and wanted to check whether each one was in a large CSV file. No other requirements.

For this purpose I used the following (simplified; I iterated through an array of strings and did other work than print):

with open('my_csv.csv', 'rt') as c:
    str_arr_csv = c.readlines()

Together with:

if str(my_str) in str(str_arr_csv):
    print("True")

The reduction in time was about 90% for me. The code looks ugly, but I’m all about speed. Sometimes.
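Note that the `in` test on the stringified list is a substring test, so it can report foo when only foobar is present, which is the false positive described in the question. If exact matches matter as well as speed, one option is to read the file once into a set of fields and test each string against that set. This is a rough sketch, assuming Python 3; load_fields and check_many are hypothetical names, and no timing claim is made for it.

```python
import csv

def load_fields(path):
    """Read the CSV once and return the set of all distinct field values."""
    fields = set()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            fields.update(row)
    return fields

def check_many(path, candidates):
    """Map each candidate string to True/False for an exact field match."""
    fields = load_fields(path)  # a single pass over the file
    # each lookup is a set membership test, so the file is not rescanned
    return {c: c in fields for c in candidates}
```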

Answered By: AlexanderKLMR

EXTENDED ALGO:
As I can have some values with surrounding spaces in my CSV:
", atleft,atright , both " ,
I patched zmo's code as follows:

                if field.strip() == username:

and it’s ok, thanks.
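For illustration, here is how that patch behaves on padded fields. This is a small sketch, assuming Python 3; the function name username_in_rows and the sample data are invented for the example.

```python
import csv

def username_in_rows(lines, username):
    """Return True if any field equals username after trimming whitespace."""
    # csv.reader accepts any iterable of strings, e.g. an open file or a list of lines
    for row in csv.reader(lines):
        for field in row:
            # str.strip() with no argument removes spaces, tabs and newlines at both ends
            if field.strip() == username:
                return True
    return False

sample = ["alice, bob ,carol\n"]  # hypothetical data with a padded field
print(username_in_rows(sample, "bob"))  # True
print(username_in_rows(sample, "bo"))   # False
```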

OLD-FASHIONED ALGO
I had previously coded an ‘old-fashioned’ algorithm that handles any allowed separators (here comma, space and newline), so I was curious to compare performance.
With 10000 rounds on a very simple CSV file, I got:

—————— algo 1 old fashion —————
done in 1.931804895401001 s.
—————— algo 2 with csv —————
done in 1.926626205444336 s.

As this is not too bad (about 0.3% longer), I think this good old hand-made algo can help somebody (and it will be useful if there are more parasitic characters, since strip() with no argument only removes whitespace).
This algo works on bytes, so it can be used for data other than strings.
It searches for a name that is not embedded in another one by checking that the bytes to its left and right are allowed separators.
It mainly uses loops that exit as soon as possible through break or continue.

def separatorsNok(x):
    # True when byte x is NOT an allowed separator (comma, space, LF, CR)
    return (x != 44) and (x != 32) and (x != 10) and (x != 13)

# set as a function to be able to run several chained tests
def searchUserName(userName, fileName):

    # read file as binary (supposed to be utf-8 as userName)
    with open(fileName, 'rb') as f:
        contents = f.read()
    lenOfFile = len(contents)

    # set username in bytes
    userBytes = userName.encode('utf-8')
    lenOfUser = len(userBytes)
    posInFile = 0  # candidate start of the name in the file
    found = False

    while lenOfUser > 0 and posInFile + lenOfUser <= lenOfFile:
        # search full name starting at posInFile, stop at first mismatch
        posInUser = 0
        while posInUser < lenOfUser:
            if contents[posInFile + posInUser] != userBytes[posInUser]:
                break
            posInUser += 1
        if posInUser < lenOfUser:
            posInFile += 1
            continue

        # found a full name, check if isolated on left and on right
        # left ok at very beginning or space or comma or new line
        if posInFile > 0 and separatorsNok(contents[posInFile - 1]):
            posInFile += 1
            continue
        # right ok at very end or space or comma or new line
        nextRight = posInFile + lenOfUser
        if nextRight < lenOfFile and separatorsNok(contents[nextRight]):
            posInFile += 1
            continue
        # found and bordered
        found = True
        break
    # main while
    if found:
        print(userName, "is in file")  # name starts at byte offset posInFile
    return found

To check: searchUserName('pirla','test.csv')

Like the other answers, the code exits at the first match, but it can easily be extended to find all of them.
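For comparison, the same left and right border check can be expressed with the re module. This is a sketch under assumptions, not the code timed above: it works on decoded text rather than bytes, the function name name_is_isolated is hypothetical, and the separators are the same four characters (comma, space, LF, CR).

```python
import re

def name_is_isolated(name, text):
    """Return True if name occurs in text bordered by separators or the text edges."""
    # (?<![^, \n\r]) : the previous character, if any, must be a comma, space, LF or CR
    # (?![^, \n\r])  : the next character, if any, must be a comma, space, LF or CR
    # re.escape() keeps characters such as "." in the name from acting as regex syntax
    pattern = r"(?<![^, \n\r])" + re.escape(name) + r"(?![^, \n\r])"
    return re.search(pattern, text) is not None
```

It would be called on the file contents, for example on the string returned by open('test.csv', encoding='utf-8').read().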

HTH

Answered By: pirela