I'm trying to find words from a text file in another text file

Question:

I built a simple graphical user interface (GUI) with basketball information to make finding details about players easier. The GUI uses data scraped from various sources with the ‘requests’ library. It works well, but there is one problem: my code contains a hard-coded list of players that is compared against the scraped data. This means that whenever I want to add or remove a name, I have to edit the code itself in my IDE, and I want to change that. Storing the player names in an external text file would give me much more flexibility in managing them.

# This is how the players list looks in the code:
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', ...]  # and many others

# This is what the data in the scraped file looks like:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

# The rest of the code works well; this is the final part, where it uses the list to write out the players found in both files.

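with open("final_G_league.csv", 'r') as scraped, open("freeze.csv", 'w') as freeze:
    scraped_text = scraped.read()  # read the scraped data once so it can be searched repeatedly
    for word in basketball:
        if word in scraped_text:
            freeze.write(word + "\n")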

# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate over it the same way

# I tried different solutions, but none of them work for me

with open('final_G_league.csv') as text,  open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip,filter_words))
    txt = next(text).split()
    out = [word  for word in txt if word not in st]

# This one gives me only the first line of the scraped text, because next(text) reads a single line

import csv

file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')

data_read1= csv.reader(file1)
data_read2 = csv.reader(file2)

# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]

for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")

file1.close()
file2.close()

# This one returns a list with a bunch of names and a list index error (freeze1.csv has fewer rows than final_G_league.csv, so data2[i] goes out of range).

file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if j in i:

# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file
Asked By: DrFox


Answers:

This assumes file2 is just a list of names and you want to extract the rows in the first file whose name column matches a name in that list.

I suggest making the "freeze" file a text file with one name per line and removing the single quotes around the names; then it is much easier to parse.

You can then do something like this to match the names from one file against the other.

import csv

# Read the player names into a list, one name per line
with open("freeze1.txt", 'r') as file2:
    names = [s.strip() for s in file2]
    print("names:", names)

# Next, open the league data and extract the rows with matching names
with open("final_G_league.csv", 'r', newline='') as file1:
    reader = csv.reader(file1)
    next(reader)  # skip the header row (drop this line if the file has no header)
    for row in reader:
        if row and row[0] in names:
            # print the name that matched
            print(row[0])

If the names don’t match exactly as they appear in the final_G_league file, you may need to adjust the comparison, for example by doing a case-insensitive match or by normalizing the names ("last, first" vs. "first last").
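A minimal sketch of that normalization, assuming names are written either as "Last, First" or as "First Last". The helper normalize() and the names list are hypothetical; the scraped rows are a few of the question's sample lines inlined with io.StringIO, and, as in the code above, the player's name is assumed to be in the first column.

```python
import csv
import io


def normalize(name: str) -> str:
    """Turn 'Last, First' or 'First Last' into a lowercase 'first last' key."""
    name = " ".join(name.split())  # collapse stray whitespace
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return name.lower()


# Hypothetical names typed in mixed formats and cases
names = ["forbes, bryn", "Cody Martin"]

# Sample scraped rows from the question, inlined instead of reading the file
scraped = io.StringIO(
    '"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness\n'
    '"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness\n'
    '"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,\n'
)

wanted = {normalize(n) for n in names}
matches = [row[0] for row in csv.reader(scraped) if row and normalize(row[0]) in wanted]
print(matches)  # ['Martin, Cody', 'Forbes, Bryn']
```

Both sides go through the same normalize() call, so it does not matter which format or capitalization each file uses.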

Answered By: CodeMonkey

One solution would be to store the list of players in a separate text file, and then read that file in your code.

You can use the open() function to open the text file and read its lines into a list. Keep in mind that readlines() leaves the trailing newline on each line, so strip it off before comparing names. Once you have the list, you can iterate through it just like you do with the current list in your code.

Here is an example of how you could read the player names from a text file called players.txt and store them in a list called basketball:

with open('players.txt') as f:
    # strip() removes the trailing newline; blank lines are skipped
    basketball = [line.strip() for line in f if line.strip()]

You can then use this list in the same way you were using the original list.
Opening the file with a with statement closes it automatically once the indented block has finished.
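Why the stripping matters: a name read with readlines() keeps its trailing newline, so it is not found inside a scraped line, which ends the name with a quote rather than a newline. The readlines() attempt in the question runs into the same thing. A small self-contained check, using io.StringIO as a stand-in for players.txt and one of the question's sample lines:

```python
import io

# Stand-ins for players.txt and one scraped line (sample data from the question)
players_file = io.StringIO("Adebayo, Bam\nForbes, Bryn\n")
scraped_line = '"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,\n'

raw_names = players_file.readlines()            # each name still ends with '\n'
clean_names = [n.strip() for n in raw_names]    # newlines removed

print([n for n in raw_names if n in scraped_line])    # []
print([n for n in clean_names if n in scraped_line])  # ['Forbes, Bryn']
```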

You can also use the csv module if the player names are stored in a CSV file. csv.reader returns each row as a list of fields, which makes the data easy to iterate through; just pick out the field that holds the name.

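import csv

# Assumes one player per row, with the name in the first column (quoted, e.g. "Adebayo, Bam")
with open('players.csv', newline='') as f:
    basketball = [row[0] for row in csv.reader(f) if row]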

This way you can easily manage the player names in an external file and change it without modifying the code.

You could also use the pandas library to handle CSV files; it is a powerful tool for data manipulation.

Answered By: ArthYork

Let’s assume we have the following input files:

freeze_list.txt – a comma-separated list of filter words (players), each enclosed in single quotes:

'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'

final_G_league.csv – the scraped lines we want to filter using the words from freeze_list.txt:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

I would split the script into code segments by responsibility to make it more readable and manageable:

  1. Define constants (later you could make them parameters)
  2. Read filter words from a file
  3. Filter scraped lines
  4. Dump output to a file

The constants:

FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"

Read filter words from a file:

import ast

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = ast.literal_eval('[' + filter_words_file.read() + ']')  # parses literals only, unlike eval()
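To see what this parsing step produces, here is a small check with the file contents inlined as strings (sample names from freeze_list.txt above). It uses ast.literal_eval, which only accepts Python literals and so cannot execute code the way eval() can. It also wraps the text in [] rather than (): with parentheses, a file holding a single name evaluates to a plain string, and the filter loop would then iterate over its characters one by one.

```python
import ast

# Hypothetical file contents in the quoted, comma-separated format
sample = "'Adebayo, Bam', 'Allen, Jarrett',\n'Forbes, Bryn'\n"
single = "'Forbes, Bryn'\n"

# Wrapping in [] always yields a list of names
print(ast.literal_eval("[" + sample + "]"))  # ['Adebayo, Bam', 'Allen, Jarrett', 'Forbes, Bryn']

# Wrapping in () turns a single name into a parenthesised string, not a tuple
print(type(ast.literal_eval("(" + single + ")")))  # <class 'str'>
print(ast.literal_eval("[" + single + "]"))        # ['Forbes, Bryn']
```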

Filter lines from the scraped file:

matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words, for performance and
                # to avoid sending the same line to the output multiple times
                break

Dump filtered lines into a file:

with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)

After running the above segments in sequence, the output file freeze.csv contains:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

Suggestion

I'm not sure why you chose to store the filter words in a comma-separated list. I would prefer a plain list of names, one name per line.

freeze_list.txt:

Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn

The reading becomes straightforward:

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = [word.strip() for word in filter_words_file if word.strip()]  # skip blank lines

The output freeze.csv is the same:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
Answered By: Ivan Georgiev