Check if a string matches any of the lines of a txt file, and if it doesn't match any then add it to that txt file

Question:

import os

data_memory_file_path = 'data_file.txt'

# Create an empty file if it does not exist yet
if not os.path.isfile(data_memory_file_path):
    open(data_memory_file_path, "w").close()

#Example input list with info in sublists
reordered_input_info_lists = [
    [['corre'], ['en el patio'], ['2023-02-05 00:00 am']], 
    [['corre'], ['en el patio'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']], 
    [['salta'], ['en el bosque'], ['2023-02-05 00:00 am']], 
    [['salta'], ['en el patio'], ['2023-02-05 00:00 am']], 
    [['dibuja'], ['en el bosque'], ['2023-02-05 00:00 am']], 
    [['dibuja'], ['en el patio'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]]

# Split the main list into its sublists; each sublist is converted to a string
# and checked against the lines that already exist in the .txt file
for info_list in reordered_input_info_lists:
    # Convert the list to a string so it can be compared with the lines of the txt file
    info_list_str = repr(info_list)

    # THIS IS WHERE I HAVE THE PROBLEM: THE CHECK AGAINST THE TXT LINES SHOULD GO HERE

This is the content of data_file.txt (assuming the file already exists in this case):

[['analiza'], ['en la oficina'], ['2022-02-05 00:00 am']]
[['corre'], ['en el bosque'], ['2023-02-05 00:00 am']]
[['corre'], ['en el bosque'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['corre'], ['en el patio'], ['2023-02-05 00:00 am']]
[['corre'], ['en el patio'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['dibuja'], ['en el estudio de animación'], ['2023-02-05 00:00 am']]
[['dibuja'], ['en el estudio de animación'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['dibuja'], ['en la escuela'], ['2023-02-05 00:00 am']]
[['dibuja'], ['en la escuela'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]

After adding all the lines that were not there in the data_file.txt, the content of the file would look like this:

[['analiza'], ['en la oficina'], ['2022-02-05 00:00 am']]
[['corre'], ['en el bosque'], ['2023-02-05 00:00 am']]
[['corre'], ['en el bosque'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['corre'], ['en el patio'], ['2023-02-05 00:00 am']]
[['corre'], ['en el patio'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['dibuja'], ['en el bosque'], ['2023-02-05 00:00 am']]
[['dibuja'], ['en el estudio de animación'], ['2023-02-05 00:00 am']]
[['dibuja'], ['en el estudio de animación'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['dibuja'], ['en el patio'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['dibuja'], ['en la escuela'], ['2023-02-05 00:00 am']]
[['dibuja'], ['en la escuela'], ['2022-12-29 12:33 am _--_ 2023-01-25 19:13 pm']]
[['salta'], ['en el bosque'], ['2023-02-05 00:00 am']]
[['salta'], ['en el patio'], ['2023-02-05 00:00 am']]

One important requirement is that the lines in the file must be sorted alphabetically. For speed, I don't know whether it's better to sort the lines at the end (i.e. after adding all the necessary lines) or to have the program insert each line at its alphabetical position, assuming the file's existing lines are already sorted.

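# Print the file's lines in sorted order; the with block closes the file afterwards
with open(data_memory_file_path) as data_memory_file:
    for line in sorted(data_memory_file.readlines()):
        print(line.rstrip('\n'))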
Asked By: Matt095


Answers:

Turn the contents of the file into a set. Then combine that with the lines in your list to add all the lines that don’t exist. Finally alphabetize that and write it back to the file.

with open(data_memory_file_path) as f:
    file_contents = set(map(str.strip, f)) # str.strip to remove newlines before merging

new_file_contents = file_contents.union(map(repr, reordered_input_info_lists))

with open(data_memory_file_path, 'w') as f:
    for line in sorted(new_file_contents):
        f.write(line + '\n')
Answered By: Barmar
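
On the question of sorting at the end versus inserting each line at its alphabetical position: the second option can be sketched with the standard bisect module. This is only an illustration, assuming the existing lines are already sorted and held in memory as a list; the function name merge_sorted_unique and the sample data are hypothetical, not part of the original code.

```python
import bisect


def merge_sorted_unique(existing_lines, new_lines):
    """Insert each new line into an already sorted list, skipping duplicates."""
    merged = list(existing_lines)  # assumption: existing_lines is already sorted
    for line in new_lines:
        # Binary search for the leftmost position where this line belongs
        pos = bisect.bisect_left(merged, line)
        # Insert only if the line is not already present at that position
        if pos == len(merged) or merged[pos] != line:
            merged.insert(pos, line)
    return merged


# Hypothetical sample data in the same format as the file's lines
existing = sorted([
    "[['corre'], ['en el patio'], ['2023-02-05 00:00 am']]",
    "[['analiza'], ['en la oficina'], ['2022-02-05 00:00 am']]",
])
incoming = [
    repr([['corre'], ['en el patio'], ['2023-02-05 00:00 am']]),  # already present
    repr([['salta'], ['en el bosque'], ['2023-02-05 00:00 am']]),  # new line
]

result = merge_sorted_unique(existing, incoming)
for line in result:
    print(line)
```

Each list insert still shifts the elements after it, and the file has to be rewritten either way, so for more than a handful of new lines the set union followed by a single sorted() call is probably the simpler choice. Both approaches should produce the same final list.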