I want to keep only the lines before a certain string in a txt file

Question:

I want to keep all lines before the line that contains the string ‘VarList’. I cannot understand why the solutions proposed elsewhere do not work for my txt file.

To simplify:

I have many .txt files that look like this:

    text1=text
    text2=text
    (...)
    textN=text
    VarList=text
    (...)
    End

I just want this:

    text1=text
    text2=text
    (...)
    textN=text

How can I get it for all txt files in a directory path?

First I tried this:

import os

for subdir, dirs, files in os.walk('C:\Users\nigel\OneDrive\Documents\LAB\lean\.txt'):
    for file in files:
        output=[]
        with open(file, 'r') as inF:
            for line in inF:
                output.append(line)
                if 'VarList' in line: break
        f=open(file, 'w')
        blank=['']
        [f.write(x) for x in output]
        [f.write(x+'\n') for x in blank]
        f.close()

Nothing at all changes in the txt file, but the file has string ‘VarList’ in one of the lines. So, why isn’t it working?

Then:

import re

def trim(test_string, removal_string):
    return re.sub(r'^(.*?)('+ removal_string + ')(.*)$', r'\1' + r'\2', test_string)

def cleanFile(file_path, removal_string):
    with open(file_path) as master_text:
        return trim(master_text, removal_string)

cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

and I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [2], in <cell line: 16>()
     13     with open(file_path) as master_text:
     14         return trim(master_text, removal_string)
---> 16 cleanFile(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'VarList')

Input In [2], in cleanFile(file_path, removal_string)
     12 def cleanFile(file_path, removal_string):
     13     with open(file_path) as master_text:
---> 14         return trim(master_text, removal_string)

Input In [2], in trim(test_string, removal_string)
      9 def trim(test_string, removal_string):
---> 10     return re.sub(r'^(.*?)('+ removal_string + ')(.*)$', r'\1' + r'\2', test_string)

File ~\Anaconda3\lib\re.py:210, in sub(pattern, repl, string, count, flags)
    203 def sub(pattern, repl, string, count=0, flags=0):
    204     """Return the string obtained by replacing the leftmost
    205     non-overlapping occurrences of the pattern in string by the
    206     replacement repl.  repl can be either a string or a callable;
    207     if a string, backslash escapes in it are processed.  If it is
    208     a callable, it's passed the Match object and must return
    209     a replacement string to be used."""
--> 210     return _compile(pattern, flags).sub(repl, string, count)

TypeError: expected string or bytes-like object

Finally, I tried:

with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
    head, sep, tail = importFile.partition('VarList')
    exportFile = head

importFile.close()
exportFile.close()

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [2], in <cell line: 3>()
      1 # Solution 3
      3 with open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00.txt', 'r') as importFile, open(r'C:\Users\nigel\OneDrive\Documents\LAB\lean\sample_01.02_R00_temp.txt', 'w') as exportFile:
----> 4     head, sep, tail = importFile.partition('VarList')
      5     exportFile = head
      7 importFile.close()

AttributeError: '_io.TextIOWrapper' object has no attribute 'partition'

Does anyone have a clue about what is going on here?

Asked By: NigelBlainey


Answers:

You’re appending each line to the output before you check it for "VarList", so the VarList line itself ends up in the result. Check first, then append:

with open(file, 'r') as inF:
    for line in inF:      
        if 'VarList' in line:
            break
        output.append(line)
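
Combined with the directory walk from the question, the corrected loop might look like the sketch below. It rests on a few assumptions: `trim_files_in_dir` is a hypothetical name, only files ending in `.txt` are touched, and the root passed in is a folder rather than a file pattern. `os.walk` silently yields nothing for a path that does not exist, which may be why no file changed in the first attempt. The file names it yields are also bare names, so they are joined with the folder before opening.

```python
import os


def trim_files_in_dir(root, marker):
    """Rewrite every .txt file under root, keeping only the lines before marker."""
    for subdir, dirs, files in os.walk(root):
        for name in files:
            if not name.endswith('.txt'):
                continue
            # os.walk yields bare file names, so join them with their folder
            path = os.path.join(subdir, name)
            output = []
            with open(path, 'r') as inF:
                for line in inF:
                    # Check first, so the marker line itself is not kept
                    if marker in line:
                        break
                    output.append(line)
            with open(path, 'w') as outF:
                outF.writelines(output)


# Hypothetical call; the raw string keeps the backslashes literal:
# trim_files_in_dir(r'C:\Users\nigel\OneDrive\Documents\LAB\lean', 'VarList')
```

Files that do not contain the marker are rewritten with their full content, so they end up unchanged. Since the files are overwritten in place, it may be worth trying this on a copy of the folder first.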
Answered By: Jakob

I think this task could be made easier by using Python’s pathlib as it has some useful methods for reading and writing text files.

pathlib also has glob functionality that allows the addition of “**” to mean “this directory and all subdirectories, recursively”.

For truncating the file, I have chosen to use a list comprehension to find the first line that contains the required string and then slice the list of lines at that point.

For example:

from pathlib import Path


def trim_file(filename: Path, end_before: str) -> None:
    content = filename.read_text().splitlines()
    location = [content.index(line)
                for line in content if end_before in line]
    if location:
        filename.write_text("\n".join(content[:location[0]]))


def uppercase_file(filename: Path):
    """extra method to answer a question in the comments below"""
    content = []
    for line in filename.read_text().splitlines():
        content.append(line.upper())
    filename.write_text("\n".join(content))


def main():
    search_directory = Path.home().joinpath('Documents', 'LAB')
    for txt_file in search_directory.glob("**/*.txt"):
        trim_file(txt_file, 'VarList')
        # Example of adding second function to work on same file
        uppercase_file(txt_file)


if __name__ == '__main__':
    main()
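
The `partition` attempt from the question can be made to work along the same lines, because `read_text()` returns a `str` while the file object returned by `open()` has no `partition` method. The following is a minimal sketch, not a drop-in replacement: `text_before_marker` and `trim_with_partition` are hypothetical names, and the partial line in front of the marker is dropped so that the result ends on a complete line.

```python
from pathlib import Path


def text_before_marker(text: str, marker: str) -> str:
    """Return the part of text that precedes the line containing marker."""
    head, sep, _tail = text.partition(marker)
    if not sep:
        # Marker not found: leave the text unchanged
        return text
    # Drop whatever precedes the marker on its own line, keep the last newline
    return head[:head.rfind("\n") + 1]


def trim_with_partition(filename: Path, marker: str) -> None:
    # read_text() returns a str, so str.partition is available here
    filename.write_text(text_before_marker(filename.read_text(), marker))
```

Unlike `trim_file` above, this variant keeps the trailing newline of the last retained line.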

Answered By: ukBaz