Split a large .txt file into small ones in Python, but not all the data is read

Question:

There is a .txt file with data like following:

1 00000001.setts 0x 
2 00000002.setts 0x 
3 00000003.setts 0x 
4 00000004.setts 0x 
...
59876 0000e9e4.setts 0x 
59877 0000e9e5.setts 0x 
59878 0000e9e6.setts 0x 

The number of lines is always dynamic and far from ending in a round number (there are about 100k of them). How can I write code that splits such a large file of lines into smaller .txt files of 1500 lines each?

I should clarify that I tried to implement this task myself, but unfortunately not everything is read and some of the data is lost
(only 59514 lines out of 59878 were read).

The file is pretty big; each line consists of two values separated by a space and followed by \n.

Asked By: schaef

Answers:

Hi, and welcome to Stack Overflow. I recommend reading a basic tutorial on reading and writing files in Python. The only other thing you need is a simple for-loop, which is also a basic concept you should understand.

If you have your own code and concrete questions about it, you can post it here for help.
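To illustrate the "open a file and loop over it" approach the answer describes, here is a minimal sketch using itertools.islice to take fixed-size chunks; the file names ("long_file.txt", "part_1.txt", ...) and the function name are illustrative, not from the question.

```python
from itertools import islice

def split_into_chunks(path, lines_per_file=1500):
    """Write consecutive chunks of the input file to part_1.txt, part_2.txt, ..."""
    with open(path, "r") as f_in:
        part = 1
        while True:
            # islice pulls up to lines_per_file lines from the open file,
            # continuing where the previous chunk left off
            chunk = list(islice(f_in, lines_per_file))
            if not chunk:
                break
            with open(f"part_{part}.txt", "w") as f_out:
                f_out.writelines(chunk)
            part += 1
```

Because the input file is opened once and read lazily, this works for files far larger than memory.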

Answered By: tturbo

Welcome to Stack Overflow. This question has already been answered elsewhere; see this post: Splitting large text file into smaller text files by line numbers using Python.

Answered By: mowgle

You can do something like this:

def split_file(filename, filename_prefix, filename_suffix, maximal_filesize):
    line_counter = 1
    file_counter = 1
    with open(filename, "r") as f_in:
        # Read the file line by line
        for line in f_in:
            # Move on to the next output file once the current one is full
            if line_counter > maximal_filesize:
                file_counter += 1
                line_counter = 1

            # If you need to do something with the data (e.g. remove the number
            # at the beginning), you can add the relevant code here

            # Append the line to the current output file
            # (each line read from f_in already ends with "\n")
            with open(f"{filename_prefix}{file_counter}{filename_suffix}", "a") as f_out:
                f_out.write(line)

            line_counter += 1

# Usage
split_file("long_file.txt", "new_short_file_", ".txt", 1500)

# New files will be named sequentially:
# new_short_file_1.txt
# new_short_file_2.txt
# etc.

This should split your large file into smaller ones of at most 1500 lines each; only the last file may be shorter.
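As a design note, reopening the output file in append mode for every single line works, but is slow for ~100k lines and appends to any files left over from a previous run. A variant of the same logic (the function name is hypothetical) opens each output file exactly once:

```python
def split_file_fast(filename, prefix, suffix, lines_per_file):
    """Split filename into prefix1suffix, prefix2suffix, ... of lines_per_file lines."""
    with open(filename, "r") as f_in:
        f_out = None
        for i, line in enumerate(f_in):
            # Start a new output file every lines_per_file lines
            if i % lines_per_file == 0:
                if f_out:
                    f_out.close()
                f_out = open(f"{prefix}{i // lines_per_file + 1}{suffix}", "w")
            f_out.write(line)
        if f_out:
            f_out.close()
```

Opening in "w" mode also guarantees a fresh file on each run, unlike "a".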

Answered By: marek