Splitting a large txt file into smaller ones with Python, but not all the data is read
Question:
There is a .txt file with data like following:
1 00000001.setts 0x
2 00000002.setts 0x
3 00000003.setts 0x
4 00000004.setts 0x
...
59876 0000e9e4.setts 0x
59877 0000e9e5.setts 0x
59878 0000e9e6.setts 0x
The number of lines is always dynamic and rarely ends in a round number (there are about 100k of them). How can I implement, in code, the splitting of such a large file of lines into smaller .txt files (1500 lines per small file)?
I should clarify that I already tried to implement this, but unfortunately not everything is read and some of the data is lost
( only 59514 lines are read out of 59878 )
The file is pretty big; each line consists of two values separated by a space, ending in \n
Answers:
Hi and welcome to Stack Overflow. I recommend reading a basic tutorial on reading and writing files in Python. The only other thing you need is a simple for loop; these are basics you should understand.
If you have your own code and concrete questions about it, you can post it here for help.
Welcome to Stack Overflow. This question has already been answered elsewhere; see this post: Splitting large text file into smaller text files by line numbers using Python.
You can do something like this:
def split_file(filename, filename_prefix, filename_suffix, maximal_filesize):
    line_counter = 1
    file_counter = 1
    with open(filename, "r") as f_in:
        # Read the file line by line
        for line in f_in:
            # Move on to the next output file once the current one is full
            if line_counter > maximal_filesize:
                file_counter += 1
                line_counter = 1
            # If you need to do something with the data (e.g. remove the
            # number at the beginning), you can add the relevant code here
            # Append the line to the current output file
            with open(f"{filename_prefix}{file_counter}{filename_suffix}", "a") as f_out:
                f_out.write(line)  # "line" already ends with "\n"
            line_counter += 1
# Usage
split_file("long_file.txt", "new_short_file_", ".txt", 1500)
# New files will be named sequentially:
# new_short_file_1.txt
# new_short_file_2.txt
# etc.
This should split your large file into smaller ones that have 1500 lines each.
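If reopening the output file for every single line turns out to be slow on a ~100k-line input, a variant that reads the source in fixed-size chunks with itertools.islice and keeps each output file open while its whole chunk is written might look like this (a sketch, not the code from the linked answer; the names split_by_chunks, prefix, and suffix are my own):

from itertools import islice

def split_by_chunks(filename, prefix, suffix, lines_per_file):
    """Split filename into files of at most lines_per_file lines each."""
    with open(filename, "r") as f_in:
        file_counter = 1
        while True:
            # Grab the next chunk of lines (the last chunk may be shorter)
            chunk = list(islice(f_in, lines_per_file))
            if not chunk:
                break
            # Keep the output file open for the whole chunk instead of
            # reopening it once per line
            with open(f"{prefix}{file_counter}{suffix}", "w") as f_out:
                f_out.writelines(chunk)
            file_counter += 1

Since islice just keeps pulling lines from the same file iterator, every line ends up in exactly one output file and the final file simply holds the remainder, which matches the "far from a round number" line count described in the question.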