large-files

using a python generator to process large text files

Question: I’m new to using generators and have read around a bit, but need some help processing large text files in chunks. I know this topic has been covered, but the example code comes with very limited explanation, making it difficult to modify the code if one doesn’t …

Total answers: 4
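A minimal sketch of the generator approach, assuming a plain UTF-8 text file; the path big_log.txt and the chunk size are arbitrary placeholders:

    def read_in_chunks(path, chunk_size=1024 * 1024):
        # Yield a large text file one fixed-size chunk at a time,
        # so only chunk_size characters are ever held in memory.
        with open(path, encoding="utf-8") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    for chunk in read_in_chunks("big_log.txt"):
        ...  # process one chunk at a time

Because the generator yields lazily, memory use stays bounded by chunk_size no matter how large the file is.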

python: read lines from compressed text files

Question: Is it easy to read a line from a gz-compressed text file using Python without extracting the file completely? I have a text.gz file which is around 200 MB. When I extract it, it becomes 7.4 GB. And this is not the only file I have to read. For …

Total answers: 4
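The standard-library gzip module streams the decompressed content without ever writing the 7.4 GB to disk; a short sketch, assuming UTF-8 text (text.gz is the file named in the question):

    import gzip

    # "rt" opens the compressed file in text mode and decompresses
    # buffer by buffer, so memory use is independent of file size.
    with gzip.open("text.gz", "rt", encoding="utf-8") as f:
        for line in f:
            ...  # process(line)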

How to parse a large file taking advantage of threading in Python?

Question: I have a huge file and need to read and process it.

    with open(source_filename) as source, open(target_filename, "w") as target:
        for line in source:
            target.write(do_something(line))
        do_something_else()

Can this be accelerated with threads? If I spawn a thread per line, will this have a …

Total answers: 4
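A hedged sketch using concurrent.futures rather than a thread per line (per-line threads would drown in scheduling overhead); do_something is the question's placeholder, and the file names and batch size are assumptions. Two caveats: under CPython's GIL, threads only pay off if do_something releases the GIL (I/O, C extensions), and Executor.map submits everything it is given up front, so batching keeps memory bounded:

    from concurrent.futures import ThreadPoolExecutor
    from itertools import islice

    def do_something(line):
        return line  # stand-in for the question's per-line work

    BATCH = 10_000  # bound how many lines are in flight at once

    with open("source.txt") as source, \
         open("target.txt", "w") as target, \
         ThreadPoolExecutor(max_workers=4) as pool:
        while True:
            batch = list(islice(source, BATCH))
            if not batch:
                break
            # map() yields results in input order, so the target
            # file stays aligned with the source.
            target.writelines(pool.map(do_something, batch))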

Using Python Iterparse For Large XML Files

Question: I need to write a parser in Python that can process some extremely large files (> 2 GB) on a computer without much memory (only 2 GB). I wanted to use iterparse in lxml to do it. My file is of the format:

    <item>
      <title>Item …

Total answers: 6
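A sketch of the usual iterparse pattern with lxml, assuming the <item>/<title> layout shown in the question (items.xml is a hypothetical path); clearing each element after use is what keeps memory flat:

    from lxml import etree

    for event, elem in etree.iterparse("items.xml", events=("end",), tag="item"):
        title = elem.findtext("title")
        ...  # process title
        elem.clear()
        # also drop already-processed siblings still referenced by the root
        while elem.getprevious() is not None:
            del elem.getparent()[0]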

Searching for a string in a large text file – profiling various methods in python

Question: This question has been asked many times. After spending some time reading the answers, I did some quick profiling to try out the various methods mentioned previously… I have a 600 MB file with 6 million lines of strings …

Total answers: 6
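Two of the commonly profiled candidates, as a sketch (the path and needle are hypothetical; the needle must be bytes, since both functions read in binary mode):

    import mmap

    def found_by_scan(path, needle):
        # Baseline: stream line by line; memory use stays constant.
        with open(path, "rb") as f:
            return any(needle in line for line in f)

    def found_by_mmap(path, needle):
        # Let the OS page the file in; find() runs in C over the
        # whole mapping without materializing Python line objects.
        with open(path, "rb") as f, \
             mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle) != -1

    found_by_mmap("big.txt", b"some string")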

large amount of data in many text files – how to process?

Question: I have large amounts of data (a few terabytes) and accumulating… They are contained in many tab-delimited flat text files (each about 30 MB). Most of the task involves reading the data and aggregating (summing/averaging + additional transformations) over observations/rows based on a …

Total answers: 8
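One memory-safe shape for this kind of job is a single streaming pass that keeps only the running aggregates; a sketch assuming column 0 holds the group key and column 1 a numeric value (the data/*.txt glob is hypothetical):

    import csv
    import glob
    from collections import defaultdict

    sums = defaultdict(float)
    counts = defaultdict(int)
    for path in glob.glob("data/*.txt"):
        with open(path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                key, value = row[0], float(row[1])
                sums[key] += value
                counts[key] += 1

    averages = {k: sums[k] / counts[k] for k in sums}

Since only the per-key totals live in memory, this scales with the number of distinct keys, not the terabytes of input.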

Is there a memory efficient and fast way to load big JSON files?

Question: I have some JSON files of 500 MB. If I use the "trivial" json.load() to load their content all at once, it will consume a lot of memory. Is there a way to read the file partially? If it was a text, …

Total answers: 11
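If the file is one big top-level array (an assumption; the layout isn't shown), a streaming parser such as the third-party ijson package keeps only one element in memory at a time:

    import ijson  # streaming JSON parser: pip install ijson

    with open("data.json", "rb") as f:
        # "item" selects each element of a top-level array in turn.
        for record in ijson.items(f, "item"):
            ...  # process(record)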

Python: How to read huge text file into memory

Question: I’m using Python 2.6 on a Mac Mini with 1 GB RAM. I want to read in a huge text file:

    $ ls -l links.csv; file links.csv; tail links.csv
    -rw-r--r-- 1 user user 469904280 30 Nov 22:42 links.csv
    links.csv: ASCII text, with CRLF line terminators
    4757187,59883 …

Total answers: 6
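Given that the tail shows two comma-separated integers per line, packed arrays cut the cost to a few bytes per row instead of the ~100 bytes a list of tuples of Python ints would need; a sketch (links.csv is the file from the question, and the snippet is written for Python 3, though the array approach works the same on the question's Python 2.6):

    from array import array

    left, right = array("l"), array("l")  # machine-native signed ints
    with open("links.csv") as f:
        for line in f:
            a, b = line.split(",")
            left.append(int(a))
            right.append(int(b))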