Mmap throws an error: mmap object has no attribute 'split'

Question:

I am writing code that uses mmap to speed up searching in a large file.

import re
import mmap

with open('words.txt', 'r', encoding="utf8") as f1, open('source.txt', 'r', encoding="utf8") as f2, open('output.txt', 'w', encoding="utf8") as output_file:

    # Map the contents of file1 and file2 into memory
    file1_contents = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
    file2_contents = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ)

    # Create a regular expression pattern to match the words in file2
    pattern = re.compile(b'(' + b'|'.join(file2_contents.split()) + b')')

    # Search for the regular expression pattern in the mapped file1 contents
    for line in iter(file1_contents.readline, b""):
        if pattern.search(line):
            # Write the line to the output file
            output_file.write(line.decode())

When I try to run the code, it throws an error: AttributeError: 'mmap.mmap' object has no attribute 'split'

Asked By: Bobby TB


Answers:

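The mmap object does not have a split() method, and it does not have a decode() method either. It does support slicing, and a slice of an mmap is a regular bytes object, which has split(). Take a full slice of the mapped file before splitting. The result is a list of bytes, so it joins cleanly with b'|'. Passing each word through re.escape() keeps any regex metacharacters in the words literal. Note that the full slice copies the whole word file into memory, which is fine when that file is the small one:

pattern = re.compile(b'(' + b'|'.join(re.escape(w) for w in file2_contents[:].split()) + b')')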
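For reference, here is a minimal self-contained sketch of the whole search, staying in bytes from start to finish. The function name write_matching_lines and its parameter names are made up for illustration and are not from the question. It assumes both input files are non-empty (mmap cannot map an empty file) and that a plain substring match is what you want.

```python
import mmap
import re


def write_matching_lines(words_path, text_path, output_path):
    """Copy each line of text_path that contains any word from words_path.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    with open(words_path, "rb") as wf, open(text_path, "rb") as tf, \
            open(output_path, "wb") as out:
        # Assumption: both files are non-empty, since mmap rejects empty files.
        with mmap.mmap(wf.fileno(), 0, access=mmap.ACCESS_READ) as words_map, \
                mmap.mmap(tf.fileno(), 0, access=mmap.ACCESS_READ) as text_map:
            # Slicing an mmap yields a bytes object, which does have split().
            words = words_map[:].split()
            if not words:
                # An empty alternation would match every line, so stop here.
                return 0
            # Escape each word so regex metacharacters are matched literally.
            pattern = re.compile(b"|".join(re.escape(w) for w in words))
            count = 0
            # readline() returns b"" at the end of the mapping.
            for line in iter(text_map.readline, b""):
                if pattern.search(line):
                    out.write(line)  # already bytes, no decode needed
                    count += 1
    return count
```

Opening the files in binary mode avoids mixing str and bytes: the pattern, the lines, and the output are all bytes. If whole-word matches are needed instead of substring matches, wrapping the alternation in rb"\b(?:...)\b" is one option.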
Answered By: Jan