Mmap throws an error: mmap object has no attribute 'split'
Question:
I am writing code that uses mmap to search a large file faster.
import re
import mmap
with open('words.txt', 'r', encoding="utf8") as f1, open('source.txt', 'r', encoding="utf8") as f2, open('output.txt', 'w', encoding="utf8") as output_file:
    # Map the contents of file1 and file2 into memory
    file1_contents = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
    file2_contents = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ)
    # Create a regular expression pattern to match the words in file2
    pattern = re.compile(b'(' + b'|'.join(file2_contents.split()) + b')')
    # Search for the regular expression pattern in the mapped file1 contents
    for line in iter(file1_contents.readline, b""):
        if pattern.search(line):
            # Write the line to the output file
            output_file.write(line.decode())
When I try to run the code, it throws an error: AttributeError: 'mmap.mmap' object has no attribute 'split'
Answers:
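An mmap object does not have a split() method. It behaves like a bytearray (and like a file object), not like a bytes or str object, so you first need to copy its contents into a bytes object, for example by slicing it (file2_contents[:]) or by calling file2_contents.read(). Note that mmap has no decode() method either, and decoding to str would not help anyway: the pattern is built with bytes (b'|'.join(...)), and joining str pieces with a bytes separator raises a TypeError. Slicing keeps everything as bytes:
pattern = re.compile(b'(' + b'|'.join(map(re.escape, file2_contents[:].split())) + b')')
The re.escape() call is optional, but it stops words containing characters such as . or + from being interpreted as regex syntax.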
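For reference, here is a minimal sketch of the whole script with that fix applied, wrapped in a hypothetical function filter_lines (the function and parameter names are illustrative, not from the question). It keeps the question's logic (words come from file2, lines are scanned in file1) but opens the files in binary mode, since mmap exposes raw bytes regardless of the mode, and writes matching lines back as bytes so no decode step is needed. It assumes both input files are non-empty, because mmap raises ValueError when asked to map an empty file.

```python
import mmap
import re


def filter_lines(lines_path, words_path, output_path):
    """Copy to output_path every line of lines_path that contains a word from words_path.

    Hypothetical helper mirroring the question: file1 (lines_path) is scanned
    line by line, file2 (words_path) supplies the words for the pattern.
    """
    # Binary mode: mmap exposes raw bytes no matter how the file was opened.
    with open(lines_path, "rb") as f1, open(words_path, "rb") as f2:
        with mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ) as file1_contents, \
                mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ) as file2_contents:
            # Slicing an mmap returns a bytes copy, and bytes does have split().
            words = file2_contents[:].split()
            # re.escape stops characters such as '.' or '+' from acting as regex syntax.
            pattern = re.compile(b"(" + b"|".join(map(re.escape, words)) + b")")
            with open(output_path, "wb") as output_file:
                # readline() returns b"" at end of file, which ends the iter() loop.
                for line in iter(file1_contents.readline, b""):
                    if pattern.search(line):
                        # Lines are bytes, so they can be written to a binary file as-is.
                        output_file.write(line)
```

With the question's file names this would be called as filter_lines('words.txt', 'source.txt', 'output.txt'). If the intent was the reverse (words taken from words.txt, lines scanned in source.txt), swapping the first two arguments is enough.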