Find all words in binary buffer using Python
Question:
I want to find, in a binary buffer (bytes), all the "words" built from ASCII lowercase letters and digits that are exactly 5 characters long.
For example:
bytes(b'a\x1109ert\x01\x03a54bb\x05')
contains a54bb and 09ert.
Note that the string abcdef121212 is longer than 5 characters, so I don't want it.
I have built this set of allowed byte values:
set([ord(i) for i in string.ascii_lowercase + string.digits])
What is the fastest way to do that using Python?
Answers:
My instinct would be to just go with regex here:
>>> import re
>>> buffer = b'a\x1109ert\x01\x03a54bb\x05'
>>> re.findall(b"[a-z0-9]{5}", buffer)
[b'09ert', b'a54bb']
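One caveat with a fixed-count pattern, shown here as a small sketch: `{5}` on its own also carves 5-byte pieces out of longer runs instead of rejecting them. The input is the abcdef121212 string from the question, the character class is the lowercase-and-digits one the question asks for, and the variable names are only illustrative.

```python
import re

# Assumed sample: the 12-byte run from the question, which should yield nothing.
long_run = b"abcdef121212"

# {5} alone consumes the run in 5-byte pieces rather than skipping it.
chunks = re.findall(b"[a-z0-9]{5}", long_run)
print(chunks)  # [b'abcde', b'f1212']
```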
EDIT:
After your clarification (longer runs should be excluded, not split into 5-byte chunks), I would try just doing:
re.findall(b"[a-z0-9]+", buffer)
And then filtering for matches of exactly length 5, so:
[x for x in re.findall(b"[a-z0-9]+", buffer) if len(x) == 5]
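As an alternative sketch (not part of the answer above), the length check can be folded into the pattern with negative lookarounds, so a run is accepted only when no other allowed byte sits directly before or after it. The names WORD_RE and find_words are hypothetical, and whether this beats the filter approach is something to measure on real data, for example with timeit.

```python
import re

# Hypothetical names. The class is the question's lowercase letters and digits.
# (?<!...) rejects a match preceded by an allowed byte,
# (?!...) rejects a match followed by one.
WORD_RE = re.compile(rb"(?<![a-z0-9])[a-z0-9]{5}(?![a-z0-9])")

def find_words(buffer: bytes) -> list:
    """Return runs of exactly 5 allowed bytes, skipping longer runs entirely."""
    return WORD_RE.findall(buffer)

print(find_words(b"a\x1109ert\x01\x03a54bb\x05"))  # [b'09ert', b'a54bb']
print(find_words(b"abcdef121212"))                 # []
```

Compiling the pattern once avoids repeated cache lookups when the function is called on many buffers.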