Find all words in binary buffer using Python

Question:

I want to find, in a binary buffer (bytes), all the "words" built from ASCII lowercase letters and digits that are exactly 5 characters long.

For example:

bytes(b'a\x1109ert\x01\x03a54bb\x05') contains a54bb and 09ert.

Note that the string abcdef121212 is longer than 5 characters, so I don’t want it.

I have built this set:

set([ord(i) for i in string.ascii_lowercase + string.digits])

What is the fastest way to do that using Python?

Asked By: vtable


Answers:

My instinct would be to just go with regex here:

>>> import re
>>> buffer = b'a\x1109ert\x01\x03a54bb\x05'
>>> re.findall(b"[a-z0-9]{5}", buffer)
[b'09ert', b'a54bb']
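
Note that `{5}` on its own also matches five-byte slices of longer runs, which is what the question rules out. A quick check, as a sketch using the question's `abcdef121212` string as input (the variable names here are only illustrative):

```python
import re

# Illustrative input: the over-long string from the question.
long_run = b"abcdef121212"

# With no boundary check, {5} cuts the 12-byte run into 5-byte pieces.
chunks = re.findall(rb"[a-z0-9]{5}", long_run)
print(chunks)  # [b'abcde', b'f1212']
```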

EDIT:

After your clarification, I would try matching every maximal run first:

re.findall(b"[a-z0-9]+", buffer)

And then filtering for matches of exactly length 5, so:

[x for x in re.findall(b"[a-z0-9]+", buffer) if len(x) == 5]
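
The length filter can also be folded into the pattern itself with a negative lookbehind and lookahead. This is only a sketch: the names `WORD5` and `find_words` are made up for illustration, and the sample buffer assumes the question's bytes with the over-long string appended. Whether it is actually faster than the filter above would need to be measured (for example with `timeit`) on real data.

```python
import re

# A run of exactly 5 lowercase letters/digits: the lookbehind and lookahead
# reject a match that is preceded or followed by another such byte.
WORD5 = re.compile(rb"(?<![a-z0-9])[a-z0-9]{5}(?![a-z0-9])")

def find_words(buffer: bytes) -> list:
    """Return every run of exactly 5 lowercase/digit bytes in buffer."""
    return WORD5.findall(buffer)

# Assumed sample: the question's buffer plus the 12-byte run it excludes.
sample = b"a\x1109ert\x01\x03a54bb\x05abcdef121212"
print(find_words(sample))  # [b'09ert', b'a54bb']
```

Compiling the pattern once avoids repeating the pattern lookup when the function is called on many buffers.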
Answered By: juanpa.arrivillaga