Python: Extracting bits from a byte
Question:
I’m reading a binary file in python and the documentation for the file format says:
Flag (in binary)Meaning
1 nnn nnnn Indicates that there is one data byte to follow
that is to be duplicated nnn nnnn (127 maximum)
times.
0 nnn nnnn Indicates that there are nnn nnnn bytes of image
data to follow (127 bytes maximum) and that
there are no duplications.
n 000 0000 End of line field. Indicates the end of a line
record. The value of n may be either zero or one.
Note that the end of line field is required and
that it is reflected in the length of line record
field mentioned above.
When reading the file I’m expecting the byte I’m at to return 1 nnn nnnn
where the nnn nnnn
part should be 50.
I’ve been able to do this using the following:
flag = byte >> 7
numbytes = int(bin(byte)[3:], 2)
But the numbytes calculation feels like a cheap workaround.
Can I do more bit math to accomplish the calculation of numbytes?
How would you approach this?
Answers:
The classic approach of checking whether a bit is set, is to use binary “and” operator, i.e.
x = 10 # 1010 in binary
if x & 0b10: # explicitly: x & 0b0010 != 0
print('First bit is set')
To check, whether n^th bit is set, use the power of two, or better bit shifting
def is_set(x, n):
return x & 2 ** n != 0
# a more bitwise- and performance-friendly version:
return x & 1 << n != 0
is_set(10, 1) # 1 i.e. first bit - as the count starts at 0-th bit
>>> True
not sure I got you correctly, but if I did, this should do the trick:
>>> x = 154 #just an example
>>> flag = x >> 1
>>> flag
1
>>> nb = x & 127
>>> nb
26
If I read your description correctly:
if (byte & 0x80) != 0:
num_bytes = byte & 0x7F
You can strip off the leading bit using a mask ANDed with a byte from file. That will leave you with the value of the remaining bits:
mask = 0b01111111
byte_from_file = 0b10101010
value = mask & byte_from_file
print bin(value)
>> 0b101010
print value
>> 42
I find the binary numbers easier to understand than hex when doing bit-masking.
EDIT: Slightly more complete example for your use case:
LEADING_BIT_MASK = 0b10000000
VALUE_MASK = 0b01111111
values = [0b10101010, 0b01010101, 0b0000000, 0b10000000]
for v in values:
value = v & VALUE_MASK
has_leading_bit = v & LEADING_BIT_MASK
if value == 0:
print "EOL"
elif has_leading_bit:
print "leading one", value
elif not has_leading_bit:
print "leading zero", value
You can do it like this:
def GetVal(b):
# mask off the most significant bit, see if it's set
flag = b & 0x80 == 0x80
# then look at the lower 7 bits in the byte.
count = b & 0x7f
# return a tuple indicating the state of the high bit, and the
# remaining integer value without the high bit.
return (flag, count)
>>> testVal = 50 + 0x80
>>> GetVal(testVal)
(True, 50)
there you go:
class ControlWord(object):
"""Helper class to deal with control words.
Bit setting and checking methods are implemented.
"""
def __init__(self, value = 0):
self.value = int(value)
def set_bit(self, bit):
self.value |= bit
def check_bit(self, bit):
return self.value & bit != 0
def clear_bit(self, bit):
self.value &= ~bit
Instead of int(bin(byte)[3:], 2), you could simply use: int(bin(byte>>1),2)
I’m reading a binary file in python and the documentation for the file format says:
Flag (in binary)Meaning
1 nnn nnnn Indicates that there is one data byte to follow
that is to be duplicated nnn nnnn (127 maximum)
times.0 nnn nnnn Indicates that there are nnn nnnn bytes of image
data to follow (127 bytes maximum) and that
there are no duplications.n 000 0000 End of line field. Indicates the end of a line
record. The value of n may be either zero or one.
Note that the end of line field is required and
that it is reflected in the length of line record
field mentioned above.
When reading the file I’m expecting the byte I’m at to return 1 nnn nnnn
where the nnn nnnn
part should be 50.
I’ve been able to do this using the following:
flag = byte >> 7
numbytes = int(bin(byte)[3:], 2)
But the numbytes calculation feels like a cheap workaround.
Can I do more bit math to accomplish the calculation of numbytes?
How would you approach this?
The classic approach of checking whether a bit is set, is to use binary “and” operator, i.e.
x = 10 # 1010 in binary
if x & 0b10: # explicitly: x & 0b0010 != 0
print('First bit is set')
To check, whether n^th bit is set, use the power of two, or better bit shifting
def is_set(x, n):
return x & 2 ** n != 0
# a more bitwise- and performance-friendly version:
return x & 1 << n != 0
is_set(10, 1) # 1 i.e. first bit - as the count starts at 0-th bit
>>> True
not sure I got you correctly, but if I did, this should do the trick:
>>> x = 154 #just an example
>>> flag = x >> 1
>>> flag
1
>>> nb = x & 127
>>> nb
26
If I read your description correctly:
if (byte & 0x80) != 0:
num_bytes = byte & 0x7F
You can strip off the leading bit using a mask ANDed with a byte from file. That will leave you with the value of the remaining bits:
mask = 0b01111111
byte_from_file = 0b10101010
value = mask & byte_from_file
print bin(value)
>> 0b101010
print value
>> 42
I find the binary numbers easier to understand than hex when doing bit-masking.
EDIT: Slightly more complete example for your use case:
LEADING_BIT_MASK = 0b10000000
VALUE_MASK = 0b01111111
values = [0b10101010, 0b01010101, 0b0000000, 0b10000000]
for v in values:
value = v & VALUE_MASK
has_leading_bit = v & LEADING_BIT_MASK
if value == 0:
print "EOL"
elif has_leading_bit:
print "leading one", value
elif not has_leading_bit:
print "leading zero", value
You can do it like this:
def GetVal(b):
# mask off the most significant bit, see if it's set
flag = b & 0x80 == 0x80
# then look at the lower 7 bits in the byte.
count = b & 0x7f
# return a tuple indicating the state of the high bit, and the
# remaining integer value without the high bit.
return (flag, count)
>>> testVal = 50 + 0x80
>>> GetVal(testVal)
(True, 50)
there you go:
class ControlWord(object):
"""Helper class to deal with control words.
Bit setting and checking methods are implemented.
"""
def __init__(self, value = 0):
self.value = int(value)
def set_bit(self, bit):
self.value |= bit
def check_bit(self, bit):
return self.value & bit != 0
def clear_bit(self, bit):
self.value &= ~bit
Instead of int(bin(byte)[3:], 2), you could simply use: int(bin(byte>>1),2)