Python bytes to binary string – how 4 bytes can be 29 bits?
Question:
I need to read time data from sensor. Here are the instructions from manual:
I have written code in Python, but I feel like there should be some better way:
# 1
data_bytes = b'x43x32x21x10'
print('data_bytes: ', data_bytes, len(data_bytes)) # why it changes b'x43x32x21x10' to b'C2!x10' ???
# 2
data_binary = bin(int.from_bytes(data_bytes, 'little')) # remove '0b' from string
print('data_binary: ', data_binary, len(data_binary))
data_binary = data_binary[2:]
print('data_binary: ', data_binary, len(data_binary)) # should be: 0001 0000 0010 0001 0011 0010 0100 0011, 32
# 3
sec = data_binary[0:-20]
print(sec, len(sec)) # should be: 0001 0000 0010, 12
sec = int(sec, 2)
print(sec)
usec = data_binary[-20:]
print(usec, len(usec)) # 0001 0011 0010 0100 0011, 20
usec = int(usec, 2)
print(usec)
# 4
print('time: ', sec + usec/1000000) # should be: 258.078403
Results:
data_bytes: b'C2!x10' 4
data_binary: 0b10000001000010011001001000011 31
data_binary: 10000001000010011001001000011 29
100000010 9
258
00010011001001000011 20
78403
time: 258.078403
I have questions:
- Why Python changes b’x43x32x21x10′ to b’C2!x10′?
- Why is the length of the message 29 bits and not 32?
- Is it possible to do this in better/cleaner/faster way?
Thanks!
Answers:
Is it possible to do this in better/cleaner/faster way?
I suggest taking look at struct
part of standard library, consider following simple example, say you have message which is big-endian and contain one char and unsigned int then you could do
import struct
data_bytes = b'x56xFFxFFxFFxFF'
character, value = struct.unpack('>cI',data_bytes)
print(character) # b'V'
print(value) # 4294967295
One example would be to convert the bytestring into an int, and extract relevant information using bit operations.
code00.py:
#!/usr/bin/env python
import sys
def decode(byte_str):
i = int.from_bytes(byte_str, byteorder="little") # Convert to int reversing bytes (little endian)
#i = int.from_bytes(byte_str[::-1], byteorder="big") # Equivalent as the above line: reverse explicitly and convert without reversing bytes
print(hex(i)) # @TODO - cfati: Comment this line
secs = i >> 20 # Discard last 20 bytes (that belong to usecs)
usecs = i & 0x000FFFFF # Only retain last 20 bytes (5 hex digits)
return secs + usecs / 1000000
def main(*argv):
rec = b"x43x32x21x10"
dec = decode(rec)
print("Result: {:.6f}".format(dec))
if __name__ == "__main__":
print("Python {:s} {:03d}bit on {:s}n".format(" ".join(elem.strip() for elem in sys.version.split("n")),
64 if sys.maxsize > 0x100000000 else 32, sys.platform))
rc = main(*sys.argv[1:])
print("nDone.n")
sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:WorkDevStackOverflowq075472971]> "e:WorkDevVEnvspy_pc064_03.10_test0Scriptspython.exe" ./code00.py
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32
0x10213243
Result: 258.078403
Done.
Might also worth reading:
-
[Python.Docs]: Built-in Types – Numeric Types — int, float, complex
-
[SO]: Python struct.pack() behavior (@CristiFati’s answer) for an explanation regarding your "weird" outputs and hex representations
-
[SO]: Output of crc32b in PHP is not equal to Python (@CristiFati’s answer)
-
Both are the same data. Per the documentation, bytes
objects are represented as ASCII characters or, for values over 127, by the appropriate hexadecimal literal. If you check an ASCII table, the hexadecimal values 0x43 0x32 and 0x21 are the characters C2!
-
As noted by other comments, leading zeros are stripped
-
you can avoid most of the conversions to and from strings by using binary operations:
data_bytes = b'x43x32x21x10'
# convert to int
data = int.from_bytes(data_bytes,'little')
# zero out the leading bits, leaving only the 20 bits corresponding to the microseconds
microseconds = data & 0x0fffff
# shift right by 20, thus keeping only the seconds (upper bits)
seconds = data >> 20
print(seconds) # 258
print(microseconds) #78403
Have you tried using divmod ?
Dividing the integer value by 2^20 or 1048576 will directly split it in the two parts you are looking for.
sec,usec = divmod(int.from_bytes(data_bytes, 'little'),1048576)
print(sec,usec)
# 258 78403
I need to read time data from sensor. Here are the instructions from manual:
I have written code in Python, but I feel like there should be some better way:
# 1
data_bytes = b'x43x32x21x10'
print('data_bytes: ', data_bytes, len(data_bytes)) # why it changes b'x43x32x21x10' to b'C2!x10' ???
# 2
data_binary = bin(int.from_bytes(data_bytes, 'little')) # remove '0b' from string
print('data_binary: ', data_binary, len(data_binary))
data_binary = data_binary[2:]
print('data_binary: ', data_binary, len(data_binary)) # should be: 0001 0000 0010 0001 0011 0010 0100 0011, 32
# 3
sec = data_binary[0:-20]
print(sec, len(sec)) # should be: 0001 0000 0010, 12
sec = int(sec, 2)
print(sec)
usec = data_binary[-20:]
print(usec, len(usec)) # 0001 0011 0010 0100 0011, 20
usec = int(usec, 2)
print(usec)
# 4
print('time: ', sec + usec/1000000) # should be: 258.078403
Results:
data_bytes: b'C2!x10' 4
data_binary: 0b10000001000010011001001000011 31
data_binary: 10000001000010011001001000011 29
100000010 9
258
00010011001001000011 20
78403
time: 258.078403
I have questions:
- Why Python changes b’x43x32x21x10′ to b’C2!x10′?
- Why is the length of the message 29 bits and not 32?
- Is it possible to do this in better/cleaner/faster way?
Thanks!
Is it possible to do this in better/cleaner/faster way?
I suggest taking look at struct
part of standard library, consider following simple example, say you have message which is big-endian and contain one char and unsigned int then you could do
import struct
data_bytes = b'x56xFFxFFxFFxFF'
character, value = struct.unpack('>cI',data_bytes)
print(character) # b'V'
print(value) # 4294967295
One example would be to convert the bytestring into an int, and extract relevant information using bit operations.
code00.py:
#!/usr/bin/env python
import sys
def decode(byte_str):
i = int.from_bytes(byte_str, byteorder="little") # Convert to int reversing bytes (little endian)
#i = int.from_bytes(byte_str[::-1], byteorder="big") # Equivalent as the above line: reverse explicitly and convert without reversing bytes
print(hex(i)) # @TODO - cfati: Comment this line
secs = i >> 20 # Discard last 20 bytes (that belong to usecs)
usecs = i & 0x000FFFFF # Only retain last 20 bytes (5 hex digits)
return secs + usecs / 1000000
def main(*argv):
rec = b"x43x32x21x10"
dec = decode(rec)
print("Result: {:.6f}".format(dec))
if __name__ == "__main__":
print("Python {:s} {:03d}bit on {:s}n".format(" ".join(elem.strip() for elem in sys.version.split("n")),
64 if sys.maxsize > 0x100000000 else 32, sys.platform))
rc = main(*sys.argv[1:])
print("nDone.n")
sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:WorkDevStackOverflowq075472971]> "e:WorkDevVEnvspy_pc064_03.10_test0Scriptspython.exe" ./code00.py Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32 0x10213243 Result: 258.078403 Done.
Might also worth reading:
-
[Python.Docs]: Built-in Types – Numeric Types — int, float, complex
-
[SO]: Python struct.pack() behavior (@CristiFati’s answer) for an explanation regarding your "weird" outputs and hex representations
-
[SO]: Output of crc32b in PHP is not equal to Python (@CristiFati’s answer)
-
Both are the same data. Per the documentation,
bytes
objects are represented as ASCII characters or, for values over 127, by the appropriate hexadecimal literal. If you check an ASCII table, the hexadecimal values 0x43 0x32 and 0x21 are the characters C2! -
As noted by other comments, leading zeros are stripped
-
you can avoid most of the conversions to and from strings by using binary operations:
data_bytes = b'x43x32x21x10'
# convert to int
data = int.from_bytes(data_bytes,'little')
# zero out the leading bits, leaving only the 20 bits corresponding to the microseconds
microseconds = data & 0x0fffff
# shift right by 20, thus keeping only the seconds (upper bits)
seconds = data >> 20
print(seconds) # 258
print(microseconds) #78403
Have you tried using divmod ?
Dividing the integer value by 2^20 or 1048576 will directly split it in the two parts you are looking for.
sec,usec = divmod(int.from_bytes(data_bytes, 'little'),1048576)
print(sec,usec)
# 258 78403