How do I program Python to read bytes from a file as base10 and organize each byte into a list?

Question:

I’m pretty amateur with Python and computer data, so bear with me.
I have a file called "input.nodes" and I want Python to read each byte, convert it to base 10, and put the bytes into a list in the order they appear in the file. I have almost succeeded with this.
I’m using Python 3.10, which, if I recall correctly, is the latest version.

To put it even more simply: how do I make Python read each byte as base 10 and put all of them, in file order, into a list with NO extra characters whatsoever?

Here’s how I attempted to do this:

with open('input.nodes') as f:
    dataImport = f.read()

dataImportSplit = [] # << The reason we are adding an empty list is because we will 
                     # sort all of the bytes of the stickfigure into this list.

for chr in dataImport:               # << As stated, this will take all of the bytes
    dataImportSplit.append(ord(chr)) # from input.nodes and sort them into a list.

print('BYTE LIST (in Base10):\n' + str(dataImportSplit)) # << Prints out "dataImportSplit", the
                                                         # list we just sorted bytes into.
                                                         # Just for debugging purposes.

To elaborate on this code:

  1. Import input.nodes as a string, and let "dataImport" be the variable for it.
  2. Create an empty list to later store all of the bytes into, called "dataImportSplit" .
  3. Find the Unicode code point of each character in dataImport (via the ord function), and append each of them individually into a list.
  4. Print the list into the console as a string for debugging.

That code almost worked; most items in the list were the base 10 value of the corresponding byte, which I double-checked with a decimal-to-binary and decimal-to-hex calculator. However, there were a few outliers that don't correspond to anything, as far as I can tell.

Here is the output in the Python terminal:

BYTE LIST (in Base10):
[0, 0, 1, 78, 63, 8364, 0, 0, 255, 112, 8250, 68, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 8364, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 31, 31, 31, 255, 127, 127, 127, 255, 127, 127, 127, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 8364, 0, 0, 0, 0, 0, 100, 0, 0, 0, 100, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 0, 0, 45, 0, 0, 0, 45, 255, 112, 8250, 68, 255, 112, 8250, 68, 255, 112, 8250, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Obviously you can't understand what this means if I don't give the actual contents of input.nodes, so for reference, here is input.nodes in a hex editor:
[input.nodes in hex](https://i.stack.imgur.com/6zfKV.png)
If this is still not enough information, you can download input.nodes for yourself here.

As you can see, the list items ‘8364’ and ‘8250’ don't appear to correspond to anything in the file, no matter what base they're converted to.

What am I missing here? What do the numbers "8364" and "8250" have to do with anything?

Asked By: Vuice


Answers:

The bytes type in Python represents a sequence of byte-valued integers (0-255), but displays by default as an immutable byte string. If you open your file in binary mode ('rb') the data you read is a byte string that can be accessed as integers individually through indexing or iteration, or you can convert it explicitly to a list.

Opening in text mode (the default) decodes the bytes to Unicode code points using an encoding. If the encoding parameter is not given, an implicit default is used, and that default varies by OS.
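
As a rough illustration of where 8364 and 8250 can come from, here is a minimal sketch. It assumes the implicit text-mode encoding was cp1252, which is a common default on Windows but is not confirmed by the question. Under that assumption, byte 0x80 decodes to the euro sign (code point 8364) and byte 0x9b decodes to a single angle quotation mark (code point 8250):

```python
# Assumption: the implicit text-mode encoding was cp1252 (a common Windows default).
raw = bytes([0x3F, 0x80, 0x70, 0x9B, 0x44])  # a few byte values taken from the hex dump

decoded = raw.decode('cp1252')             # roughly what text mode does implicitly
code_points = [ord(ch) for ch in decoded]  # what the loop in the question computed

print(code_points)  # [63, 8364, 112, 8250, 68]
print(list(raw))    # [63, 128, 112, 155, 68]
```

Bytes below 128 map to the same numbers either way, which would explain why most of the list looked right and only a few values were off.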

If you want the individual bytes, read in binary to prevent any conversion:

with open(r'downloads\input.nodes', 'rb') as f:
    data = f.read()

print(data[:20])          # displays first 20 bytes as a byte string
print(data[:20].hex(' ')) # hexadecimal dump separated by spaces

for b in data[20:40]: # prints next 20 bytes as integers
    print(b)

print(list(data)) # convert to list

Output:

b'\x00\x00\x01N?\x80\x00\x00\xffp\x9bD\xff\x00\x00\x00\x00\x00\x00\x00'
00 00 01 4e 3f 80 00 00 ff 70 9b 44 ff 00 00 00 00 00 00 00
0
0
0
0
0
0
0
0
0
0
0
0
0
0
63
128
0
0
0
0
[0, 0, 1, 78, 63, 128, 0, 0, 255, 112, 155, 68, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 63, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 31, 31, 31, 255, 127, 127, 127, 255, 127, 127, 127, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 128, 0, 0, 0, 0, 0, 100, 0, 0, 0, 100, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 0, 0, 0, 45, 0, 0, 0, 45, 255, 112, 155, 68, 255, 112, 155, 68, 255, 112, 155, 68, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
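
If you want to try this without the original file, the following self-contained sketch writes a few sample bytes to a temporary file and reads them back in binary mode. The file name sample.nodes and the sample values are made up for illustration (the values mirror the start of the list above):

```python
import os
import tempfile

# Hypothetical sample data: the first 13 byte values from the list above.
sample = bytes([0, 0, 1, 78, 63, 128, 0, 0, 255, 112, 155, 68, 255])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.nodes')  # hypothetical file name
    with open(path, 'wb') as f:               # write raw bytes
        f.write(sample)
    with open(path, 'rb') as f:               # 'rb' = read binary, no decoding
        byte_list = list(f.read())            # each item is an int in 0-255

print(byte_list)
```

Because no decoding happens in binary mode, every item in byte_list stays in the range 0-255 regardless of the OS default encoding.
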
Answered By: Mark Tolonen