Java ByteBuffer similar function in Python

Question:

I need to reimplement the following Java function in Python so it can be used in a Python pipeline. It converts an 8-element list of integers into a 32-element list of floats by splitting each 32-bit integer into its 4 bytes via ByteBuffer.
As a data scientist, I have been scratching my head over how to do this correctly. As a workaround, I could try to call the Java code from Python, but a pure Python implementation seems more elegant if it is achievable. Any help is appreciated!

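import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public static List<Float> decodeEmbedding(List<Integer> embedding8d) {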
    List<Float> embedding32d = new ArrayList<>();
    for (int index = 0; index < embedding8d.size(); ++index) {
        Integer x = embedding8d.get(index);
        byte[] bytes = ByteBuffer.allocate(4).putInt(x).array();
        for (int byteIndex = 0; byteIndex < 4; ++byteIndex) {
            embedding32d.add((float) bytes[byteIndex]);
        }
    }
    return embedding32d;
}

//------------ main ----------
List<Integer> input = Arrays.asList(
    -839660988,
    826572561,
    1885995405,
    819888698,
    -2010625315,
    -1614561409,
    -1220275962,
    -2135440498
);

System.out.println(decodeEmbedding(input));
//[-51.0, -13.0, -54.0, 68.0, 49.0, 68.0, 127.0, 17.0, 112.0, 106.0, 1.0, -115.0, 48.0, -34.0, -126.0, 58.0, -120.0, 40.0, 74.0, -35.0, -97.0, -61.0, -65.0, 127.0, -73.0, 68.0, 17.0, 6.0, -128.0, -73.0, -61.0, -114.0]

Asked By: user1269298


Answers:

Here is one approach:

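  1. Pack each integer into a 4-byte bytes object. The format '>i' means a big-endian signed 32-bit integer; big-endian is also the default byte order of Java's ByteBuffer: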
import struct
bstring = struct.pack('>i', intvalue)
  2. Iterating over a bytes object yields integers in the range [0, 256). Java's byte is signed, though, so you want the range [-128, 128). We adjust for this:
for byte in bstring:
    if byte >= 128:
        byte -= 256
    embedding32d.append(float(byte))
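The sign adjustment in step 2 can also be left to struct itself: unpacking the four packed bytes with the signed-char format 'b' yields values in [-128, 128) directly. A minimal sketch for a single integer, assuming the same big-endian layout as above (split_int is a hypothetical helper name, not part of the original answer):

```python
import struct

def split_int(intvalue):
    """Return the 4 bytes of a 32-bit int as signed values, most significant first."""
    # '>i' packs one big-endian signed 32-bit int; '>4b' reads the result back
    # as four signed chars, so no manual "- 256" adjustment is needed.
    return struct.unpack('>4b', struct.pack('>i', intvalue))

print(split_int(-839660988))  # (-51, -13, -54, 68)
```

These are the first four values printed by the Java code in the question.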

To put the whole thing together:


import struct

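def decodeEmbedding(embedding8d):
    embedding32d = []
    for intvalue in embedding8d:
        # 4 bytes, big-endian, like ByteBuffer.allocate(4).putInt(intvalue).array()
        bstring = struct.pack('>i', intvalue)
        # bytes iterate as 0..255; map 128..255 to -128..-1 like Java's signed byte
        signed = (byte - 256 if byte >= 128 else byte for byte in bstring)
        embedding32d.extend(float(byte) for byte in signed)
    return embedding32d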
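If you would rather not use struct, int.to_bytes mirrors ByteBuffer.putInt fairly closely: byteorder='big' matches ByteBuffer's default byte order, and signed=True is required for negative inputs (it raises OverflowError if a value does not fit in 4 bytes). A sketch under those assumptions, using the hypothetical name decode_embedding_to_bytes:

```python
def decode_embedding_to_bytes(embedding8d):
    """Expand each 32-bit int into 4 signed byte values (big-endian) as floats."""
    embedding32d = []
    for intvalue in embedding8d:
        # Rough equivalent of ByteBuffer.allocate(4).putInt(x).array() in Java
        raw = intvalue.to_bytes(4, byteorder='big', signed=True)
        # Iterating bytes yields 0..255; shift 128..255 into -128..-1
        embedding32d.extend(float(b - 256 if b >= 128 else b) for b in raw)
    return embedding32d
```

The result is the same as the struct-based loop; the choice is mostly a matter of taste.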

There is also a one-liner version, which packs all integers with a single struct.pack call (it still needs import struct):

def decodeEmbedding(embedding8d):
    return [float(byte - 256 if byte > 127 else byte) for byte in
            struct.pack('>' + 'i' * len(embedding8d), *embedding8d)]
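The standard-library array module can also reinterpret the whole packed buffer as signed bytes in one step: typecode 'b' is a signed char, and a bytes initializer is copied in via frombytes(). A sketch, checked against the output printed by the Java code in the question (decode_embedding_array is a hypothetical name):

```python
import struct
from array import array

def decode_embedding_array(embedding8d):
    """Pack all ints big-endian at once, then view the buffer as signed bytes."""
    # f'>{n}i' packs n big-endian signed 32-bit ints into 4 * n bytes
    packed = struct.pack(f'>{len(embedding8d)}i', *embedding8d)
    # array('b', ...) interprets each byte as a signed char in [-128, 127]
    return [float(b) for b in array('b', packed)]

java_input = [-839660988, 826572561, 1885995405, 819888698,
              -2010625315, -1614561409, -1220275962, -2135440498]
# Output printed by System.out.println(decodeEmbedding(input)) in the question
java_output = [-51.0, -13.0, -54.0, 68.0, 49.0, 68.0, 127.0, 17.0,
               112.0, 106.0, 1.0, -115.0, 48.0, -34.0, -126.0, 58.0,
               -120.0, 40.0, 74.0, -35.0, -97.0, -61.0, -65.0, 127.0,
               -73.0, 68.0, 17.0, 6.0, -128.0, -73.0, -61.0, -114.0]
assert decode_embedding_array(java_input) == java_output
```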
Answered By: Homer512