How to convert raw bytes in hexadecimal form to an RGB image

Question:

I am trying to get my hands dirty by exploring the act of detecting malware using machine learning.

So I stumbled across the Microsoft BIG 2015 dataset on Kaggle. The dataset contains byte data in hexadecimal form, e.g.

00401000 00 00 80 40 40 28 00 1C 02 42 00 C4 00 20 04 20
00401010 00 00 20 09 2A 02 00 00 00 00 8E 10 41 0A 21 01
00401020 40 00 02 01 00 90 21 00 32 40 00 1C 01 40 C8 18
00582FF0 ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Here are links to the full bytes file and to the grayscale image generated from it:

hexadecimal bytes

sample image

I want to convert these hexadecimal bytes to an RGB image using Python.

Here is a script that converts the bytes to a grayscale image; I would like to replicate this for an RGB image.


import os
from math import log
import numpy as np
from PIL import Image
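def saveimg(array, name):
    # Each parsed row must hold exactly 16 byte values.
    assert array.shape[1] == 16

    # Pick a power-of-two width b just above the square root of the byte count.
    b = int((array.shape[0] * 16) ** 0.5)
    b = 2 ** (int(log(b) / log(2)) + 1)
    a = int(array.shape[0] * 16 / b)  # number of complete image rows
    # Drop trailing rows that do not fill a complete image row.
    array = array[:int(a * b / 16), :]
    array = np.reshape(array, (a, b))
    print(array.shape)
    im = Image.fromarray(np.uint8(array))
    im.save(name + '.jpg', "JPEG")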

path = bytes_folder  # must be defined beforehand; example file: "/test/0ACDbR5M3ZhBJajygTuf.bytes"
files = os.listdir(path)
for x in files:
    if not x.endswith('.bytes'):
        continue
    array = []
    with open(os.path.join(path, x)) as f:
        for line in f:
            xx = line.split()
            # Expect one address column followed by 16 byte columns.
            if len(xx) != 17:
                continue
            # Treat unknown bytes ('??') as 0.
            array.append([int(i, 16) if i != '??' else 0 for i in xx[1:]])
    saveimg(np.array(array), x)

Asked By: benjamin olise


Answers:

To get an RGB image, all we need is to reshape the array to height x width x 3 (the third dimension must be 3, holding the R, G, B color components).

There are many options for converting arbitrary data to shape height x width x 3.
There is one important limitation: the total number of elements before conversion must be equal to height*width*3.

Example:

We may define: width = 320
Then the total number of elements must be a multiple of 320*3.
If array.size is not a multiple of 320*3, we may cut the remainder elements.

If we want a square image, we may choose: width = int((array.size / 3)**0.5).

  • Flatten the array to 1D:

     array = array.ravel()
    
  • Compute the remainder from (width*3):

     remainder = array.size % (width*3)
    
  • Remove remainder elements from the end of array:

     if remainder > 0:
         array = array[0:-remainder]
    
  • Reshape the array to height x width x 3 (height is computed automatically: it is going to be array.size/(width*3)):

     rgb_array = array.reshape((-1, width, 3))
    
  • Convert rgb_array to PIL image:

     im = Image.fromarray(rgb_array, mode='RGB')
    

Note:
This is just an example – there are many options for selecting width and height.
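
To make the width choice concrete, the steps above can be wrapped in a small helper. This is only a sketch: the name to_rgb_array and the synthetic demo data are illustrative assumptions, not part of the dataset or of any library.

```python
import numpy as np

def to_rgb_array(data, width=None):
    """Trim a byte array and reshape it to (height, width, 3).

    Hypothetical helper: if width is None, a roughly square image is chosen.
    """
    data = np.asarray(data, dtype=np.uint8).ravel()
    if width is None:
        width = max(1, int((data.size / 3) ** 0.5))
    # Keep only as many elements as fill complete rows of width*3 values.
    usable = data.size - data.size % (width * 3)
    return data[:usable].reshape((-1, width, 3))

# Synthetic stand-in for the parsed byte values (not real dataset content).
demo = np.arange(1000) % 256
rgb_fixed = to_rgb_array(demo, width=10)  # fixed width
rgb_square = to_rgb_array(demo)           # roughly square
print(rgb_fixed.shape, rgb_square.shape)
```

With 1000 input values, the fixed-width call drops 10 trailing elements and the square call drops 28, so a few bytes at the end of the file are always lost with this approach.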


Code sample:

import numpy as np
from PIL import Image

def conv(val):
    """ Convert val from a hex string to int; return 0 if val is not valid hex (e.g. '??') """
    try:
        return int(val, 16)
    except ValueError:
        return 0

# https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
# https://stackoverflow.com/questions/31344892/using-numpys-readtxt-to-read-hex-numbers
array = np.loadtxt('0ACDbR5M3ZhBJajygTuf.bytes.txt', np.uint8, converters={x: conv for x in range(17)}, usecols=list(range(1, 17)))

width = 320  # Select constant width (just an example)
# Or: width = int((array.size / 3)**0.5)

# Assume we want the width to be 320, the number of elements before reshape must be a multiple of 320*3
array = array.ravel()  # Make the array 1D
remainder = array.size % (width*3)
if remainder > 0:
    array = array[0:-remainder]  # May remove up to 320*3-1 elements from the end of array (make the length a multiple of 320*3)

#https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
# Make the width 320, the number of channels 3, and the height is computed automatically.
rgb_array = array.reshape((-1, width, 3))

im = Image.fromarray(rgb_array, mode='RGB')
im.save('im.jpg')

Another option is making 3 color planes (R plane, G plane, B plane), and using np.transpose:

rgb_array = array.reshape((3, width, -1))
rgb_array = np.transpose(rgb_array, (2, 1, 0))
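
To see how the two layouts differ, here is a tiny sketch on 12 synthetic values (the data and the small width are assumptions chosen only to keep the output readable). In the interleaved layout, three consecutive bytes form one pixel; in the planar layout, the first third of the bytes fills the R channel, the second third G, and the last third B.

```python
import numpy as np

width = 2
data = np.arange(12, dtype=np.uint8)  # synthetic bytes 0..11

# Interleaved: consecutive bytes become the R, G, B of one pixel.
interleaved = data.reshape((-1, width, 3))

# Planar: split the data into 3 planes, then move the plane axis last.
planar = np.transpose(data.reshape((3, width, -1)), (2, 1, 0))

print(interleaved.shape, planar.shape)  # both are (height, width, 3)
print(interleaved[0, 0])                # first pixel, interleaved layout
print(planar[0, 0])                     # first pixel, planar layout
```

Both results have the same shape, so either can be passed to Image.fromarray; they only differ in which bytes end up in the same pixel.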

Note:
I think that using JPEG format is not a good idea, because JPEG applies lossy compression.
I recommend using PNG file format (PNG is lossless).
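
A quick way to check this is to save an array in both formats and read it back. The sketch below uses random synthetic pixels and an in-memory buffer instead of real files; the roundtrip helper is a hypothetical name introduced only for this illustration.

```python
import io
import numpy as np
from PIL import Image

# Synthetic RGB data standing in for the reshaped byte array.
rng = np.random.default_rng(0)
rgb_array = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)

def roundtrip(arr, fmt):
    """Save arr to an in-memory image in the given format and load it back."""
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format=fmt)
    buf.seek(0)
    return np.array(Image.open(buf).convert('RGB'))

png_ok = np.array_equal(roundtrip(rgb_array, 'PNG'), rgb_array)
jpeg_ok = np.array_equal(roundtrip(rgb_array, 'JPEG'), rgb_array)
print('PNG preserved bytes:', png_ok)
print('JPEG preserved bytes:', jpeg_ok)
```

If the images feed a classifier, preserving the exact byte values may matter, which is the reason to prefer PNG here.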

Answered By: Rotem