Is there any efficient way to store large raw data as text in a .py file?

Question:

This may be a little bit confusing, but I will explain.
Basically, there is a contest platform for a chess game. Everyone needs to upload only one .py file to it as their own AI program, and matches will be played among everyone to get a final rank, like ranking mode in video games. I want to paste my reinforcement learning model as text in the file, but the size of the .py file is limited. Therefore, an efficient way to store data is needed.
(NumPy is the only third-party package we are allowed to use.)

Here is what I’ve tried:
I used numpy.savez_compressed to store the data as a .npz file and tried to paste its non-ASCII contents into my .py file. Simply pasting doesn’t work because of encoding and decoding problems. Then I used the cat command to make sure the data was pasted correctly into the .py file, inside a triple-quoted string literal. That doesn’t work either: it raises SyntaxError: EOF while scanning triple-quoted string literal, which I guess is also due to encoding problems.
Is there any way to solve the problem, or are there better ways? Thanks!

Asked By: Haoson Q


Answers:

Step 1: A single (compressed) string out of a file (with base64)

  • cat myfile | base64 -w0
    will give you a single-line string (-w0 turns off line wrapping in GNU base64).

  • cat myfile | gzip | base64 -w0
    will give you a gzipped variant; it should start with H4s and will usually end with one or more = or A characters.
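
If no shell is at hand, the same one-line string can be produced with Python's standard library. This is only a minimal sketch and not part of the original answer; the helper names encode_blob and decode_blob are made up for illustration, and the payload is a stand-in for the contents of myfile.

```python
import base64
import gzip


def encode_blob(data: bytes) -> str:
    """Gzip the bytes, then base64-encode them into one ASCII line."""
    return base64.b64encode(gzip.compress(data)).decode("ascii")


def decode_blob(text: str) -> bytes:
    """Reverse encode_blob: base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(text))


# Stand-in payload; in practice this would be open("myfile", "rb").read()
payload = b"TestString\n" * 100
line = encode_blob(payload)

# Show the prefix and the sizes before and after encoding
print(line[:4], len(payload), len(line))
```

The printed prefix is the gzip header as seen through base64, which is why the shell output can be recognised by its leading H4s.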

Step 2: Python example

In Python, check that your decompression works, e.g. by using a text file as input.

Sample code that just echoes your file in plain text:

import base64
import gzip
base64_message = 'H4sIAAAAAAAAAwtJLS4JLinKzEvnAgCL9Yi6CwAAAA=='
base64_bytes = base64_message.encode('ascii')
message_bytes = base64.b64decode(base64_bytes)
print(gzip.decompress(message_bytes))

(The line base64_message = 'H4sIAAAAAAAAAwtJLS4JLinKzEvnAgCL9Yi6CwAAAA==' is the one you need to generate with a script; this one is the result of echo TestString | gzip | base64 -w0.)
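
For the NumPy case in the question, the .npz bytes never have to pass through a file or the terminal. Here is a rough sketch, assuming the model is a set of named arrays; the names w1 and b1 and both helper functions are placeholders, not anything from the question. numpy.savez_compressed writes into an in-memory buffer, the bytes are base64-encoded into a string that is safe to paste, and numpy.load reads them back. Since savez_compressed already compresses, no extra gzip step is used here.

```python
import base64
import io

import numpy as np


def arrays_to_text(**arrays) -> str:
    """Pack named arrays into compressed .npz bytes and base64-encode them."""
    buf = io.BytesIO()
    np.savez_compressed(buf, **arrays)
    return base64.b64encode(buf.getvalue()).decode("ascii")


def text_to_arrays(text: str) -> dict:
    """Decode the pasted string back into a dict of arrays."""
    with np.load(io.BytesIO(base64.b64decode(text))) as npz:
        return {name: npz[name] for name in npz.files}


# Placeholder weights standing in for a trained model
w1 = np.arange(12, dtype=np.float32).reshape(3, 4)
b1 = np.zeros(4, dtype=np.float32)

MODEL_TEXT = arrays_to_text(w1=w1, b1=b1)  # paste this string into the .py file
restored = text_to_arrays(MODEL_TEXT)
```

Base64 makes the data about a third larger, so it is the length of the encoded string, not the size of the .npz data, that counts against the file size limit.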

Step 3: Sample insert/rewrite code

  • To "compile" your file, use e.g. the following script
    (change insertpoint to the real line number after which the data should be inserted):
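#!/bin/bash
# Build real.py: copy raw.py and splice the embedded data in after line $insertpoint.
insertpoint=3
(
# lines 1..insertpoint of raw.py
head -n "$insertpoint" raw.py
echo "import base64"
echo "import gzip"
# gzip first (the Python side calls gzip.decompress), then base64 on a single line;
# base64 -w0 prints no trailing newline, so the closing quote and newline are added here
printf 'base64_message = "'
gzip -c binaryfile.bin | base64 -w0
printf '"\n'
echo "base64_bytes = base64_message.encode('utf-8')"
echo "message_bytes = base64.b64decode(base64_bytes)"
echo "realdata = gzip.decompress(message_bytes)"
# the rest of raw.py, starting at line insertpoint+1
tail -n "+$((insertpoint + 1))" raw.py
) > real.py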

Hint:
Depending on the encoding, you might have to use

base64_bytes = base64_message.encode('ascii')
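
The build step can also be done in Python instead of bash. The following is a sketch under assumptions, not the answerer's script: embed_data is a hypothetical helper, and raw_py is a tiny stand-in for the contents of raw.py. It gzips and base64-encodes the data, splices the loader lines in after line insertpoint, and then runs the generated source once to check that it loads.

```python
import base64
import gzip


def embed_data(source: str, data: bytes, insertpoint: int) -> str:
    """Return source with loader lines for data spliced in after line insertpoint."""
    lines = source.splitlines()
    encoded = base64.b64encode(gzip.compress(data)).decode("ascii")
    loader = [
        "import base64",
        "import gzip",
        # The base64 alphabet has no quotes or backslashes, so a plain literal is safe
        f'base64_message = "{encoded}"',
        "realdata = gzip.decompress(base64.b64decode(base64_message))",
    ]
    return "\n".join(lines[:insertpoint] + loader + lines[insertpoint:]) + "\n"


# Tiny stand-in for raw.py; its last line uses the embedded data
raw_py = "# my chess AI\nprint('starting')\nresult = len(realdata)\n"
real_py = embed_data(raw_py, b"\x00\x01\x02" * 1000, insertpoint=2)

namespace = {}
exec(real_py, namespace)  # run the generated program to check that it loads
```

In real use, raw_py would be read from raw.py and real_py written to real.py, which is the file to upload.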


Answered By: Bencho Naut