Compress data in Java and decompress in Python

Question:

So I am trying to compress (gzip or similar format) a JSON object before I throw it in my MySQL database. I am currently storing the data as BLOB. I have tried to use the following Java method to compress the data:

public static byte[] compress(String str) throws Exception {
    if (str == null || str.length() == 0) {
        return null;
    }

    ByteArrayOutputStream obj = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(obj);
    gzip.write(str.getBytes("UTF-8"));
    gzip.close();
    return obj.toByteArray();
}

and then store it in the database using setBytes() with a PreparedStatement, and I am having no issues with this. What I am having issues with is decompressing the data in Python 2.7. I have tried using zlib.decompress() to no avail; it can’t seem to read the data that Java is storing. I also need to write a conversion script in Python to compress the old rows into this new format, so whatever format I use needs to be readable by the Python decompress() whether it was compressed with Java or Python 2.7.

I am happy to provide any more information that can help in finding a solution to my dilemma.

Thanks.

EDIT: Some of the Python Code:

class KitPvPMatch(Base):
    """ The SQLAlchemy declarative model class for a User object. """
    __tablename__ = 'kit_pvp_matches'
    __table_args__ = {
        'mysql_engine': 'InnoDB',
        'mysql_charset': 'utf8'
    }

    match_id = Column(INTEGER(11), autoincrement=True, primary_key=True, nullable=False)
    season = Column(Unicode(5), nullable=False)
    winner = Column(Unicode(16), nullable=False)
    loser = Column(Unicode(16), nullable=False)
    ladder_id = Column(TINYINT(4), nullable=False)
    data = Column(BLOB, nullable=False)

# The line in question
jsonData = json.loads(zlib.decompress(match.data))

# The error
error: Error -3 while decompressing data: incorrect header check
Asked By: MasterGberry


Answers:

Here is a post that goes over unzipping using zlib with a stream.

Otherwise, have you tried the gzip module (gzip.py)? You may need a temp file. The documentation for gzip is here. There is a fairly decent solution for this approach in the following post on decompression.
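
As a rough sketch of the gzip-module route: a temp file may not be necessary, since gzip.GzipFile accepts any file-like object, including an in-memory io.BytesIO buffer. The helper names gunzip_bytes and gzip_bytes are made up for this illustration, and gzip_bytes only stands in for the Java compress() method so the example can run on its own.

```python
import gzip
import io
import json


def gunzip_bytes(blob):
    """Decompress gzip-format bytes entirely in memory (no temp file)."""
    # bytes(...) also accepts a bytearray or buffer returned by a DB driver.
    with gzip.GzipFile(fileobj=io.BytesIO(bytes(blob)), mode="rb") as fh:
        return fh.read()


def gzip_bytes(raw):
    """Hypothetical stand-in for the Java compress() method (demo only)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as fh:
        fh.write(raw)
    return buf.getvalue()


# Round trip: compress a JSON document, then read it back as the
# "line in question" would, but through the gzip module instead of zlib.
sample = gzip_bytes(json.dumps({"winner": "alice", "loser": "bob"}).encode("utf-8"))
match_data = json.loads(gunzip_bytes(sample).decode("utf-8"))
```

In the question's code this would amount to something like json.loads(gunzip_bytes(match.data)), assuming match.data really holds the raw bytes written by GZIPOutputStream.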

If you haven’t already, ensure that you are getting bytes back from SQL. Python is flexible so it may be a string. Call bytearray(string) on your string if this is the case.

If that doesn’t work:

  1. What format is the data in when returned by your SQL command?
  2. What error, if any, are you getting?
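
One way to answer the first question is to look at the first two bytes of the value that comes back from the database. The sketch below is only a diagnostic aid: sniff_format is a hypothetical helper, and it relies on the gzip magic number (0x1f 0x8b) and the zlib header check from RFC 1950.

```python
import zlib


def sniff_format(blob):
    """Guess the compression container from the first two bytes of a BLOB."""
    head = bytearray(blob[:2])  # bytearray yields ints on Python 2 and 3
    if len(head) < 2:
        return "too short"
    # gzip streams always start with the magic bytes 0x1f 0x8b.
    if head[0] == 0x1F and head[1] == 0x8B:
        return "gzip"
    # zlib streams: low nibble of byte 0 is 8 (deflate), and the two
    # header bytes read as a 16-bit number are a multiple of 31.
    if head[0] & 0x0F == 8 and (head[0] * 256 + head[1]) % 31 == 0:
        return "zlib"
    return "unknown"


zlib_sample = zlib.compress(b"{}")
gzip_writer = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
gzip_sample = gzip_writer.compress(b"{}") + gzip_writer.flush()

print(sniff_format(zlib_sample))
print(sniff_format(gzip_sample))
print(sniff_format(b"{}"))
```

If the stored value reports "gzip", a plain zlib.decompress() call with default arguments is expected to fail with the "incorrect header check" error, because by default it only accepts the zlib header.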
Answered By: Andrew Scott Evans

I’m rehashing the answer from here: https://stackoverflow.com/a/12572031/7298096 because the question in this thread is exactly the topic I was looking for. In my case the Java code compresses the content with DeflaterOutputStream and then encodes it with Base64.encodeBase64String. The error

error: Error -3 while decompressing data: incorrect header check

is resolved if I add 32 to the wbits argument of zlib.decompress, which makes zlib detect a zlib or gzip header automatically:

import base64
import zlib

data = "ENCODED_COMPRESSED_STRING_FROM_JAVA"

output_str = zlib.decompress(base64.b64decode(data), 32 + zlib.MAX_WBITS).decode('utf-8')
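
For the original question (raw GZIPOutputStream bytes in a BLOB, no Base64), the same wbits idea might be applied in both directions. This is a minimal sketch, not a tested migration script: compress_gzip and decompress_any are hypothetical names. A wbits value of 16 + zlib.MAX_WBITS makes zlib write a gzip container, which is the format GZIPOutputStream produces, and 32 + zlib.MAX_WBITS accepts either gzip or zlib input when reading.

```python
import zlib


def compress_gzip(text):
    """Compress text into the gzip container format using zlib."""
    # wbits = 16 + MAX_WBITS asks zlib for a gzip header and trailer.
    writer = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return writer.compress(text.encode("utf-8")) + writer.flush()


def decompress_any(blob):
    """Decompress gzip or zlib data; 32 + MAX_WBITS auto-detects the header."""
    return zlib.decompress(bytes(blob), 32 + zlib.MAX_WBITS).decode("utf-8")


original = u'{"winner": "alice", "loser": "bob"}'

# Rows written by a Python conversion script (gzip container).
from_gzip = decompress_any(compress_gzip(original))

# Rows that happen to be in plain zlib format are read by the same call.
from_zlib = decompress_any(zlib.compress(original.encode("utf-8")))
```

With helpers like these, the failing line would become json.loads(decompress_any(match.data)), and the conversion script could store compress_gzip(old_json) for old rows so that both writers share one format.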
Answered By: ahfx