Python Kafka consumer: deserialising Avro messages without a schema registry – problem

Question:

I have a problem deserializing Kafka messages. I use the Confluent Kafka client (confluent_kafka).

There is no schema registry – schemas are hardcoded.

I can connect the consumer to any topic and receive messages, but I can’t deserialise them.

After "deserialisation", the print(reader) line outputs something like this:

<avro.io.DatumReader object at 0x000002354235DBB0>

I think my deserialization code is wrong, but how do I fix it?

Ultimately, I want to extract the deserialized key and value.

from confluent_kafka import Consumer, KafkaException, KafkaError
import sys
import time
import avro.schema
from avro.io import DatumReader, DatumWriter

def kafka_conf():
    conf = {''' MY CONFIGURATION'''
            }
    return conf


if __name__ == '__main__':

    conf = kafka_conf()
    topic = """MY TOPIC"""
    c = Consumer(conf)
    c.subscribe([topic])
    try:
        while True:
            msg = c.poll(timeout=200.0)
            if msg is None:
                continue
            if msg.error():
                # Error or event
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    # End of partition event
                    sys.stderr.write('%% %s [%d] reached end at offset %d\n' %
                                     (msg.topic(), msg.partition(), msg.offset()))
                else:
                    # Error
                    raise KafkaException(msg.error())
            else:
                print("key: ", msg.key())
                print("value: ", msg.value())
                print("offset: ", msg.offset())
                print("topic: ", msg.topic())
                print("timestamp: ", msg.timestamp())
                print("headers: ", msg.headers())
                print("partition: ", msg.partition())
                print("latency: ", msg.latency())

                schema = avro.schema.parse(open("MY_AVRO_SCHEMA.avsc", "rb").read())
                print(schema)

                reader = DatumReader(msg.value, reader_schema=schema)
                print(reader)

            time.sleep(5)  # only on test

    except KeyboardInterrupt:
        print('\nAborted by user\n')
    finally:
        c.close()
Asked By: token


Answers:

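You’re printing a reader object, not deserialized data. Deserialization only happens when you call reader.read(). Also note that DatumReader’s first argument is the writer’s schema, not the message bytes, so passing msg.value (the method itself, not even called) there is wrong as well.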

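You also need a BinaryDecoder that wraps the raw message bytes, i.e. msg.value() with the parentheses, in a file-like object such as io.BytesIO; that decoder is what you pass to reader.read().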

Confluent’s own DeserializingConsumer (through its Avro deserializer) does essentially the same thing, except that it fetches the schema from the registry rather than from the local filesystem, so I suggest following what its source code does.

Answered By: OneCricketeer