avro

How can I create an Avro schema from a Python class?

How can I create an Avro schema from a Python class? Question: How can I transform my simple Python class like the following into an Avro schema? class Testo(SQLModel): name: str mea: int This is the Testo.schema() output: { "title": "Testo", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "mea": { "title": …

Total answers: 1
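One way to approach this, sketched below, is to walk the class's type annotations and map each Python type onto an Avro primitive. The avro_schema_from_class helper and the type mapping are illustrative assumptions, not part of SQLModel or any Avro library; dedicated packages such as dataclasses-avroschema cover richer cases (optional fields, nested records, logical types).

# Hypothetical helper (not a library API): derive an Avro record schema
# from a class's type annotations by mapping Python types to Avro primitives.
import json
from typing import get_type_hints

PY_TO_AVRO = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "bytes"}

def avro_schema_from_class(cls, namespace="example"):
    fields = [
        {"name": name, "type": PY_TO_AVRO.get(hint, "string")}
        for name, hint in get_type_hints(cls).items()
    ]
    return {"type": "record", "name": cls.__name__, "namespace": namespace, "fields": fields}

class Testo:  # stand-in for the SQLModel class in the question
    name: str
    mea: int

print(json.dumps(avro_schema_from_class(Testo), indent=2))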

Write multiple Avro files from PySpark to the same directory

Write multiple Avro files from PySpark to the same directory Question: I’m trying to write out a PySpark DataFrame as Avro files to the path /my/path/ on HDFS, partitioned by the column ‘partition’, so under /my/path/ there should be the following subdirectory structure: partition=20230101 partition=20230102 …. Under these sub …

Total answers: 1
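A minimal sketch of the partitioned write, assuming the external spark-avro package is on the classpath and that df already contains the partition column; the input path and write mode are placeholders:

# Sketch: write a DataFrame as Avro files partitioned by the 'partition' column.
# Requires the external spark-avro package (e.g. org.apache.spark:spark-avro_2.12).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro-partitioned-write").getOrCreate()
df = spark.read.parquet("/some/input")  # placeholder source DataFrame

(df.write
   .format("avro")
   .partitionBy("partition")  # yields /my/path/partition=20230101/, /my/path/partition=20230102/, ...
   .mode("append")            # "append" adds files under the same path; "overwrite" replaces it
   .save("/my/path/"))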

Python Kafka consumer message deserialisation using AVRO, without schema registry – problem

Python Kafka consumer message deserialisation using AVRO, without schema registry – problem Question: I have a problem with Kafka message deserialisation. I use Confluent Kafka. There is no schema registry – schemas are hardcoded. I can connect the consumer to any topic and receive messages, but I can’t deserialise these messages. The output after deserialisation looks something …

Total answers: 1
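When the schema is hardcoded and the producer writes plain Avro binary (no Confluent schema-registry framing), fastavro's schemaless reader is one way to decode each message. The schema, topic, and broker below are placeholders:

# Sketch: decode schemaless Avro payloads from a Kafka consumer with fastavro.
# Assumes the producer wrote raw Avro binary using exactly this writer schema
# (no 5-byte Confluent wire-format header in front of the payload).
import io
from confluent_kafka import Consumer
from fastavro import parse_schema, schemaless_reader

schema = parse_schema({
    "type": "record",
    "name": "Example",  # placeholder schema
    "fields": [{"name": "id", "type": "long"},
               {"name": "payload", "type": "string"}],
})

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "demo",
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["my_topic"])  # placeholder topic

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    record = schemaless_reader(io.BytesIO(msg.value()), schema)
    print(record)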

Avro, Hive or HBase – What to use for 10 mio. records daily?

Avro, Hive or HBase – What to use for 10 mio. records daily? Question: I have the following requirements: I need to process around 20,000 elements per day (let’s call them baskets), each of which generates between 100 and 1,000 records (let’s call them products in a basket). A single record has about 10 columns, each row …

Total answers: 1
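As a back-of-envelope check on the scale in the question (not a benchmark), the stated numbers work out to roughly 2 to 20 million rows per day:

# Rough daily volume implied by the question: 20,000 baskets per day,
# each producing between 100 and 1,000 records of ~10 columns.
baskets_per_day = 20_000
low, high = baskets_per_day * 100, baskets_per_day * 1_000
print(f"{low:,} to {high:,} records/day")  # 2,000,000 to 20,000,000 (~10 million mid-range)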

Consuming Kafka Avro messages in Python

Consuming Kafka Avro messages in Python Question: I am trying to consume Avro messages from Kafka in Python. We have it in Java, and it’s working, but when trying to consume it in a Jupyter notebook, parsing does not work. I followed the example given by the documentation: (I’ve removed conf information for security reasons) …

Total answers: 1
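A common shape for a registry-backed consumer with confluent-kafka-python is sketched below. The broker, registry URL, topic, and group id are placeholders, and because the AvroDeserializer argument order has varied across library versions, a keyword argument is used:

# Sketch: consume and decode Avro messages via Confluent Schema Registry.
from confluent_kafka import DeserializingConsumer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroDeserializer

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})
value_deserializer = AvroDeserializer(schema_registry_client=sr_client)

consumer = DeserializingConsumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "notebook-demo",
    "auto.offset.reset": "earliest",
    "value.deserializer": value_deserializer,
})
consumer.subscribe(["my_topic"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        print(msg.value())  # a plain Python dict, decoded using the registry schema
finally:
    consumer.close()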

How to programmatically register Avro Schema in Kafka Schema Registry using Python

How to programmatically register Avro Schema in Kafka Schema Registry using Python Question: I put data and a schema to Kafka and the Schema Registry with Python. from confluent_kafka import avro from confluent_kafka.avro import AvroProducer value_schema_str = """ { "type":"record", "name":"myrecord", "fields":[ { "name":"ID", "type":["null", "int"], "default":null }, { "name":"PRODUCT", "type":["null", "string"], "default":null }, { "name":"QUANTITY", "type":["null", …

Total answers: 1
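Instead of relying on AvroProducer to auto-register the schema on first produce, the schema can be registered explicitly with SchemaRegistryClient. The registry URL and subject name below are placeholders; by convention the value schema of a topic goes under the subject <topic>-value:

# Sketch: register an Avro schema under a subject in Confluent Schema Registry.
from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

value_schema_str = """
{
  "type": "record",
  "name": "myrecord",
  "fields": [
    {"name": "ID",      "type": ["null", "int"],    "default": null},
    {"name": "PRODUCT", "type": ["null", "string"], "default": null}
  ]
}
"""

client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL
schema_id = client.register_schema("my_topic-value",             # subject name: <topic>-value
                                   Schema(value_schema_str, schema_type="AVRO"))
print("registered schema id:", schema_id)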

Kafka AvroConsumer consume from timestamp using offsets_for_times

Kafka AvroConsumer consume from timestamp using offsets_for_times Question: Trying to use confluent_kafka.AvroConsumer to consume messages from a given timestamp. if flag: # creating a list topic_partitons_to_search = list( map(lambda p: TopicPartition('my_topic2', p, int(time.time())), range(0, 1))) print("Searching for offsets with %s" % topic_partitons_to_search) offsets = c.offsets_for_times(topic_partitons_to_search, timeout=1.0) print("offsets_for_times results: %s" % offsets) for x in …

Total answers: 2
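One detail that commonly breaks this is units: offsets_for_times expects the timestamp in milliseconds, while time.time() returns seconds. A sketch along these lines (the broker, topic, and look-back window are placeholders; the same methods are available on AvroConsumer, which extends Consumer):

# Sketch: start consuming 'my_topic2' from a wall-clock timestamp.
# The offset field passed to offsets_for_times must hold a timestamp in MILLISECONDS.
import time
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "ts-demo"})

start_ms = int((time.time() - 3600) * 1000)  # e.g. one hour ago, in milliseconds
partitions = [TopicPartition("my_topic2", p, start_ms) for p in range(1)]

offsets = consumer.offsets_for_times(partitions, timeout=10.0)
consumer.assign(offsets)  # each returned TopicPartition carries the first offset at/after start_ms

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    print(msg.value())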

CSV to AVRO using Python

CSV to AVRO using Python Question: I have the following CSV: field1;field2;field3;field4;field5;field6;field7;field8;field9;field10;field11;field12; eu;4523;35353;01/09/1999; 741 ; 386 ; 412 ; 86 ; 1.624 ; 1.038 ; 469 ; 117 ; and I want to convert it to Avro. I have created the following Avro schema: {"namespace": "forecast.avro", "type": "record", "name": "forecast", "fields": [ {"name": "field1", …

Total answers: 2
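A minimal conversion path, assuming the full schema declares all twelve fields as strings (the schema in the question is truncated), is csv.DictReader plus fastavro:

# Sketch: convert a ';'-delimited CSV to an Avro file with fastavro.
# Assumes every field is a string; cast values here if the real schema uses ints/floats.
import csv
from fastavro import parse_schema, writer

schema = parse_schema({
    "namespace": "forecast.avro",
    "type": "record",
    "name": "forecast",
    "fields": [{"name": f"field{i}", "type": "string"} for i in range(1, 13)],
})

with open("input.csv", newline="") as f:
    rows = list(csv.DictReader(f, delimiter=";"))

# The trailing ';' on each line yields an extra empty-named column; drop it and trim whitespace.
records = [{k: v.strip() for k, v in row.items() if k} for row in rows]

with open("forecast.avro", "wb") as out:
    writer(out, schema, records)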

Do we need to manually cache schema registry?

Do we need to manually cache schema registry? Question: We are currently using Protocol Buffers as the serialization mechanism for Kafka messages. We are going to move to Avro. We tested the Confluent Avro consumer with Schema Registry, and according to those tests, the Avro consumer is a little slower than the Protobuf consumer. My question is do …

Total answers: 2

How to decode/deserialize Avro with Python from Kafka

How to decode/deserialize Avro with Python from Kafka Question: I am receiving Kafka Avro messages from a remote server in Python (using the consumer of the Confluent Kafka Python library) that represent clickstream data as JSON dictionaries with fields like user agent, location, URL, etc. Here is what a message looks like: b’x01x00x00xdex9exa8xd5x8fWxecx9axa8xd5x8fWx1axxx.xxx.xxx.xxxx02:https://website.in/rooms/x02Hhttps://website.in/wellness-spa/x02xaax14x02x9cnx02xaax14x02xd0x0bx02V0:j3lcu1if:rTftGozmxSPo96dz1kGH2hvd0CREXmf2x02V0:j3lj1xt7:YD4daqNRv_Vsea4wuFErpDaWeHu4tW7ex02x08nullx02nnull0x10pageviewx00x00x00x00x00x00x00x00x00x02x10Thailandx02xa6x80xc4x01x02x0eBangkokx02x8cxbaxc4x01x020*xa9x13xd0x84+@x02xecxc09#Jx1fY@x02x8ax02Mozilla/5.0 (X11; Linux x86_64) …

Total answers: 4
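If those bytes were produced by a Confluent Avro serializer, they follow the Confluent wire format: one magic byte (0), a 4-byte big-endian schema id, then the Avro-encoded body. A hedged sketch of decoding that framing by hand with fastavro, assuming the writer schema can be fetched from the registry or hardcoded (the Click schema below is a placeholder):

# Sketch: decode a Confluent-framed Avro message value by hand.
# Wire format: byte 0 = magic (0), bytes 1-4 = schema id (big-endian), rest = Avro binary.
import io
import struct
from fastavro import parse_schema, schemaless_reader

def decode_confluent_avro(raw_bytes, writer_schema):
    magic, schema_id = struct.unpack(">bI", raw_bytes[:5])
    if magic != 0:
        raise ValueError("not Confluent wire format; try decoding as plain schemaless Avro")
    record = schemaless_reader(io.BytesIO(raw_bytes[5:]), writer_schema)
    return schema_id, record

writer_schema = parse_schema({  # placeholder; normally looked up by schema_id in the registry
    "type": "record", "name": "Click",
    "fields": [{"name": "url", "type": "string"}],
})

# Usage inside a consumer loop: schema_id, record = decode_confluent_avro(msg.value(), writer_schema)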