Saving Python dictionary as JSON file using Kafka and NiFi

Question:

I am attempting to send JSON data from Python, through Kafka, to a NiFi container where it will be saved. The Python script is below:

import json
from kafka import KafkaProducer  # kafka-python client
from bson import json_util       # assumed to be pymongo's bson.json_util, per json_util.default below

server_address = "foobar:1234"
producer = KafkaProducer(bootstrap_servers=server_address)
data = {"type": "record", "name": "CSV", "namespace": "nifi", "fields": {"Id": 0, "Value": 10.1}}
producer.send("outputs", json.dumps(data, default=json_util.default).encode("utf-8"))
producer.flush()

The NiFi flow consists of the processors: ConsumeKafka_2_6 -> ConvertRecord -> PutFile. I have already validated that the ConsumeKafka and PutFile processors work without ConvertRecord (though the format isn’t correct). For the ConvertRecord processor, the RecordReader is JsonTreeReader and the RecordWriter is a CSVRecordSetWriter. Both are using AvroSchemaRegistry, where the schema I’ve selected is:

{
  "type": "record",
  "name": "CSV",
  "namespace: "nifi",
  "fields": [
    {"name": "Id", "type": "int"},
    {"name": "Value", "type": "float"}
  ]
}

The data is flowing successfully from the ConsumeKafka processor to the ConvertRecord processor, but from there, it is routed to failure.
My question is, how do I make the entire flow successful? I have found various sources online, but have not been able to get it to work for myself.

EDIT: Despite the mess in the data dictionary of my Python script, I really just want to transfer "Id": 0, "Value": 10.1.

Asked By: wb1210


Answers:

The issue with your current configuration is that the schema you are using in the ConvertRecord processor does not match the structure of the data being sent from your Python script.

The schema defines a flat record with two fields: "Id" (an int) and "Value" (a float). However, the JSON you are actually sending has a different shape: the "Id" and "Value" values are nested inside a dictionary under the "fields" key, alongside the top-level "type", "name", and "namespace" keys.
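As a quick illustration (a minimal, standalone sketch using only the standard library), compare the JSON your script actually publishes with the flat record the original schema describes:

import json

# Payload the question's script actually publishes to the "outputs" topic:
sent = {"type": "record", "name": "CSV", "namespace": "nifi",
        "fields": {"Id": 0, "Value": 10.1}}
print(json.dumps(sent))
# {"type": "record", "name": "CSV", "namespace": "nifi", "fields": {"Id": 0, "Value": 10.1}}

# Flat record the original schema ("Id"/"Value" at the top level) expects:
expected = {"Id": 0, "Value": 10.1}
print(json.dumps(expected))
# {"Id": 0, "Value": 10.1}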

To fix this issue, you need to update the schema used in the ConvertRecord processor to match the structure of the JSON data being sent from your Python script. Here is the updated schema:

{
  "type": "record",
  "name": "JSON",
  "fields": [
    {"name": "fields", "type": {
        "type": "record",
        "name": "CSV",
        "fields": [
          {"name": "Id", "type": "int"},
          {"name": "Value", "type": "float"}
        ]
      }
    }
  ]
}

This schema defines a record with a single "fields" field, which is itself a record containing "Id" and "Value". The extra top-level keys in your payload ("type", "name", "namespace") are not part of the schema, so JsonTreeReader should simply skip them. With this schema, the ConvertRecord processor should be able to correctly parse the JSON data sent from your Python script.

Once you have updated the schema in the AvroSchemaRegistry, make sure both the JsonTreeReader and the CSVRecordSetWriter in the ConvertRecord processor reference this new schema (for example via their Schema Name property).

After making these changes, the data should flow successfully from the ConsumeKafka processor through the ConvertRecord processor and to the PutFile processor, where it will be saved.
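Alternatively, since your EDIT says you really only want to transfer "Id": 0, "Value": 10.1, you could keep your original flat schema and change the producer instead, so that only the record itself is sent. A minimal sketch, assuming the kafka-python client and the placeholder broker address and topic name from your script:

import json
from kafka import KafkaProducer  # kafka-python client, as in the question

server_address = "foobar:1234"
producer = KafkaProducer(bootstrap_servers=server_address)

# Send only the record itself; the original flat schema
# (fields "Id": int and "Value": float) then matches as-is.
record = {"Id": 0, "Value": 10.1}
producer.send("outputs", json.dumps(record).encode("utf-8"))
producer.flush()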

Answered By: Dhananjay Yadav