Ensure faust consumer/agent seeks to offset 0 after rebalance

Question:

Not much point explaining why this needs to happen, but we need to have a worker always read from the beginning of a topic on startup and after a rebalance.

The following snippet works at startup: when a worker starts, it seeks to the first offset in each assigned partition. However, it does not reset the offset to zero for a worker that goes through a rebalance.

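from typing import Set

import faust
from faust.types import AppT, TP

# BROKER_LIST, STORE and the Master record are defined elsewhere in the project.
app = faust.App(
    'my-app',
    broker=BROKER_LIST,
    store=STORE,
)

# Partitions currently assigned to this worker, refreshed on every assignment.
topics = []
master_topic = app.topic('master', value_type=Master, value_serializer='json')

@app.task()
async def on_start():
    # Runs once when the worker starts: rewind every known partition to offset 0.
    for topic_partition in topics:
        print(f"Seeking for {topic_partition} due to start up")
        await app.consumer.seek(topic_partition, 0)

@app.on_partitions_assigned.connect
async def on_partitions_assigned(app: AppT, assigned: Set[TP], **kwargs) -> None:
    # Remember the newly assigned partitions so they can be seeked later.
    global topics
    topics = list(assigned)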

Clearly the on_start function doesn’t do anything for the rebalanced worker because it has already started, so I thought I could add a handler connected with @app.on_rebalance_complete.connect that runs the same seek as on_start, but it doesn’t work.

A workaround was to use @app.on_rebalance_complete.connect to set a global variable to True, and then add a timer (@app.timer(1)) that reads that variable and, if it is True, performs the seek. With this in place the on_start function is obsolete, and it works: the agent prints the events from the first to the last. But it seems stupid to do it this way, and unnecessarily taxing.
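The flag-and-timer workaround can be sketched without a broker to make its control flow explicit. This is a minimal sketch, not Faust code: `StubConsumer`, `RewindOnRebalance` and their methods are hypothetical names, partitions are plain `(topic, partition)` tuples rather than Faust's `TP`, and the three methods merely stand in for the `@app.on_partitions_assigned.connect`, `@app.on_rebalance_complete.connect` and `@app.timer(1)` handlers.

```python
import asyncio
from typing import List, Set, Tuple

# Hypothetical stand-in for a topic partition: (topic name, partition number).
Partition = Tuple[str, int]


class StubConsumer:
    """Hypothetical consumer that only records seek calls (no Kafka involved)."""

    def __init__(self) -> None:
        self.seeks: List[Tuple[Partition, int]] = []

    async def seek(self, tp: Partition, offset: int) -> None:
        self.seeks.append((tp, offset))


class RewindOnRebalance:
    """A rebalance only sets a flag; a periodic timer performs the actual seek."""

    def __init__(self, consumer: StubConsumer) -> None:
        self.consumer = consumer
        self.assigned: Set[Partition] = set()
        self.pending = False

    def on_partitions_assigned(self, assigned: Set[Partition]) -> None:
        # Stands in for the on_partitions_assigned signal handler.
        self.assigned = set(assigned)

    def on_rebalance_complete(self) -> None:
        # Stands in for the on_rebalance_complete handler: just raise the flag.
        self.pending = True

    async def tick(self) -> None:
        # Stands in for the body of the @app.timer(1) function.
        if not self.pending:
            return
        # Clear the flag before seeking so a rebalance that completes while
        # the seeks are awaited raises it again for the next tick.
        self.pending = False
        for tp in sorted(self.assigned):
            await self.consumer.seek(tp, 0)

    async def run_timer(self, interval: float, ticks: int) -> None:
        # Fire the timer body a fixed number of times.
        for _ in range(ticks):
            await self.tick()
            await asyncio.sleep(interval)


async def demo() -> List[Tuple[Partition, int]]:
    consumer = StubConsumer()
    rewinder = RewindOnRebalance(consumer)
    rewinder.on_partitions_assigned({("master", 0), ("master", 1)})
    rewinder.on_rebalance_complete()
    await rewinder.run_timer(interval=0.01, ticks=3)
    return consumer.seeks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Although the timer fires three times in the demo, each partition is rewound only once, because the first tick clears the flag. This also shows why the pattern costs little beyond the periodic wake-up: ticks with no pending rebalance return immediately.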

Can anyone shed some light on how to perform the seek after a rebalance to actually force the zero offset and then read from that?

Also, if this is a compacted topic, what happens when offset zero is removed? Is there a better way to always read from the first event in the topics, both on restart and on rebalance?

Asked By: Fonty


Answers:

While there is a way to modify the Faust code after installing it to override the default behavior, the method described in the question (an on_rebalance_complete flag checked by a timer) was deemed the most workable solution.

Answered By: Fonty