Kafka AvroConsumer consume from timestamp using offsets_for_times

Question:

I'm trying to use confluent_kafka.AvroConsumer to consume messages from a given timestamp.

if flag:

    # build a TopicPartition for each partition to search,
    # using the current time as the target timestamp
    topic_partitons_to_search = list(
        map(lambda p: TopicPartition('my_topic2', p, int(time.time())), range(0, 1)))

    print("Searching for offsets with %s" % topic_partitons_to_search)
    offsets = c.offsets_for_times(topic_partitons_to_search, timeout=1.0)
    print("offsets_for_times results: %s" % offsets)

    for x in offsets:
        c.seek(x)
    flag = False

The console returns this:

Searching for offsets with [TopicPartition{topic=my_topic2,partition=0,offset=1543584425,error=None}]
offsets_for_times results: [TopicPartition{topic=my_topic2,partition=0,offset=0,error=None}]
{'name': 'Hello'}
{'name': 'Hello'}
{'name': 'Hello1'}
{'name': 'Hello3'}
{'name': 'Hello3'}
{'name': 'Hello3'}
{'name': 'Hello3'}
{'name': 'Hello3'}
{'name': 'Offset 8'}
{'name': 'Offset 9'}
{'name': 'Offset 10'}
{'name': 'Offset 11'}
{'name': 'New'} 

These are all the messages in partition 0 of my_topic2 (there is nothing in partition 1). We should get nothing back, because no messages were produced at or after the current time (time.time()). I would then like to be able to use something like time.time() - 60000 to get all the messages from the last 60000 milliseconds.

Asked By: AnonymousAlias


Answers:

Python's time.time() returns the number of seconds since the epoch, while offsets_for_times expects milliseconds since the epoch. So when I passed in a value in seconds, it resolved to a date much earlier than today, which is why all of my offsets were included.
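To illustrate, the fix is just a unit conversion; a minimal sketch using only the standard library:

```python
import time

# time.time() returns seconds since the epoch as a float;
# offsets_for_times() expects milliseconds since the epoch.
now_ms = int(time.time() * 1000)

# e.g. a target timestamp for "60 seconds ago", in milliseconds:
one_minute_ago_ms = int((time.time() - 60) * 1000)

# millisecond epoch values are roughly 13 digits today,
# seconds-based values only 10 -- an easy sanity check
assert now_ms > 1_000_000_000_000
assert one_minute_ago_ms < now_ms
```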

Answered By: AnonymousAlias

Instead of c.seek, you can manually assign the offsets:

# copy the resolved offsets onto the original TopicPartition list,
# then assign so the consumer starts reading from them
for p, o in zip(topic_partitons_to_search, offsets):
    p.offset = o.offset
c.assign(topic_partitons_to_search)

Instead of using something like time.time() - 60000, you can use datetime + timedelta and convert it to a millisecond timestamp:

from datetime import datetime, timedelta

from_date = datetime.now() - timedelta(days=1)  # e.g. 1 day ago
from_date_ts = int(from_date.timestamp() * 1000)  # millisecond timestamp

topic_partitons_to_search = list(
    map(lambda p: TopicPartition('my_topic2', p, from_date_ts), range(0, 1)))

(For an approach using isoformat instead of datetime + timedelta, see How to consume messages in last N days using confluent-kafka-python?)
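For reference, the time.time()-based and datetime-based approaches produce the same kind of millisecond timestamp; a small standard-library sketch comparing them (the 60-second window is an arbitrary example):

```python
import time
from datetime import datetime, timedelta

# Approach 1: time.time() is in seconds, so subtract seconds first,
# then scale to milliseconds
ts_from_time = int((time.time() - 60) * 1000)

# Approach 2: datetime + timedelta, then .timestamp() (seconds) * 1000
ts_from_datetime = int((datetime.now() - timedelta(seconds=60)).timestamp() * 1000)

# Both are millisecond-epoch values suitable for offsets_for_times()
assert abs(ts_from_time - ts_from_datetime) < 1000
```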

Answered By: Angela Heumann