ivan.engineering

Lessons learned about Kafka's Auto Offset Reset


Lesson 1: With auto.offset.reset set to latest, you will miss the first few messages from a new topic if your consumer subscribes by pattern.

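New topics are not assigned straight away to consumers subscribed with a matching pattern. According to the documentation, the pattern matching is done periodically against the topics existing at the time of the check. The interval is tied to the consumer’s metadata refresh, controlled by metadata.max.age.ms (5 minutes by default).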

At the next pattern check, the consumer is assigned the new topic’s partitions. Since the group has no committed offset for them, it falls back to the auto.offset.reset configuration. When set to latest, the consumer starts from the end of the partition as it is at that moment, which may already be far from the beginning of the partition, so everything published before the check is skipped.

If the consumer uses a pattern, prefer auto.offset.reset set to earliest.
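
To make the failure mode concrete, here is a toy Python model of it. This is not Kafka client code: Partition, starting_offset and consume_after_discovery are hypothetical names made up for this sketch, and it assumes the partition log starts at offset 0 (in a real cluster, earliest means the current log start offset, which can be higher after retention kicks in).

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy stand-in for a Kafka partition: an append-only list of messages."""
    log: list = field(default_factory=list)

    def publish(self, message):
        self.log.append(message)


def starting_offset(partition, committed_offset, auto_offset_reset):
    """Pick the offset a consumer starts from when a partition gets assigned."""
    if committed_offset is not None:
        # A committed offset always wins; auto.offset.reset is not consulted.
        return committed_offset
    if auto_offset_reset == "earliest":
        return 0  # assumption: the log start offset is 0 in this toy model
    if auto_offset_reset == "latest":
        return len(partition.log)  # the end of the partition right now
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")


def consume_after_discovery(auto_offset_reset):
    """Return the messages a pattern consumer ends up reading from a new topic."""
    partition = Partition()
    # Messages published after topic creation, before the next pattern check.
    for i in range(3):
        partition.publish(f"early-{i}")
    # The pattern check runs now; the group has no committed offset yet.
    offset = starting_offset(partition, None, auto_offset_reset)
    # A message published after the partition was assigned.
    partition.publish("late-0")
    return partition.log[offset:]


print(consume_after_discovery("latest"))    # ['late-0']
print(consume_after_discovery("earliest"))  # ['early-0', 'early-1', 'early-2', 'late-0']
```

With latest, the three messages published before the pattern check are never delivered; with earliest, nothing is lost.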

Lesson 2: With auto.offset.reset set to latest, “Offset Reset” only happens when a new message gets published to a partition.

Applying the first lesson means we can’t create a pattern consumer that will start from the latest offset and function well when new topics are created.

You might think an easy fix would be to create the consumer group with auto.offset.reset=latest first, let it “meet” all partitions, and then switch back to the safe earliest value. It can work, but there is a catch: a Kafka consumer will communicate its Current Offset only when it sees a message in a partition.

In other words, for an offset to reset to the end of the partition, your consumer has to see at least one message in each partition while it is running with auto.offset.reset=latest.

The easiest way to check whether a consumer has reset its offset for a particular partition is the following CLI command:

bin/kafka-consumer-groups.sh --bootstrap-server ... --group ... --describe

If you see a dash in the CURRENT-OFFSET column, this partition has not yet been seen by your consumer.
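
With many topics, scanning that output by eye gets tedious. Below is a minimal Python sketch that picks out the partitions with a dash in the CURRENT-OFFSET column. The function name unseen_partitions and the SAMPLE text are made up for illustration; the sketch assumes the output has a header row containing TOPIC, PARTITION and CURRENT-OFFSET and whitespace-separated columns, which may differ between Kafka versions.

```python
# Illustrative sample only, shaped like `--describe` output; not a real capture.
SAMPLE = """
GROUP     TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
my-group  orders  0          42              42              0    -            -     -
my-group  orders  1          -               17              -    -            -     -
"""


def unseen_partitions(describe_output):
    """Return (topic, partition) pairs whose CURRENT-OFFSET column is a dash."""
    header = None
    unseen = []
    for line in describe_output.splitlines():
        fields = line.split()
        if "CURRENT-OFFSET" in fields:
            # Header row: remember the column names and their order.
            header = fields
            continue
        if header is None or not fields:
            # Skip blank lines and anything printed before the header.
            continue
        row = dict(zip(header, fields))
        if row.get("CURRENT-OFFSET") == "-" and "TOPIC" in row and "PARTITION" in row:
            unseen.append((row["TOPIC"], int(row["PARTITION"])))
    return unseen


print(unseen_partitions(SAMPLE))  # [('orders', 1)]
```

The text to parse could come from piping the command's output into the script; anything the function returns is a partition the group has not committed an offset for yet.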

While we’re on the subject of CLI tools, it’s worth mentioning that, unfortunately, bin/kafka-consumer-groups.sh --reset-offsets --to-latest also only changes offsets for partitions where a Current Offset has already been set.

The simplest solution is to first create your consumer with auto.offset.reset=earliest and let it run through the partitions without any business logic attached. Once it has reached the end of all partitions, enable your business logic.
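
One way to decide when the warm-up is over is to compare the consumer's position in each partition with that partition's end offset. The Python sketch below shows only that comparison, on plain dictionaries keyed by (topic, partition); caught_up is a hypothetical helper, not part of any Kafka client. In a real consumer the two dictionaries would have to be filled from the client (for example, the Java client exposes KafkaConsumer.position and KafkaConsumer.endOffsets), and that wiring is left out here.

```python
def caught_up(positions, end_offsets):
    """Return True once every partition's position has reached its end offset.

    positions:   {(topic, partition): next offset the consumer will read}
    end_offsets: {(topic, partition): offset one past the last message}
    """
    for topic_partition, end in end_offsets.items():
        position = positions.get(topic_partition)
        # A partition with no known position yet counts as not caught up.
        if position is None or position < end:
            return False
    return True


# Hypothetical snapshot taken during the warm-up run.
end_offsets = {("orders", 0): 42, ("orders", 1): 17}

print(caught_up({("orders", 0): 42, ("orders", 1): 3}, end_offsets))   # False
print(caught_up({("orders", 0): 42, ("orders", 1): 17}, end_offsets))  # True
```

The end offsets are a snapshot: messages keep arriving while the consumer warms up, so the check is best read as "caught up to where the log ended when the snapshot was taken".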