Now that we understand a little about how producers and consumers work,
let's discuss the semantic guarantees Kafka provides between producer
and consumer. Clearly there are multiple possible message delivery
guarantees that could be provided:
At most once: messages may be lost but are never redelivered.
At least once: messages are never lost but may be redelivered.
Exactly once: this is what people actually want; each message is delivered once and only once.
It's worth noting that this breaks down into two problems: the durability guarantees for publishing a message and the guarantees when consuming a message.
Many systems claim to provide "exactly once" delivery semantics, but it is important to read the fine print: most of these claims are misleading (i.e., they don't translate to the case where consumers or producers can fail, cases where there are multiple consumer processes, or cases where data written to disk can be lost).
Kafka's semantics are straightforward. When publishing a message, we have a notion of the message being "committed" to the log. Once a published message is committed, it will not be lost as long as one broker that replicates the partition to which this message was written remains "alive". The definition of alive, as well as a description of which types of failures we attempt to handle, will be described in more detail in the next section. For now, let's assume a perfect, lossless broker and try to understand the guarantees to the producer and consumer. If a producer attempts to publish a message and experiences a network error, it cannot be sure whether this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key.
These are not the strongest possible semantics for publishers. Although we cannot be sure of what happened in the case of a network error, it is possible to allow the producer to generate a sort of "primary key" that makes retrying the produce request idempotent. This feature is not trivial for a replicated system because it must work even (or especially) in the case of a server failure. With this feature, it would suffice for the producer to retry until it receives acknowledgement of a successfully committed message, at which point we would guarantee the message had been published exactly once. We hope to add this in a future Kafka version.
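To make the "primary key" idea concrete, here is a minimal sketch in Python. It is a toy model, not Kafka's API: the `Broker`, `FlakyConnection`, and `send_with_retry` names, and the choice of a (producer id, sequence number) pair as the key, are assumptions made for illustration. The connection commits the write but loses the acknowledgement, which is exactly the ambiguous case described above.

```python
class Broker:
    """Toy single-partition broker that deduplicates on (producer_id, sequence)."""

    def __init__(self):
        self.log = []
        self.seen = {}  # (producer_id, sequence) -> offset of the committed message

    def append(self, producer_id, sequence, payload):
        key = (producer_id, sequence)
        if key in self.seen:
            # A retry of a message that was already committed: do not append again.
            return self.seen[key]
        self.log.append(payload)
        offset = len(self.log) - 1
        self.seen[key] = offset
        return offset


class FlakyConnection:
    """Commits every write, but loses the ack for the first `lost_acks` sends."""

    def __init__(self, broker, lost_acks):
        self.broker = broker
        self.lost_acks = lost_acks

    def send(self, producer_id, sequence, payload):
        offset = self.broker.append(producer_id, sequence, payload)
        if self.lost_acks > 0:
            self.lost_acks -= 1
            raise ConnectionError("ack lost after commit")
        return offset


def send_with_retry(conn, producer_id, sequence, payload, max_attempts=5):
    """Retry with the same key until an acknowledgement arrives."""
    for _ in range(max_attempts):
        try:
            return conn.send(producer_id, sequence, payload)
        except ConnectionError:
            continue
    raise RuntimeError("no acknowledgement after retries")


broker = Broker()
conn = FlakyConnection(broker, lost_acks=2)
offset = send_with_retry(conn, producer_id="p1", sequence=0, payload="hello")
print(offset, broker.log)
```

Although the producer sends the message three times, the log ends up with a single copy, because every retry carries the same key.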
Not all use cases require such strong guarantees. For uses which are latency sensitive, we allow the producer to specify the durability level it desires. If the producer specifies that it wants to wait on the message being committed, this can take on the order of 10 ms. However, the producer can also specify that it wants to perform the send completely asynchronously, or that it wants to wait only until the leader (but not necessarily the followers) has the message.
Now let's describe the semantics from the point of view of the consumer. All replicas have the exact same log with the same offsets. The consumer controls its position in this log. If the consumer never crashed, it could just store this position in memory, but if the consumer fails and we want this topic partition to be taken over by another process, the new process will need to choose an appropriate position from which to start processing. Let's say the consumer reads some messages -- it has several options for processing the messages and updating its position.
It can read the messages, then save its position in the log, and
finally process the messages. In this case there is a possibility that
the consumer process crashes after saving its position but before saving
the output of its message processing. In this case the process that
took over processing would start at the saved position even though a few
messages prior to that position had not been processed. This
corresponds to "at-most-once" semantics: if the consumer fails,
some messages may never be processed.
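The ordering described above can be sketched as a small Python simulation. Everything here is hypothetical scaffolding: the log is a plain list, the position store is a dict, and the crash is simulated with an exception.

```python
def consume_at_most_once(log, store, process, crash_before_processing=False):
    """Save the position first, then process the batch."""
    position = store.get("position", 0)
    batch = log[position:]
    store["position"] = position + len(batch)  # 1. save position
    if crash_before_processing:
        raise RuntimeError("simulated crash")
    for message in batch:                      # 2. process messages
        process(message)


log = ["m0", "m1", "m2"]
store = {}       # stands in for durable position storage
processed = []   # stands in for the consumer's output

try:
    consume_at_most_once(log, store, processed.append, crash_before_processing=True)
except RuntimeError:
    pass

# A replacement process starts from the saved position and finds nothing to do.
consume_at_most_once(log, store, processed.append)
print(store["position"], processed)
```

The saved position has moved past all three messages, yet none of them was processed: they are lost to this consumer, and none is ever processed twice.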
It can read the messages, process the messages, and finally
save its position. In this case there is a possibility that the consumer
process crashes after processing messages but before saving its
position. In this case when the new process takes over the first few
messages it receives will already have been processed. This corresponds
to the "at-least-once" semantics in the case of consumer failure. In
many cases messages have a primary key and so the updates are idempotent
(receiving the same message twice just overwrites a record with another
copy of itself).
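A matching sketch for the process-then-save ordering, again as a hypothetical Python simulation. Messages here are assumed to be (key, value) pairs, and the output is a dict keyed by the message's primary key, so applying a message twice is harmless.

```python
def consume_at_least_once(log, store, table, deliveries, crash_before_commit=False):
    """Process the batch first, then save the position."""
    position = store.get("position", 0)
    batch = log[position:]
    for key, value in batch:                   # 1. process messages
        deliveries.append((key, value))
        table[key] = value                     # idempotent overwrite by primary key
    if crash_before_commit:
        raise RuntimeError("simulated crash")
    store["position"] = position + len(batch)  # 2. save position


log = [("user1", "a"), ("user2", "b"), ("user1", "c")]
store, table, deliveries = {}, {}, []

try:
    consume_at_least_once(log, store, table, deliveries, crash_before_commit=True)
except RuntimeError:
    pass

# The replacement process starts from position 0 and sees the same batch again.
consume_at_least_once(log, store, table, deliveries)
print(len(deliveries), table)
```

Every message is delivered twice, but the table ends in the same state it would have reached with a single delivery.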
So what about exactly once semantics (i.e. the thing you
actually want)? The limitation here is not actually a feature of the
messaging system but rather the need to co-ordinate the consumer's
position with what is actually stored as output. The classic way of
achieving this would be to introduce a two-phase commit between the
storage for the consumer position and the storage of the consumer's
output. But this can be handled more simply and generally by
letting the consumer store its offset in the same place as its output.
This is better because many of the output systems a consumer might want
to write to will not support a two-phase commit. As an example of this,
our Hadoop ETL that populates data in HDFS stores its offsets in HDFS
with the data it reads so that it is guaranteed that either data and
offsets are both updated or neither is. We follow similar patterns for
many other data systems which require these stronger semantics and for
which the messages do not have a primary key to allow for deduplication.
So effectively Kafka guarantees at-least-once delivery by default, and allows the user to implement at-most-once delivery by disabling retries on the producer and having the consumer commit its offset prior to processing a batch of messages. Exactly-once delivery requires cooperation with the destination storage system, but Kafka provides the offset, which makes implementing this straightforward.