半兽人   Posted: 2015-03-10   Last updated: 2016-04-09


Kafka always writes all data to the filesystem immediately, and it lets you configure the flush policy that controls when data is forced out of the OS cache and onto disk. The flush policy can force data to disk after a period of time or after a certain number of messages have been written. There are several choices in this configuration.
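
To make the two triggers concrete, here is a minimal sketch of a message-count and time-based flush policy. It is a toy, not Kafka's implementation: the class `FlushPolicyLog` and its `clock` parameter are hypothetical, and the constructor arguments are merely named after the broker settings `log.flush.interval.messages` and `log.flush.interval.ms`. As a simplification, the time check here runs only when a message is appended, whereas a real broker would check it from a background task.

```python
import os
import time


class FlushPolicyLog:
    """Toy append-only log that fsyncs after N messages or T milliseconds.

    Hypothetical illustration only; this is not Kafka's log implementation.
    """

    def __init__(self, path, flush_interval_messages, flush_interval_ms,
                 clock=time.monotonic):
        self._file = open(path, "ab")
        self._max_messages = flush_interval_messages
        self._max_ms = flush_interval_ms
        self._clock = clock          # injectable so the policy can be tested
        self._unflushed = 0          # messages written since the last fsync
        self._last_flush = clock()
        self.fsync_count = 0

    def append(self, payload: bytes) -> None:
        # The write reaches the filesystem (OS page cache) immediately.
        self._file.write(payload + b"\n")
        self._file.flush()  # empty Python's user-space buffer into the OS
        self._unflushed += 1
        elapsed_ms = (self._clock() - self._last_flush) * 1000
        if self._unflushed >= self._max_messages or elapsed_ms >= self._max_ms:
            self.force()

    def force(self) -> None:
        # fsync asks the OS to push cached pages for this file to disk.
        os.fsync(self._file.fileno())
        self._unflushed = 0
        self._last_flush = self._clock()
        self.fsync_count += 1

    def close(self) -> None:
        self._file.close()
```

With `flush_interval_messages=3`, the third `append` triggers one fsync; with a very large value for both settings, the log never fsyncs on its own and relies on the OS to write pages back.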


Kafka must eventually call fsync to know that data was flushed. When recovering from a crash, for any log segment not known to be fsync'd, Kafka checks the integrity of each message by verifying its CRC and also rebuilds the accompanying offset index file as part of the recovery process executed on startup.
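
The recovery pass described above can be sketched as a single scan over a segment. The record layout below (4-byte length, 4-byte CRC32, then the payload) and the names `encode_record` and `recover_segment` are assumptions made for illustration; Kafka's actual on-disk format and index structure differ. The sketch stops at the first record that fails its check and returns the number of valid bytes together with a rebuilt offset-to-position index.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Frame a payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover_segment(data: bytes, base_offset: int = 0):
    """Scan a segment, verifying each record's CRC.

    Returns (valid_bytes, index), where index is a list of
    (message offset, byte position) pairs for every intact record.
    """
    index = []
    position = 0
    offset = base_offset
    while position + HEADER.size <= len(data):
        length, expected_crc = HEADER.unpack_from(data, position)
        start = position + HEADER.size
        end = start + length
        if end > len(data):
            break  # record was only partially written before the crash
        if zlib.crc32(data[start:end]) != expected_crc:
            break  # corrupt record: everything from here on is untrusted
        index.append((offset, position))
        offset += 1
        position = end
    return position, index
```

A caller would truncate the segment file to `valid_bytes` and write `index` out as the new index file; both steps are left out here to keep the sketch short.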


Note that durability in Kafka does not require syncing data to disk, as a failed node will always recover from its replicas.


We recommend using the default flush settings, which disable application fsync entirely. This means relying on the background flush done by the OS and Kafka's own background flush. This provides the best of all worlds for most uses: no knobs to tune, great throughput and latency, and full recovery guarantees. We generally feel that the guarantees provided by replication are stronger than sync to local disk; however, the paranoid may still prefer having both, and application-level fsync policies are still supported.


The drawback of using application-level flush settings is that it is less efficient in its disk usage pattern (it gives the OS less leeway to re-order writes), and it can introduce latency, as fsync in most Linux filesystems blocks writes to the file, whereas the background flushing does much more granular page-level locking.


In general, you don't need to do any low-level tuning of the filesystem, but the next few sections go over some of it in case it is useful.

