Kafka: Application vs. OS Flush Management

Original article
By 半兽人. Posted: 2015-03-10. Last updated: 2019-11-09 13:40:32

Kafka always writes all data to the filesystem immediately, and it supports a configurable flush policy that controls when data is forced out of the OS cache and onto disk. The policy can force data to disk after a period of time or after a certain number of messages has been written. There are several choices in this configuration.
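
The two triggers described above (a message count and an elapsed time) can be sketched in a few lines of Python. This is a minimal illustration, not Kafka's implementation: the class `FlushPolicyLog` and its parameters `flush_interval_messages` and `flush_interval_s` are hypothetical names, loosely echoing the broker settings `log.flush.interval.messages` and `log.flush.interval.ms`. The sketch also assumes the time threshold is only checked when a message is appended, whereas a real broker would use a background scheduler.

```python
import os
import time


class FlushPolicyLog:
    """Append-only log that fsyncs by message count or elapsed time (illustrative only)."""

    def __init__(self, path, flush_interval_messages=None, flush_interval_s=None):
        # Leaving both thresholds as None means the application never calls
        # fsync itself and the flush is left entirely to the OS.
        self._file = open(path, "ab")
        self._max_messages = flush_interval_messages
        self._max_seconds = flush_interval_s
        self._unflushed = 0
        self._last_flush = time.monotonic()
        self.fsync_count = 0

    def append(self, payload: bytes) -> None:
        # Always hand the data to the filesystem (the OS page cache) right away.
        self._file.write(payload)
        self._file.flush()
        self._unflushed += 1
        if self._should_fsync():
            self.force()

    def _should_fsync(self) -> bool:
        if self._max_messages is not None and self._unflushed >= self._max_messages:
            return True
        if self._max_seconds is not None:
            return time.monotonic() - self._last_flush >= self._max_seconds
        return False

    def force(self) -> None:
        # Force everything written so far out of the OS cache and onto disk.
        os.fsync(self._file.fileno())
        self._unflushed = 0
        self._last_flush = time.monotonic()
        self.fsync_count += 1

    def close(self) -> None:
        self._file.close()
```

With `flush_interval_messages=3`, for example, every third `append` triggers an fsync, while a log created with no thresholds never fsyncs from the application at all.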

Kafka must eventually call fsync to know that data has been flushed. When recovering from a crash, for any log segment not known to have been fsync'd, Kafka checks the integrity of each message by verifying its CRC and rebuilds the accompanying offset index file as part of the recovery process executed on startup.
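
To make the recovery step concrete, here is a small sketch of CRC-checked recovery over an in-memory segment. The record layout (a 4-byte length, a 4-byte CRC32, then the payload) and the function names `encode_record` and `recover_segment` are assumptions made for illustration; Kafka's real on-disk format and its sparse offset index are more involved. The sketch simply stops at the first truncated or corrupt record and rebuilds a dense offset-to-position index for everything before it.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover_segment(data: bytes, base_offset: int = 0):
    """Scan a segment, stop at the first invalid record, and rebuild the index.

    Returns (valid_length_in_bytes, index), where index maps each message
    offset to the byte position of its record within the segment.
    """
    index = {}
    pos = 0
    offset = base_offset
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(data):
            break  # truncated record: the tail was never fully written
        if zlib.crc32(data[start:end]) != crc:
            break  # corrupt record: nothing from here on is trusted
        index[offset] = pos
        offset += 1
        pos = end
    return pos, index
```

A caller would truncate the segment to the returned valid length and write out the rebuilt index before accepting new appends.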

Note that durability in Kafka does not require syncing data to disk, because a failed node always recovers from its replicas.

We recommend using the default flush settings, which disable application fsync entirely. This means relying on the background flush done by the OS and on Kafka's own background flush. This provides the best of all worlds for most uses: no knobs to tune, great throughput and latency, and full recovery guarantees. We generally feel that the guarantees provided by replication are stronger than syncing to local disk; however, the paranoid may still prefer to have both, and application-level fsync policies are still supported.

The drawback of application-level flush settings is that they are less efficient in their disk usage pattern (they give the OS less leeway to reorder writes) and can introduce latency, because fsync on most Linux filesystems blocks writes to the file, whereas background flushing uses much more granular page-level locking.
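
One rough way to observe the latency cost is to time the same appends with and without an fsync after each write. The helper names `time_appends` and `compare` below are hypothetical, and the numbers depend heavily on the filesystem and storage device (on a RAM-backed temporary directory the difference may be negligible), so treat this as a sketch for experimentation rather than a benchmark.

```python
import os
import tempfile
import time


def time_appends(path, messages, fsync_each):
    """Return seconds spent appending `messages`, optionally fsyncing after each one."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for payload in messages:
            f.write(payload)
            f.flush()
            if fsync_each:
                # Blocks until the OS reports the data as written to the device.
                os.fsync(f.fileno())
    return time.perf_counter() - start


def compare(message_count=200, message_size=1024):
    """Time buffered appends against fsync-per-message appends in a temp directory."""
    messages = [b"x" * message_size] * message_count
    with tempfile.TemporaryDirectory() as tmp:
        buffered = time_appends(os.path.join(tmp, "buffered.log"), messages, fsync_each=False)
        synced = time_appends(os.path.join(tmp, "synced.log"), messages, fsync_each=True)
    return buffered, synced
```

Calling `compare()` returns the two durations in seconds, buffered first. On a disk-backed filesystem the fsync-per-message run is typically the slower of the two.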

In general you don't need to do any low-level tuning of the filesystem, but the next few sections go over some of it in case it is useful.

