Ext4 may or may not be the best filesystem for Kafka. Filesystems like XFS supposedly handle locking during fsync better. We have only tried Ext4, though.
EXT4 is a serviceable choice of filesystem for the Kafka data directories, however getting the most performance out of it will require adjusting several mount options. In addition, these options are generally unsafe in a failure scenario, and will result in much more data loss and corruption. For a single broker failure, this is not much of a concern as the disk can be wiped and the replicas rebuilt from the cluster. In a multiple-failure scenario, such as a power outage, this can mean underlying filesystem (and therefore data) corruption that is not easily recoverable. The following options can be adjusted:
It is not necessary to tune these settings, however those wanting to optimize performance have a few knobs that will help:
data=writeback: Ext4 defaults to data=ordered which puts a strong
order on some writes. Kafka does not require this ordering as it does
very paranoid data recovery on all unflushed log. This setting removes
the ordering constraint and seems to significantly reduce latency.
Disabling journaling: Journaling is a tradeoff: it makes
reboots faster after server crashes but it introduces a great deal of
additional locking which adds variance to write performance. Those who
don't care about reboot time and want to reduce a major source of write
latency spikes can turn off journaling entirely.
commit=num_secs: This tunes the frequency with which ext4
commits to its metadata journal. Setting this to a lower value reduces
the loss of unflushed data during a crash. Setting this to a higher
value will improve throughput.
nobh: This setting controls additional ordering guarantees
when using data=writeback mode. This should be safe with Kafka as we do
not depend on write ordering and improves throughput and latency.
delalloc: Delayed allocation means that the filesystem avoid
allocating any blocks until the physical write occurs. This allows ext4
to allocate a large extent instead of smaller pages and helps ensure the
data is written sequentially. This feature is great for throughput. It
does seem to involve some locking in the filesystem which adds a bit of