Kafka Broker Configuration

3.1 Broker Configuration


The essential configurations are the following:
  • broker.id
  • log.dirs
  • zookeeper.connect
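As a rough illustration, the three essential settings listed above might appear in a broker's server.properties file as shown here. The file name is the one conventionally shipped with Kafka, and every value (the id 0, the data directories, and the ZooKeeper hosts zk1 to zk3) is a hypothetical placeholder chosen for the example rather than a recommendation.

```properties
# Unique non-negative integer identifying this broker (hypothetical value).
broker.id=0
# Comma-separated list of data directories (hypothetical paths).
log.dirs=/data/kafka-logs-1,/data/kafka-logs-2
# ZooKeeper connection string listing several nodes (hypothetical hosts).
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```

Every other broker in the same cluster would use a different broker.id but the same zookeeper.connect string.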
Topic-level configurations and defaults are discussed in more detail below.
Property Default Description
broker.id
Each broker is uniquely identified by a non-negative integer id. This id serves as the broker's "name" and allows the broker to be moved to a different host/port without confusing consumers. You can choose any number you like so long as it is unique.
log.dirs /tmp/kafka-logs A comma-separated list of one or more directories in which Kafka data is stored. Each new partition that is created will be placed in the directory which currently has the fewest partitions.
port 9092 The port on which the server accepts client connections.
zookeeper.connect null Specifies the ZooKeeper connection string in the form hostname:port, where hostname and port are the host and port for a node in your ZooKeeper cluster. To allow connecting through other ZooKeeper nodes when that host is down you can also specify multiple hosts in the form hostname1:port1,hostname2:port2,hostname3:port3.

ZooKeeper also allows you to add a "chroot" path which will make all Kafka data for this cluster appear under a particular path. This is a way to set up multiple Kafka clusters or other applications on the same ZooKeeper cluster. To do this give a connection string in the form hostname1:port1,hostname2:port2,hostname3:port3/chroot/path which would put all this cluster's data under the path /chroot/path. Note that you must create this path yourself prior to starting the broker, and consumers must use the same connection string.

message.max.bytes 1000000 The maximum size of a message that the server can receive. It is important that this property be in sync with the maximum fetch size your consumers use or else an unruly producer will be able to publish messages too large for consumers to consume.
num.network.threads 3 The number of network threads that the server uses for handling network requests. You probably don't need to change this.
num.io.threads 8 The number of I/O threads that the server uses for executing requests. You should have at least as many threads as you have disks.
background.threads 10 The number of threads to use for various background processing tasks such as file deletion. You should not need to change this.
queued.max.requests 500 The number of requests that can be queued up for processing by the I/O threads before the network threads stop reading in new requests.
host.name null Hostname of broker. If this is set, it will only bind to this address. If this is not set, it will bind to all interfaces, and publish one to ZooKeeper.
advertised.host.name null If this is set, this is the hostname that will be given out to producers, consumers, and other brokers to connect to.
advertised.port null The port to give out to producers, consumers, and other brokers to use in establishing connections. This only needs to be set if this port is different from the port the server should bind to.

socket.send.buffer.bytes 100 * 1024 The SO_SNDBUF buffer the server prefers for socket connections.
socket.receive.buffer.bytes 100 * 1024 The SO_RCVBUF buffer the server prefers for socket connections.
socket.request.max.bytes 100 * 1024 * 1024 The maximum request size the server will allow. This prevents the server from running out of memory and should be smaller than the Java heap size.
num.partitions 1 The default number of partitions per topic if a partition count isn't given at topic creation time.
log.segment.bytes 1024 * 1024 * 1024 The log for a topic partition is stored as a directory of segment files. This setting controls the size to which a segment file will grow before a new segment is rolled over in the log. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.roll.{ms,hours} 24 * 7 hours This setting will force Kafka to roll a new log segment even if the log.segment.bytes size has not been reached. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.cleanup.policy delete This can take either the value delete or compact. If delete is set, log segments will be deleted when they reach the size or time limits set. If compact is set, log compaction will be used to clean out obsolete records. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.retention.{ms,minutes,hours} 7 days The amount of time to keep a log segment before it is deleted, i.e. the default data retention window for all topics. Note that if both log.retention.minutes and log.retention.bytes are set, we delete a segment when either limit is exceeded. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.retention.bytes -1 The amount of data to retain in the log for each topic partition. The default of -1 means no size limit. Note that this is the limit per partition, so multiply by the number of partitions to get the total data retained for the topic. Also note that if both log.retention.hours and log.retention.bytes are set, we delete a segment when either limit is exceeded. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.retention.check.interval.ms 5 minutes The period with which we check whether any log segment is eligible for deletion to meet the retention policies.
log.cleaner.enable false This configuration must be set to true for log compaction to run.
log.cleaner.threads 1 The number of threads to use for cleaning logs in log compaction.
log.cleaner.io.max.bytes.per.second Double.MaxValue The maximum amount of I/O the log cleaner can do while performing log compaction. This setting allows setting a limit for the cleaner to avoid impacting live request serving.
log.cleaner.dedupe.buffer.size 500*1024*1024 The size of the buffer the log cleaner uses for indexing and deduplicating logs during cleaning. Larger is better provided you have sufficient memory.
log.cleaner.io.buffer.size 512*1024 The size of the I/O chunk used during log cleaning. You probably don't need to change this.
log.cleaner.io.buffer.load.factor 0.9 The load factor of the hash table used in log cleaning. You probably don't need to change this.
log.cleaner.backoff.ms 15000 The interval between checks to see if any logs need cleaning.
log.cleaner.min.cleanable.ratio 0.5 This configuration controls how frequently the log compactor will attempt to clean the log (assuming log compaction is enabled). By default we will avoid cleaning a log where more than 50% of the log has been compacted. This ratio bounds the maximum space wasted in the log by duplicates (at 50%, at most 50% of the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but will mean more wasted space in the log. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.cleaner.delete.retention.ms 1 day The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan). This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.index.size.max.bytes 10 * 1024 * 1024 The maximum size in bytes we allow for the offset index for each log segment. Note that we will always pre-allocate a sparse file with this much space and shrink it down when the log rolls. If the index fills up we will roll a new log segment even if we haven't reached the log.segment.bytes limit. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
log.index.interval.bytes 4096 The byte interval at which we add an entry to the offset index. When executing a fetch request the server must do a linear scan for up to this many bytes to find the correct position in the log to begin and end the fetch. So setting this value lower will mean larger index files (and a bit more memory usage) but less scanning. However the server will never add more than one index entry per log append (even if more than log.index.interval.bytes worth of messages are appended). In general you probably don't need to change this value.
log.flush.interval.messages Long.MaxValue The number of messages written to a log partition before we force an fsync on the log. Setting this lower will sync data to disk more often but will have a major impact on performance. We generally recommend that people make use of replication for durability rather than depending on single-server fsync; however, this setting can be used to be extra certain.
log.flush.scheduler.interval.ms Long.MaxValue The frequency in ms that the log flusher checks whether any log is eligible to be flushed to disk.
log.flush.interval.ms Long.MaxValue The maximum time between fsync calls on the log. If used in conjunction with log.flush.interval.messages the log will be flushed when either criterion is met.
log.delete.delay.ms 60000 The period of time we hold log files around after they are removed from the in-memory segment index. This period of time allows any in-progress reads to complete uninterrupted without locking. You generally don't need to change this.
log.flush.offset.checkpoint.interval.ms 60000 The frequency with which we checkpoint the last flush point for logs for recovery. You should not need to change this.
log.segment.delete.delay.ms 60000 The amount of time to wait before deleting a file from the filesystem.
auto.create.topics.enable true Enable auto creation of topics on the server. If this is set to true then attempts to produce data or fetch metadata for a non-existent topic will automatically create it with the default replication factor and number of partitions.
controller.socket.timeout.ms 30000 The socket timeout for commands from the partition management controller to the replicas.
controller.message.queue.size Int.MaxValue The buffer size for controller-to-broker channels.
default.replication.factor 1 The default replication factor for automatically created topics.
replica.lag.time.max.ms 10000 If a follower hasn't sent any fetch requests for this window of time, the leader will remove the follower from ISR (in-sync replicas) and treat it as dead.
replica.lag.max.messages 4000 If a replica falls more than this many messages behind the leader, the leader will remove the follower from ISR and treat it as dead.
replica.socket.timeout.ms 30 * 1000 The socket timeout for network requests to the leader for replicating data.
replica.socket.receive.buffer.bytes 64 * 1024 The socket receive buffer for network requests to the leader for replicating data.
replica.fetch.max.bytes 1024 * 1024 The number of bytes of messages to attempt to fetch for each partition in the fetch requests the replicas send to the leader.
replica.fetch.wait.max.ms 500 The maximum amount of time to wait for data to arrive on the leader in the fetch requests sent by the replicas to the leader.
replica.fetch.min.bytes 1 Minimum bytes expected for each fetch response for the fetch requests from the replica to the leader. If not enough bytes, wait up to replica.fetch.wait.max.ms for this many bytes to arrive.
num.replica.fetchers 1 Number of threads used to replicate messages from leaders. Increasing this value can increase the degree of I/O parallelism in the follower broker.
replica.high.watermark.checkpoint.interval.ms 5000 The frequency with which each replica saves its high watermark to disk to handle recovery.
fetch.purgatory.purge.interval.requests 1000 The purge interval (in number of requests) of the fetch request purgatory.
producer.purgatory.purge.interval.requests 1000 The purge interval (in number of requests) of the producer request purgatory.
zookeeper.session.timeout.ms 6000 ZooKeeper session timeout. If the server fails to heartbeat to ZooKeeper within this period of time it is considered dead. If you set this too low the server may be falsely considered dead; if you set it too high it may take too long to recognize a truly dead server.
zookeeper.connection.timeout.ms 6000 The maximum amount of time that the client waits to establish a connection to ZooKeeper.
zookeeper.sync.time.ms 2000 How far a ZK follower can be behind a ZK leader.
controlled.shutdown.enable true Enable controlled shutdown of the broker. If enabled, the broker will move all leaders on it to some other brokers before shutting itself down. This reduces the unavailability window during shutdown.
controlled.shutdown.max.retries 3 Number of retries to complete the controlled shutdown successfully before executing an unclean shutdown.
controlled.shutdown.retry.backoff.ms 5000 Backoff time between shutdown retries.
auto.leader.rebalance.enable true If this is enabled the controller will automatically try to balance leadership for partitions among the brokers by periodically returning leadership to the "preferred" replica for each partition if it is available.
leader.imbalance.per.broker.percentage 10 The percentage of leader imbalance allowed per broker. The controller will rebalance leadership if this ratio goes above the configured value per broker.
leader.imbalance.check.interval.seconds 300 The frequency with which to check for leader imbalance.
offset.metadata.max.bytes 4096 The maximum amount of metadata to allow clients to save with their offsets.
max.connections.per.ip Int.MaxValue The maximum number of connections that a broker allows from each IP address.
max.connections.per.ip.overrides
Per-IP or hostname overrides to the default maximum number of connections.
connections.max.idle.ms 600000 Idle connections timeout: the server socket processor threads close the connections that idle more than this.
log.roll.jitter.{ms,hours} 0 The maximum jitter to subtract from logRollTimeMillis.
num.recovery.threads.per.data.dir 1 The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
unclean.leader.election.enable true Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.
delete.topic.enable false Enable topic deletion.
offsets.topic.num.partitions 50 The number of partitions for the offset commit topic. Since changing this after deployment is currently unsupported, we recommend using a higher setting for production (e.g., 100-200).
offsets.topic.retention.minutes 1440 Offsets that are older than this age will be marked for deletion. The actual purge will occur when the log cleaner compacts the offsets topic.
offsets.retention.check.interval.ms 600000 The frequency at which the offset manager checks for stale offsets.
offsets.topic.replication.factor 3 The replication factor for the offset commit topic. A higher setting (e.g., three or four) is recommended in order to ensure higher availability. If the offsets topic is created when there are fewer brokers than the replication factor, then the offsets topic will be created with fewer replicas.
offsets.topic.segment.bytes 104857600 Segment size for the offsets topic. Since it uses a compacted topic, this should be kept relatively low in order to facilitate faster log compaction and loads.
offsets.load.buffer.size 5242880 An offset load occurs when a broker becomes the offset manager for a set of consumer groups (i.e., when it becomes a leader for an offsets topic partition). This setting corresponds to the batch size (in bytes) to use when reading from the offsets segments when loading offsets into the offset manager's cache.
offsets.commit.required.acks -1 The number of acknowledgements that are required before the offset commit can be accepted. This is similar to the producer's acknowledgement setting. In general, the default should not be overridden.
offsets.commit.timeout.ms 5000 The offset commit will be delayed until this timeout or the required number of replicas have received the offset commit. This is similar to the producer request timeout.
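
To make the log retention and compaction settings in the table above more concrete, here is a minimal sketch of how they might be combined in server.properties. The property names come from the table; all values are hypothetical and chosen only to make the arithmetic easy to follow.

```properties
# Hypothetical values throughout; property names are taken from the table above.

# Time- and size-based retention with the "delete" policy.
log.cleanup.policy=delete
# Keep segments for 3 days.
log.retention.hours=72
# Cap each partition at 1 GiB (1073741824 bytes). A topic with 8 partitions
# could therefore retain up to roughly 8 GiB on each replica.
log.retention.bytes=1073741824
# Roll a new segment at 256 MiB (268435456 bytes) so retention works on smaller files.
log.segment.bytes=268435456

# The cleaner must be enabled for any topic that overrides its policy to "compact".
log.cleaner.enable=true
log.cleaner.threads=1
# Retain delete tombstones for 1 day (86400000 ms).
log.cleaner.delete.retention.ms=86400000
```

Because a segment is deleted when either limit is exceeded, a busy partition in this sketch would be trimmed by the size cap long before the three-day window expires, while a quiet one would be trimmed by time.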


More details about broker configuration can be found in the Scala class kafka.server.KafkaConfig.
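
The network identity and ZooKeeper settings described earlier in this section can be sketched in the same way. This example assumes a broker that binds to an internal address but is reached by clients through a different name and port; the addresses, hostnames, ports, and chroot path are all hypothetical.

```properties
# Hypothetical hosts, addresses, and paths throughout.

# Bind only to an internal interface.
host.name=10.0.0.12
port=9092
# Hostname and port handed out to producers, consumers, and other brokers.
advertised.host.name=kafka1.example.com
advertised.port=19092

# Three ZooKeeper nodes, with this cluster's data kept under the chroot /kafka/cluster-a.
# The chroot is written once, after the last host:port pair, and the path must
# already exist before the broker starts.
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka/cluster-a
```

If the advertised.* properties were left unset, only the bind settings would remain, which is usually sufficient when clients can reach the broker on the address it binds to.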

Topic-level configuration

Configurations pertinent to topics have both a global default as well as an optional per-topic override. If no per-topic configuration is given the global default is used. The override can be set at topic creation time by giving one or more --config options. This example creates a topic named my-topic with a custom max message size and flush rate:
 > bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 \
        --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
Overrides can also be changed or set later using the alter topic command. This example updates the max message size for my-topic:
 > bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
    --config max.message.bytes=128000
To remove an override, run:
 > bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic \
    --deleteConfig max.message.bytes
The following are the topic-level configurations. The server's default configuration for this property is given under the Server Default Property heading; setting this default in the server config allows you to change the default given to topics that have no override specified.
Property Default Server Default Property Description
cleanup.policy delete log.cleanup.policy A string that is either "delete" or "compact". This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.
delete.retention.ms 86400000 (24 hours) log.cleaner.delete.retention.ms The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan).
flush.messages None log.flush.interval.messages This setting allows specifying an interval at which we will force an fsync of data written to the log. For example if this was set to 1 we would fsync after every message; if it were 5 we would fsync after every five messages. In general we recommend you not set this and use replication for durability and allow the operating system's background flush capabilities as it is more efficient. Choosing a value is a trade-off between durability and performance: a large interval makes each fsync take longer (I/O blocking), a small one makes fsync run more often, which adds latency to client requests, and if the physical server fails, messages that have not yet been fsynced on it are lost. This setting can be overridden on a per-topic basis (see the per-topic configuration section).
flush.ms None log.flush.interval.ms This setting allows specifying a time interval at which we will force an fsync of data written to the log. For example if this was set to 1000 we would fsync after 1000 ms had passed. In general we recommend you not set this and use replication for durability and allow the operating system's background flush capabilities as it is more efficient.
index.interval.bytes 4096 log.index.interval.bytes This setting controls how frequently Kafka adds an index entry to its offset index. The default setting ensures that we index a message roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position in the log but makes the index larger. You probably don't need to change this.
max.message.bytes 1,000,000 message.max.bytes This is the largest message size Kafka will allow to be appended to this topic. Note that if you increase this size you must also increase your consumer's fetch size so they can fetch messages this large.
min.cleanable.dirty.ratio 0.5 log.cleaner.min.cleanable.ratio This configuration controls how frequently the log compactor will attempt to clean the log (assuming log compaction is enabled). By default we will avoid cleaning a log where more than 50% of the log has been compacted. This ratio bounds the maximum space wasted in the log by duplicates (at 50%, at most 50% of the log could be duplicates). A higher ratio will mean fewer, more efficient cleanings but will mean more wasted space in the log.
min.insync.replicas 1 min.insync.replicas When a producer sets request.required.acks to -1, min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and request.required.acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with request.required.acks of -1. This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
retention.bytes None log.retention.bytes This configuration controls the maximum size a log can grow to before we will discard old log segments to free up space if we are using the "delete" retention policy. By default there is no size limit, only a time limit.
retention.ms 7 days log.retention.minutes This configuration controls the maximum time we will retain a log before we will discard old log segments to free up space if we are using the "delete" retention policy. This represents an SLA on how soon consumers must read their data.
segment.bytes 1 GB log.segment.bytes This configuration controls the segment file size for the log. Retention and cleaning are always done a file at a time, so a larger segment size means fewer files but less granular control over retention.
segment.index.bytes 10 MB log.index.size.max.bytes This configuration controls the size of the index that maps offsets to file positions. We preallocate this index file and shrink it only after log rolls. You generally should not need to change this setting.
segment.ms 7 days log.roll.hours This configuration controls the period of time after which Kafka will force the log to roll even if the segment file isn't full to ensure that retention can delete or compact old data.
segment.jitter.ms 0 log.roll.jitter.{ms,hours} The maximum jitter to subtract from logRollTimeMillis.
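
The durability scenario described under min.insync.replicas in the table above (replication factor 3, min.insync.replicas of 2, producer acks of -1) could be written down roughly as follows. This is only a sketch: the split into a broker file and a producer configuration is an assumption about how the deployment is organized, and the values simply repeat the scenario from the table.

```properties
# Broker side, e.g. server.properties (values repeat the scenario above).
# default.replication.factor only applies to automatically created topics;
# a topic created by hand would be given replication factor 3 explicitly.
default.replication.factor=3
min.insync.replicas=2

# Producer side: this line belongs in the producer's own configuration,
# not in server.properties.
request.required.acks=-1
```

With these values a write can still succeed while one of the three replicas is out of sync. If fewer than two replicas are in sync, the producer raises NotEnoughReplicas or NotEnoughReplicasAfterAppend instead of accepting the write.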






