3.4 New Producer Configuration
We are working on a replacement for our existing producer. The code is available in trunk now and can be considered beta quality. Below is the configuration for the new producer.

Name | Type | Default | Importance | Description |
---|---|---|---|---|
bootstrap.servers | list | | high | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. Data will be load balanced over all servers irrespective of which servers are specified here for bootstrapping; this list only affects the initial hosts used to discover the full set of servers. The list should be in the form `host1:port1,host2:port2,...`. Because these servers are used only for the initial connection to discover the full cluster membership (which may change dynamically), the list need not contain the full set of servers (you may want more than one, though, in case a server is down). If no server in this list is available, sending data will fail until one becomes available. |
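acks | string | 1 | high | The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. Common settings are `acks=0` (the producer does not wait for any acknowledgment from the server), `acks=1` (the leader writes the record to its local log and responds without waiting for acknowledgment from the followers), and `acks=all` (the leader waits for the full set of in-sync replicas to acknowledge the record). |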
buffer.memory | long | 33554432 | high | The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, depending on the preference specified by `block.on.buffer.full`. This setting should correspond roughly to the total memory the producer will use, but it is not a hard bound because not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests. |
compression.type | string | none | high | The compression type for all data generated by the producer. The default is `none` (i.e. no compression). Valid values are `none`, `gzip`, or `snappy`. Compression is applied to full batches of data, so the efficacy of batching also affects the compression ratio (more batching means better compression). |
retries | int | 0 | high | Setting a value greater than zero will cause the client to resend any record whose send fails with a potentially transient error. Note that this retry is no different than if the client resent the record upon receiving the error. Allowing retries can change the ordering of records: if two records are sent to a single partition, and the first fails and is retried but the second succeeds, then the second record may appear first. |
batch.size | int | 16384 | medium | The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. No attempt will be made to batch records larger than this size. Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory somewhat more wastefully, because a buffer of the specified batch size is always allocated in anticipation of additional records. |
client.id | string | | medium | The id string to pass to the server when making requests. Its purpose is to make it possible to track the source of requests beyond just ip/port by allowing a logical application name to be included with the request. The application can set any string it wants, as this has no functional purpose other than in logging and metrics. |
linger.ms | long | 0 | medium | The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load, when records arrive faster than they can be sent out. However, in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay: rather than immediately sending out a record, the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once we get `batch.size` worth of records for a partition, it will be sent immediately regardless of this setting; however, if we have fewer than this many bytes accumulated for this partition, we will "linger" for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting `linger.ms=5`, for example, would reduce the number of requests sent but would add up to 5 ms of latency to records sent in the absence of load. |
max.request.size | int | 1048576 | medium | The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size, which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. |
receive.buffer.bytes | int | 32768 | medium | The size of the TCP receive buffer to use when reading data. |
send.buffer.bytes | int | 131072 | medium | The size of the TCP send buffer to use when sending data. |
timeout.ms | int | 30000 | medium | This configuration controls the maximum amount of time the server will wait for acknowledgments from followers to meet the acknowledgment requirements the producer has specified with the `acks` configuration. If the requested number of acknowledgments is not met when the timeout elapses, an error will be returned. This timeout is measured on the server side and does not include the network latency of the request. |
block.on.buffer.full | boolean | true | low | When our memory buffer is exhausted we must either stop accepting new records (block) or throw errors. By default this setting is true and we block; however, in some scenarios blocking is not desirable and it is better to give an error immediately. Setting this to `false` will accomplish that: the producer will throw a BufferExhaustedException if a record is sent and the buffer space is full. |
metadata.fetch.timeout.ms | long | 60000 | low | The first time data is sent to a topic we must fetch metadata about that topic to know which servers host the topic's partitions. This configuration controls the maximum amount of time we will block waiting for the metadata fetch to succeed before throwing an exception back to the client. |
metadata.max.age.ms | long | 300000 | low | The period of time in milliseconds after which we force a refresh of metadata, even if we haven't seen any partition leadership changes, to proactively discover any new brokers or partitions. |
metric.reporters | list | [] | low | A list of classes to use as metrics reporters. Implementing the `MetricReporter` interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. |
metrics.num.samples | int | 2 | low | The number of samples maintained to compute metrics. |
metrics.sample.window.ms | long | 30000 | low | The metrics system maintains a configurable number of samples over a fixed window size. This configuration controls the size of the window. For example, we might maintain two samples, each measured over a 30 second period. When a window expires we erase and overwrite the oldest window. |
reconnect.backoff.ms | long | 10 | low | The amount of time to wait before attempting to reconnect to a given host when a connection fails. This avoids a scenario where the client repeatedly attempts to connect to a host in a tight loop. |
retry.backoff.ms | long | 100 | low | The amount of time to wait before attempting to retry a failed produce request to a given topic partition. This avoids repeated sending-and-failing in a tight loop. |