We are working on a replacement for our existing producer. The code is
available in trunk now and can be considered beta quality. Below is the
configuration for the new producer.
of host/port pairs to use for establishing the initial connection to
the Kafka cluster. Data will be load balanced over all servers
irrespective of which servers are specified here for bootstrapping—this
list only impacts the initial hosts used to discover the full set of
servers. This list should be in the formhost1:port1,host2:port2,....
Since these servers are just used for the initial connection to
discover the full cluster membership (which may change dynamically),
this list need not contain the full set of servers (you may want more
than one, though, in case a server is down). If no server in this list
is available sending data will fail until on becomes available.
The number of
acknowledgments the producer requires the leader to have received before
considering a request complete. This controls the durability of
records that are sent. The following settings are common:
total bytes of memory the producer can use to buffer records waiting to
be sent to the server. If records are sent faster than they can be
delivered to the server the producer will either block or throw an
exception based on the preference specified byblock.on.buffer.full.
setting should correspond roughly to the total memory the producer will
use, but is not a hard bound since not all memory the producer uses is
used for buffering. Some additional memory will be used for compression
(if compression is enabled) as well as for maintaining in-flight
The compression type for all data generated by the producer. The default is none (i.e. no compression). Valid values arenone,gzip, orsnappy.
Compression is of full batches of data, so the efficacy of batching
will also impact the compression ratio (more batching means better
Setting a value
greater than zero will cause the client to resend any record whose send
fails with a potentially transient error. Note that this retry is no
different than if the client resent the record upon receiving the error.
Allowing retries will potentially change the ordering of records
because if two records are sent to a single partition, and the first
fails and is retried but the second succeeds, then the second record may
producer will attempt to batch records together into fewer requests
whenever multiple records are being sent to the same partition. This
helps performance on both the client and the server. This configuration
controls the default batch size in bytes.
No attempt will be made to batch records larger than this size.
Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent.
small batch size will make batching less common and may reduce
throughput (a batch size of zero will disable batching entirely). A very
large batch size may use memory a bit more wastefully as we will always
allocate a buffer of the specified batch size in anticipation of
string to pass to the server when making requests. The purpose of this
is to be able to track the source of requests beyond just ip/port by
allowing a logical application name to be included with the request. The
application can set any string it wants as this has no functional
purpose other than in logging and metrics.
producer groups together any records that arrive in between request
transmissions into a single batched request. Normally this occurs only
under load when records arrive faster than they can be sent out. However
in some circumstances the client may want to reduce the number of
requests even under moderate load. This setting accomplishes this by
adding a small amount of artificial delay—that is, rather than
immediately sending out a record the producer will wait for up to the
given delay to allow other records to be sent so that the sends can be
batched together. This can be thought of as analogous to Nagle's
algorithm in TCP. This setting gives the upper bound on the delay for
batching: once we getbatch.sizeworth of records for a
partition it will be sent immediately regardless of this setting,
however if we have fewer than this many bytes accumulated for this
partition we will 'linger' for the specified time waiting for more
records to show up. This setting defaults to 0 (i.e. no delay). Settinglinger.ms=5,
for example, would have the effect of reducing the number of requests
sent but would add up to 5ms of latency to records sent in the absense
maximum size of a request. This is also effectively a cap on the
maximum record size. Note that the server has its own cap on record size
which may be different from this. This setting will limit the number of
record batches the producer will send in a single request to avoid
sending huge requests.
The size of the TCP receive buffer to use when reading data
The size of the TCP send buffer to use when sending data
configuration controls the maximum amount of time the server will wait
for acknowledgments from followers to meet the acknowledgment
requirements the producer has specified with theacksconfiguration. If the requested number of acknowledgments are not met
when the timeout elapses an error will be returned. This timeout is
measured on the server side and does not include the network latency of
our memory buffer is exhausted we must either stop accepting new
records (block) or throw errors. By default this setting is true and we
block, however in some scenarios blocking is not desirable and it is
better to immediately give an error. Setting this tofalsewill accomplish that: the producer will throw a BufferExhaustedException if a recrord is sent and the buffer space is full.
（当我们内存缓存用尽时，必须停止接收新消息记录或者抛出错误。默认情况下，这个设置为真，然而某些阻塞可能不值得期待，因此立即抛出错误更好。设置为 false则会这样：producer会抛出一个异常错误：BufferExhaustedException， 如果记录已经发送同时缓存已满）
first time data is sent to a topic we must fetch metadata about that
topic to know which servers host the topic's partitions. This
configuration controls the maximum amount of time we will block waiting
for the metadata fetch to succeed before throwing an exception back to
period of time in milliseconds after which we force a refresh of
metadata even if we haven't seen any partition leadership changes to
proactively discover any new brokers or partitions.
A list of classes to use as metrics reporters. Implementing theMetricReporterinterface allows plugging in classes that will be notified of new
metric creation. The JmxReporter is always included to register JMX
The number of samples maintained to compute metrics.
metrics system maintains a configurable number of samples over a fixed
window size. This configuration controls the size of the window. For
example we might maintain two samples each measured over a 30 second
period. When a window expires we erase and overwrite the oldest window.
amount of time to wait before attempting to reconnect to a given host
when a connection fails. This avoids a scenario where the client
repeatedly attempts to connect to a host in a tight loop.
amount of time to wait before attempting to retry a failed produce
request to a given topic partition. This avoids repeated
sending-and-failing in a tight loop.
发表于: 1年前 最后更新时间: 10月前 游览量:8245