Kafka producer

Posted by 半兽人 on 2015-03-10, last updated 2017-01-09

Load balancing

The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier. To make this possible, every Kafka node can answer a metadata request that tells the producer which servers are alive and where the leaders for a topic's partitions are at any given time. The producer then uses this to direct each request to the right broker.
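The direct-to-leader routing described above can be sketched from the producer's side as follows. This is a minimal illustration that assumes a hypothetical `ClusterMetadata` object holding one metadata response: the set of live brokers and a map from (topic, partition) to the current leader's broker id. `ClusterMetadata`, `leader_for` and `route` are made-up names and are not part of Kafka's client API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterMetadata:
    """Hypothetical producer-side view of one metadata response."""

    live_brokers: set[int]
    # (topic, partition) -> id of the broker currently leading that partition
    leaders: dict[tuple[str, int], int] = field(default_factory=dict)

    def leader_for(self, topic: str, partition: int) -> int:
        """Return the leader broker id, or raise if it is unknown or not alive."""
        broker = self.leaders.get((topic, partition))
        if broker is None:
            raise LookupError(f"no known leader for {topic}-{partition}; refresh metadata")
        if broker not in self.live_brokers:
            raise LookupError(f"leader {broker} of {topic}-{partition} is down; refresh metadata")
        return broker


def route(metadata: ClusterMetadata, topic: str, partition: int, payload: bytes) -> tuple[int, bytes]:
    """Pick the broker to send to directly: the partition leader, no routing tier in between."""
    return metadata.leader_for(topic, partition), payload


md = ClusterMetadata(live_brokers={1, 2, 3}, leaders={("clicks", 0): 2, ("clicks", 1): 3})
print(route(md, "clicks", 1, b"hello"))  # (3, b'hello')
```

Raising on an unknown or dead leader reflects the idea that the metadata is only valid "at any given time". In this sketch the caller is expected to fetch fresh metadata from any live node and retry.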


The client controls which partition it publishes messages to. It can choose at random, which gives a kind of random load balancing, or it can use a semantic partitioning function. Kafka exposes an interface for semantic partitioning: the user specifies a key to partition by, and the key is hashed to select a partition (the partition function can also be overridden if needed). For example, if the key is a user id, then all data for a given user is sent to the same partition. This in turn lets consumers make locality assumptions about what they consume. This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers.
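The three partitioning options in the paragraph above (random choice, hashing a key, or a user-supplied override) can be sketched as a single function. This is an illustration, not Kafka's partitioner. `choose_partition` and its `partition_fn` parameter are hypothetical. `zlib.crc32` stands in for the hash. Kafka's Java client hashes the serialized key with murmur2, so these partition numbers will not match a real cluster's.

```python
import random
import zlib


def choose_partition(key, num_partitions, rng=random, partition_fn=None):
    """Choose a partition for a message (illustrative sketch, not Kafka's partitioner).

    - partition_fn given: use the caller's override of the partition function.
    - key is None: pick a partition at random (random load balancing).
    - otherwise: hash the key, so equal keys always map to the same partition.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if partition_fn is not None:
        return partition_fn(key, num_partitions)
    if key is None:
        return rng.randrange(num_partitions)
    # Stand-in stable hash; the real Java client uses murmur2 on the key bytes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [("user-42", "login"), ("user-7", "click"), ("user-42", "logout")]
for user_id, action in events:
    # Both "user-42" events land on the same partition.
    print(user_id, action, "->", choose_partition(user_id, 6))
```

A stable hash is required here. Python's built-in `hash()` is salted per process for strings, so it would send the same user id to different partitions after a restart and break the locality guarantee consumers rely on.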

Asynchronous send

Batching is one of the big drivers of efficiency. To enable batching, the Kafka producer accumulates data in memory and sends larger batches in a single request. Batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than a fixed latency bound (say 64k or 10 ms). This packs more bytes into each send and results in fewer, larger I/O operations on the servers. Because the buffering is configurable, it gives a way to trade a small amount of additional latency for better throughput.
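The batching rule above releases a batch when it is full or when its oldest message has waited long enough. A minimal sketch of that rule follows. The parameter names `max_messages`, `max_bytes` and `linger_ms` are hypothetical and chosen to mirror the text. The Java producer exposes comparable knobs as `batch.size` (bytes) and `linger.ms`. A real producer runs a background sender that checks the latency bound. Here the caller invokes `poll()` instead, and the clock is injectable so the example is deterministic.

```python
import time


class BatchAccumulator:
    """Buffer messages and release them as one batch when a bound is reached (sketch)."""

    def __init__(self, max_messages=100, max_bytes=64 * 1024, linger_ms=10, clock=time.monotonic):
        self.max_messages = max_messages  # size bound: message count
        self.max_bytes = max_bytes        # size bound: total payload bytes
        self.linger_ms = linger_ms        # latency bound for the oldest buffered message
        self.clock = clock                # returns seconds; injectable for testing
        self._batch = []
        self._bytes = 0
        self._first_at = None             # arrival time of the oldest buffered message

    def append(self, msg: bytes):
        """Add a message; return the batch if a size bound was hit, else None."""
        if not self._batch:
            self._first_at = self.clock()
        self._batch.append(msg)
        self._bytes += len(msg)
        if len(self._batch) >= self.max_messages or self._bytes >= self.max_bytes:
            return self._drain()
        return None

    def poll(self):
        """Return the pending batch if the oldest message has waited linger_ms, else None."""
        if self._batch and (self.clock() - self._first_at) * 1000 >= self.linger_ms:
            return self._drain()
        return None

    def _drain(self):
        batch, self._batch, self._bytes, self._first_at = self._batch, [], 0, None
        return batch


now = [0.0]  # fake clock in seconds
acc = BatchAccumulator(max_messages=3, linger_ms=10, clock=lambda: now[0])
print(acc.append(b"a"), acc.append(b"b"))  # None None
print(acc.append(b"c"))                    # [b'a', b'b', b'c']
acc.append(b"d")
now[0] += 0.005
print(acc.poll())                          # None (only 5 ms waited)
now[0] += 0.006
print(acc.poll())                          # [b'd']
```

Raising `linger_ms` lets more messages share a request, which improves throughput. The cost is up to `linger_ms` of extra delay for a message that arrives when traffic is light.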


Details on the producer's configuration and API can be found elsewhere in the documentation.
