Kafka Efficiency

Posted by 半兽人 on 2015-03-10 · last updated 2016-10-27
We have put significant effort into efficiency. One of our primary use cases is handling web activity data, which is very high volume: each page view may generate dozens of writes. Furthermore we assume each message published is read by at least one consumer (often many), hence we strive to make consumption as cheap as possible.

We have also found, from experience building and running a number of similar systems, that efficiency is a key to effective multi-tenant operations. If the downstream infrastructure service can easily become a bottleneck due to a small bump in usage by the application, such small changes will often create problems. By being very fast we help ensure that the application will tip-over under load before the infrastructure. This is particularly important when trying to run a centralized service that supports dozens or hundreds of applications on a centralized cluster as changes in usage patterns are a near-daily occurrence.
We discussed disk efficiency in the previous section. Once poor disk access patterns have been eliminated, there are two common causes of inefficiency in this type of system: too many small I/O operations, and excessive byte copying.
The small I/O problem happens both between the client and the server and in the server's own persistent operations.

To avoid this, our protocol is built around a "message set" abstraction that naturally groups messages together. This allows network requests to group messages together and amortize the overhead of the network roundtrip rather than sending a single message at a time. The server in turn appends chunks of messages to its log in one go, and the consumer fetches large linear chunks at a time.
This simple optimization produces orders of magnitude speed up. Batching leads to larger network packets, larger sequential disk operations, contiguous memory blocks, and so on, all of which allows Kafka to turn a bursty stream of random message writes into linear writes that flow to the consumers.
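The "message set" idea above can be sketched in a few lines of Java. This is an illustrative model, not Kafka's actual wire format: messages are length-prefixed into one buffer and flushed as a single request, so a thousand messages cost one network round trip instead of a thousand. The class and method names here (MessageSet, flush) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of batching messages into one "message set" request.
// Not Kafka's real protocol classes -- names and framing are made up for clarity.
public class MessageSet {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int sends = 0;

    // Append one message: a 4-byte big-endian length prefix, then the payload.
    public void append(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        buffer.write(payload.length >>> 24);
        buffer.write(payload.length >>> 16);
        buffer.write(payload.length >>> 8);
        buffer.write(payload.length);
        buffer.write(payload, 0, payload.length);
    }

    // One "network request" for however many messages were appended.
    public byte[] flush() {
        sends++;
        byte[] chunk = buffer.toByteArray();
        buffer.reset();
        return chunk;
    }

    public int sendCount() { return sends; }

    public static void main(String[] args) {
        MessageSet set = new MessageSet();
        for (int i = 0; i < 1000; i++) set.append("page-view-" + i);
        byte[] request = set.flush();
        System.out.println("1000 messages, " + set.sendCount()
                + " request, " + request.length + " bytes");
    }
}
```

The same chunk can then be appended to the broker's log in one go and fetched by consumers as one linear read, which is what amortizes the per-message overhead.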
The other inefficiency is in byte copying. At low message rates this is not an issue, but under load the impact is significant. To avoid this we employ a standardized binary message format that is shared by the producer, the broker, and the consumer (so data chunks can be transferred without modification between them).
The message log maintained by the broker is itself just a directory of files, each populated by a sequence of message sets that have been written to disk in the same format used by the producer and consumer. This abstraction allows the same bytes to be shared by the broker and its consumers (and to some degree by producers, though a producer's messages are only appended to the log after their checksums have been validated). Maintaining this common format allows optimization of the most important operation: network transfer of persistent log chunks. Modern unix operating systems offer a highly optimized code path for transferring data out of pagecache to a socket; in Linux this is done with the sendfile system call, and Java exposes it through the FileChannel.transferTo API.


To understand the impact of sendfile, it is important to understand the common data path for transfer of data from file to socket:

  1. The operating system reads data from the disk into pagecache in kernel space
  2. The application reads the data from kernel space into a user-space buffer
  3. The application writes the data back into kernel space into a socket buffer
  4. The operating system copies the data from the socket buffer to the NIC buffer where it is sent over the network
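Steps 2 and 3 are what ordinary application code does every day. A minimal sketch (file names and class names are illustrative): the read() pulls bytes from the kernel into a user-space buffer, and the write() pushes them right back into the kernel, two copies and two system calls per chunk before the kernel's own disk and NIC copies even count.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the conventional copy path. The read() is the kernel-to-user
// copy (step 2); the write() is the user-to-kernel copy (step 3).
public class NaiveCopy {
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] userSpaceBuffer = new byte[8192]; // the extra user-space hop
        long total = 0;
        int n;
        while ((n = in.read(userSpaceBuffer)) != -1) { // kernel -> user copy
            out.write(userSpaceBuffer, 0, n);          // user -> kernel copy
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("log-segment", ".bin");
        Files.write(src, new byte[100_000]);
        // A byte sink stands in for the socket so the sketch is self-contained.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied;
        try (InputStream in = Files.newInputStream(src)) {
            copied = copy(in, sink);
        }
        System.out.println("copied " + copied + " bytes through a user-space buffer");
    }
}
```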

This is clearly inefficient: there are four copies and two system calls. Using sendfile, this re-copying is avoided by allowing the OS to send the data from pagecache to the network directly. So in this optimized path, only the final copy to the NIC buffer is needed.
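In Java the sendfile path is reached through FileChannel.transferTo, which asks the kernel to move bytes from the pagecache toward the destination channel with no user-space buffer in the loop. A hedged sketch follows; a real broker would hand transferTo a SocketChannel, while a byte sink stands in here so the example is self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of the zero-copy path: FileChannel.transferTo maps onto sendfile
// on platforms that support it. Class and file names are illustrative.
public class ZeroCopySketch {
    public static long transfer(Path file, WritableByteChannel dest) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long sent = 0;
            while (sent < size) {
                // transferTo may move fewer bytes than requested; loop until done
                sent += src.transferTo(sent, size - sent, dest);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("segment", ".log");
        Files.write(segment, new byte[65_536]);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long sent = transfer(segment, Channels.newChannel(sink));
        System.out.println("transferred " + sent + " bytes with no user-space buffer in this code");
    }
}
```

Note that this code never allocates a buffer for the file's contents; whether the kernel truly avoids the intermediate copies depends on the OS and destination channel type.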


We expect a common use case to be multiple consumers on a topic. Using the zero-copy optimization above, data is copied into pagecache exactly once and reused on each consumption instead of being stored in memory and copied out to kernel space every time it is read. This allows messages to be consumed at a rate that approaches the limit of the network connection.
This combination of pagecache and sendfile means that on a Kafka cluster where the consumers are mostly caught up you will see no read activity on the disks whatsoever as they will be serving data entirely from cache.
For more background on the sendfile and zero-copy support in Java, see this article.

End-to-end Batch Compression

In some cases the bottleneck is actually not CPU or disk but network bandwidth. This is particularly true for a data pipeline that needs to send messages between data centers over a wide-area network. Of course the user can always compress its messages one at a time without any support needed from Kafka, but this can lead to very poor compression ratios as much of the redundancy is due to repetition between messages of the same type (e.g. field names in JSON or user agents in web logs or common string values). Efficient compression requires compressing multiple messages together rather than compressing each message individually. Ideally this works end-to-end: data is compressed by the producer before sending, stays compressed on the server, and is only decompressed by the final consumer.
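The ratio difference is easy to demonstrate with standard GZIP: because the redundancy lives between messages (repeated field names, user agents), gzipping a batch as one stream beats gzipping each message alone by a wide margin. The sample messages below are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Compares compressing 500 similar messages one at a time vs. as one batch.
public class BatchCompression {
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int perMessageTotal = 0;
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (int i = 0; i < 500; i++) {
            byte[] msg = ("{\"event\":\"page_view\",\"agent\":\"Mozilla/5.0\",\"page\":\"/item/" + i + "\"}")
                    .getBytes(StandardCharsets.UTF_8);
            perMessageTotal += gzip(msg).length; // each message compressed alone
            batch.write(msg, 0, msg.length);     // same messages, one stream
        }
        int batchedTotal = gzip(batch.toByteArray()).length;
        // Per-message gzip pays header overhead and sees no cross-message
        // repetition; the batched stream compresses the repetition away.
        System.out.println("one at a time: " + perMessageTotal
                + " bytes, as one batch: " + batchedTotal + " bytes");
    }
}
```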


Kafka supports this by allowing recursive message sets. A batch of messages can be clumped together, compressed, and sent to the server in this form. The batch will be written in compressed form, remain compressed in the log, and only be decompressed by the consumer.


Kafka supports GZIP and Snappy compression protocols. More details on compression can be found here: https://cwiki.apache.org/confluence/display/KAFKA/Compression







