Some deployments will need to manage a data pipeline that spans multiple datacenters. Our recommended approach is to deploy a local Kafka cluster in each datacenter, have application instances in each datacenter interact only with their local cluster, and mirror data between clusters (see the documentation on the mirror maker tool for how to do this).
This deployment pattern allows datacenters to act as independent entities and lets us manage and tune inter-datacenter replication centrally. Each facility can stand alone and operate even if the inter-datacenter links are unavailable: when this occurs, the mirroring falls behind until the link is restored, at which time it catches up.
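As a rough illustration of the behavior described above, the following Python sketch models each cluster as a single in-memory log and the mirror as a process that copies records only while the inter-datacenter link is up. `LocalCluster`, `Mirror`, `step` and `lag` are hypothetical names for this sketch; they are not Kafka or mirror maker APIs, and a real mirror copies topic partitions between actual clusters.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCluster:
    """Hypothetical stand-in for one datacenter's Kafka cluster: a single append-only log."""
    name: str
    log: list = field(default_factory=list)

    def append(self, record):
        self.log.append(record)


@dataclass
class Mirror:
    """Hypothetical mirroring process that copies records from source to target."""
    source: LocalCluster
    target: LocalCluster
    position: int = 0      # next offset in source.log that has not been copied yet
    link_up: bool = True   # whether the inter-datacenter link is currently available

    def lag(self):
        """Number of source records not yet present in the target."""
        return len(self.source.log) - self.position

    def step(self, max_records=100):
        """Copy up to max_records pending records; copies nothing while the link is down."""
        if not self.link_up:
            return 0
        batch = self.source.log[self.position:self.position + max_records]
        for record in batch:
            self.target.append(record)
        self.position += len(batch)
        return len(batch)


east = LocalCluster("east")
west_copy = LocalCluster("west-copy-of-east")
mirror = Mirror(source=east, target=west_copy)

for i in range(5):
    east.append(f"event-{i}")  # local producers only ever talk to their local cluster
mirror.step()

mirror.link_up = False         # the inter-datacenter link fails
for i in range(5, 12):
    east.append(f"event-{i}")  # east keeps accepting writes on its own
mirror.step()
print(mirror.lag())            # 7: the mirror has fallen behind

mirror.link_up = True          # the link is restored
mirror.step()
print(mirror.lag())            # 0: the mirror has caught up
```

The point of the model is that local writes never depend on the link; only the mirror's lag does.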
For applications that need a global view of all data, you can use mirroring to provide clusters that hold data aggregated from the local clusters in all datacenters. These aggregate clusters are used for reads by applications that require the full data set.
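A similarly simplified sketch of an aggregate cluster keeps one mirror position per source datacenter and appends new records from every local cluster to a single aggregate log, which full-data-set readers would then use. The datacenter names and `mirror_into_aggregate` are hypothetical; tagging each record with its origin is an illustrative choice, and this toy model says nothing about how Kafka orders records across partitions.

```python
def mirror_into_aggregate(local_logs, positions, aggregate):
    """Append every not-yet-mirrored record from each local log to the aggregate log.

    local_logs: dict of datacenter name -> list of records (stand-in for a local cluster)
    positions:  dict of datacenter name -> next offset to copy (one position per source)
    aggregate:  list standing in for the aggregate cluster
    """
    for dc, log in local_logs.items():
        start = positions.get(dc, 0)
        for record in log[start:]:
            aggregate.append((dc, record))  # tag each record with its origin datacenter
        positions[dc] = len(log)
    return aggregate


local_logs = {
    "us-east": ["a1", "a2"],
    "eu-west": ["b1"],
    "ap-south": ["c1", "c2", "c3"],
}
positions = {}
aggregate = []

mirror_into_aggregate(local_logs, positions, aggregate)
print(len(aggregate))  # 6

# A later write in one datacenter; the next pass copies only the new record.
local_logs["eu-west"].append("b2")
mirror_into_aggregate(local_logs, positions, aggregate)
print(len(aggregate))  # 7
```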
This is not the only possible deployment pattern. It is possible to read from or write to a remote Kafka cluster over the WAN, though this will add whatever latency is required to reach the remote cluster.
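To make the cost of the WAN round trip concrete, here is a back-of-the-envelope model, assuming a client that has only one request in flight and waits a full round trip for each response. The RTT figures (0.5 ms inside a datacenter, 80 ms across the WAN) are illustrative assumptions rather than measurements, and `synchronous_throughput` is a hypothetical helper.

```python
def synchronous_throughput(rtt_ms, records_per_request):
    """Upper bound on records per second when only one request is in flight and each
    one waits a full round trip (rtt_ms) before the next is sent."""
    requests_per_second = 1000.0 / rtt_ms
    return requests_per_second * records_per_request


LAN_RTT_MS = 0.5   # assumed round trip inside one datacenter
WAN_RTT_MS = 80.0  # assumed round trip to a remote datacenter

print(synchronous_throughput(LAN_RTT_MS, 1))     # 2000.0
print(synchronous_throughput(WAN_RTT_MS, 1))     # 12.5
print(synchronous_throughput(WAN_RTT_MS, 1000))  # 12500.0
```

Under these assumptions, sending one record per request over the WAN drops the rate from 2,000 to 12.5 records per second, while putting 1,000 records in each request brings it back to 12,500 per second. Amortizing the round trip over many records is why batching keeps throughput high on high-latency links.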
Kafka naturally batches data in both the producer and consumer, so it can achieve high throughput even over a high-latency connection. To allow this, though, it may be necessary to increase the TCP socket buffer sizes for the producer, consumer, and broker using the socket.send.buffer.bytes and socket.receive.buffer.bytes configurations. The appropriate way to set this is documented here.
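A common rule of thumb (an assumption here, not something this section prescribes) is to size TCP buffers to at least the bandwidth-delay product, the number of bytes that must be in flight to keep the link busy: buffer ≥ bandwidth × RTT. The sketch computes that value and maps it onto the two property names given above. The 1 Gbit/s and 80 ms inputs are illustrative, the helper names are hypothetical, and both the exact property names each component accepts in your Kafka version and any operating-system cap on buffer sizes should be checked separately.

```python
def bandwidth_delay_product_bytes(bandwidth_mbit_s, rtt_ms):
    """Bytes that must be in flight to keep a link of the given bandwidth busy
    for one round trip: (bits per second / 8) * (RTT in seconds)."""
    bytes_per_second = bandwidth_mbit_s * 1_000_000 / 8
    return int(bytes_per_second * rtt_ms / 1000)


def socket_buffer_properties(bandwidth_mbit_s, rtt_ms):
    """Map the bandwidth-delay product onto the two buffer settings named in the text."""
    size = bandwidth_delay_product_bytes(bandwidth_mbit_s, rtt_ms)
    return {
        "socket.send.buffer.bytes": size,
        "socket.receive.buffer.bytes": size,
    }


# Illustrative link: 1 Gbit/s between datacenters with an 80 ms round trip.
for key, value in socket_buffer_properties(1000, 80).items():
    print(f"{key}={value}")
# socket.send.buffer.bytes=10000000
# socket.receive.buffer.bytes=10000000
```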
It is generally not advisable to run a single Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
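The two problems named above can be sketched with a pair of hypothetical helpers. The first gives a lower bound on acknowledgement latency, assuming a write must reach every listed replica before it is acknowledged. The second checks which datacenters still hold a strict majority of the ZooKeeper ensemble when all inter-datacenter links are cut, since a ZooKeeper server cut off from a majority of its ensemble cannot process writes. Site names and RTT values are illustrative assumptions.

```python
def min_ack_latency_ms(leader_site, replica_sites, rtt_ms):
    """Lower bound on acknowledgement latency (ms) for a write that must reach every
    replica in replica_sites. rtt_ms maps each remote site to its round-trip time
    from the leader's site; a replica in the leader's own site is treated as 0 ms."""
    return max(0.0 if site == leader_site else rtt_ms[site] for site in replica_sites)


def sites_with_quorum(zk_servers_per_site):
    """Sites that still hold a strict majority of the ZooKeeper ensemble when every
    inter-datacenter link is cut (each site can then only count its own servers)."""
    total = sum(zk_servers_per_site.values())
    return [site for site, n in zk_servers_per_site.items() if n > total // 2]


# A replica set stretched over three datacenters: the slowest link bounds every write.
print(min_ack_latency_ms("dc1", ["dc1", "dc2", "dc3"], {"dc2": 40.0, "dc3": 95.0}))  # 95.0

# A five-server ensemble split 3/2 across two datacenters.
print(sites_with_quorum({"dc1": 3, "dc2": 2}))  # ['dc1']

# A 2/2/1 split across three datacenters: no site keeps a majority on its own.
print(sites_with_quorum({"dc1": 2, "dc2": 2, "dc3": 1}))  # []
```

With a two-datacenter layout at most one side can hold a strict majority, so during a partition ZooKeeper cannot remain writable in both locations.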