Posted by 半兽人 on 2015-03-10   Last updated: 2016-04-03



Some deployments will need to manage a data pipeline that spans multiple datacenters. Our recommended approach to this is to deploy a local Kafka cluster in each datacenter with application instances in each datacenter interacting only with their local cluster and mirroring between clusters (see the documentation on the mirror maker tool for how to do this).
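Below is a small sketch of the routing rule described above. It assumes a hypothetical two-datacenter layout: the datacenter names, the `kafka.*.example:9092` addresses, and the helpers `bootstrap_for` and `mirror_flows` are illustrative only and are not part of Kafka. The point is that an application only ever resolves its own datacenter's cluster, and all cross-datacenter traffic appears as explicit mirroring flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datacenter:
    name: str
    # Hypothetical bootstrap address of the Kafka cluster located in this datacenter.
    local_bootstrap: str


# Hypothetical layout; host names are placeholders, not real endpoints.
DATACENTERS = {
    "dc-east": Datacenter("dc-east", "kafka.dc-east.example:9092"),
    "dc-west": Datacenter("dc-west", "kafka.dc-west.example:9092"),
}


def bootstrap_for(app_dc: str) -> str:
    """An application instance talks only to the cluster in its own datacenter."""
    return DATACENTERS[app_dc].local_bootstrap


def mirror_flows(destination: str) -> list[tuple[str, str]]:
    """Cross-datacenter traffic is confined to mirroring: one (source, destination)
    flow from each local cluster into `destination`."""
    return [(dc.local_bootstrap, destination) for dc in DATACENTERS.values()]


if __name__ == "__main__":
    print(bootstrap_for("dc-east"))
    print(mirror_flows("kafka.aggregate.example:9092"))
```

Keeping the datacenter-to-cluster mapping in one place means applications never need WAN-aware configuration; only the mirroring setup knows about other datacenters.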


This deployment pattern lets each datacenter act as an independent entity and allows us to manage and tune inter-datacenter replication centrally. Each facility can stand alone and keep operating even if the inter-datacenter links are unavailable: when that happens, mirroring falls behind until the link is restored, at which point it catches up.
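The "falls behind, then catches up" behavior can be illustrated with a toy model of a single mirroring flow. This is not how MirrorMaker is implemented; `simulate_mirror_lag` and its per-step rates are hypothetical. Local producers keep writing every step, while the mirror copies a bounded number of records per step and copies nothing during the outage.

```python
def simulate_mirror_lag(steps: int, produce_per_step: int, copy_per_step: int,
                        outage: range) -> list[int]:
    """Toy model of one mirroring flow (not Kafka's implementation).

    Returns the lag (records written locally but not yet mirrored) after each step.
    The local cluster accepts writes every step; the mirror copies up to
    `copy_per_step` records per step, but nothing while `step` is in `outage`.
    """
    produced = 0  # records written to the local cluster so far
    mirrored = 0  # records that have reached the remote cluster so far
    history = []
    for step in range(steps):
        produced += produce_per_step  # local producers are unaffected by the link
        if step not in outage:        # link is up: drain as much backlog as capacity allows
            mirrored = min(produced, mirrored + copy_per_step)
        history.append(produced - mirrored)
    return history


lags = simulate_mirror_lag(steps=12, produce_per_step=10, copy_per_step=15, outage=range(3, 6))
print(lags)  # [0, 0, 0, 10, 20, 30, 25, 20, 15, 10, 5, 0]
```

In this model the mirror only catches up because its copy capacity (15 per step) exceeds the local write rate (10 per step); the surplus of 5 per step drains the 30-record backlog in 6 steps. A mirror sized exactly at the write rate would never recover from an outage.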


For applications that need a global view of all data you can use mirroring to provide clusters which have aggregate data mirrored from the local clusters in all datacenters. These aggregate clusters are used for reads by applications that require the full data set.
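A minimal in-memory sketch of an aggregate cluster follows, assuming each cluster is modeled as a dict of topic name to an append-only list. The `Cluster` class and `mirror_into` helper are hypothetical stand-ins, not Kafka APIs. Each local cluster is mirrored into the aggregate, and a remembered position per (source, topic) pair ensures repeated runs copy only new records, loosely like a mirroring consumer's committed offset.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a Kafka cluster: topic -> append-only list of (origin, value)."""
    name: str
    topics: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def append(self, topic: str, origin: str, value: str) -> None:
        self.topics.setdefault(topic, []).append((origin, value))


def mirror_into(source: Cluster, target: Cluster, positions: dict[tuple[str, str], int]) -> None:
    """Copy records from `source` that `target` has not received yet, then record
    how far each (source, topic) pair has been copied."""
    for topic, records in source.topics.items():
        key = (source.name, topic)
        start = positions.get(key, 0)
        for origin, value in records[start:]:
            target.append(topic, origin, value)
        positions[key] = len(records)


east, west = Cluster("east"), Cluster("west")
east.append("page-views", "east", "v1")
west.append("page-views", "west", "v2")
east.append("page-views", "east", "v3")

aggregate = Cluster("aggregate")
positions: dict[tuple[str, str], int] = {}
for local in (east, west):
    mirror_into(local, aggregate, positions)

print(aggregate.topics["page-views"])  # [('east', 'v1'), ('east', 'v3'), ('west', 'v2')]
```

Applications in each datacenter still read only their local cluster; only readers that need the full data set use `aggregate`. Note that in this toy, records from different datacenters are not placed in a global time order; only the order within each source is preserved.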


This is not the only possible deployment pattern. It is possible to read from or write to a remote Kafka cluster over the WAN, though this obviously adds whatever latency is required to reach the remote cluster.


Kafka naturally batches data in both the producer and the consumer, so it can achieve high throughput even over a high-latency connection. To allow this, though, it may be necessary to increase the TCP socket buffer sizes for the producer, consumer, and broker using the socket.send.buffer.bytes and socket.receive.buffer.bytes configurations. The appropriate way to set this is documented here.
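A common rule of thumb for sizing TCP buffers is the bandwidth-delay product: to keep a link busy, roughly bandwidth × round-trip time bytes must be in flight, and a single connection's throughput cannot exceed buffer size ÷ RTT. The sketch below applies that rule; the link speed, RTT, and the 64 KiB comparison value are illustrative assumptions, and the helper names are hypothetical. It emits the two property names given above. Check the configuration reference for your Kafka version, since client-side property names may differ, and the operating system may also cap socket buffer sizes.

```python
import math


def bandwidth_delay_product(bandwidth_mbit_s: float, rtt_ms: float) -> int:
    """Bytes that must be in flight to keep the link busy for one round trip."""
    bits_in_flight = bandwidth_mbit_s * 1_000_000 * rtt_ms / 1000
    return math.ceil(bits_in_flight / 8)


def throughput_ceiling_mbit_s(buffer_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP connection's throughput when its window is capped at buffer_bytes."""
    return buffer_bytes * 8 * 1000 / rtt_ms / 1_000_000


def socket_buffer_settings(bandwidth_mbit_s: float, rtt_ms: float) -> dict[str, int]:
    """Suggested values for the two settings named in the text, sized to the BDP."""
    size = bandwidth_delay_product(bandwidth_mbit_s, rtt_ms)
    return {
        "socket.send.buffer.bytes": size,
        "socket.receive.buffer.bytes": size,
    }


# Assumed example: a 1 Gbit/s inter-datacenter link with an 80 ms round trip.
print(socket_buffer_settings(1000, 80))   # both values are 10000000 bytes (about 10 MB)
print(throughput_ceiling_mbit_s(64 * 1024, 80))  # a 64 KiB buffer caps one connection near 6.55 Mbit/s
```

The second line is the reason the buffers matter: with small buffers, batching cannot help, because the connection stalls waiting for acknowledgements long before the link is full.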


It is generally not advisable to run a single Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable.
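The availability problem can be made concrete with ZooKeeper's majority-quorum rule: an ensemble keeps serving writes only on a side of a partition that holds a strict majority of its servers. The sketch below assumes a full partition in which every datacenter is cut off from every other; the function name and the ensemble layouts are hypothetical.

```python
def datacenters_with_quorum(servers_per_dc: dict[str, int]) -> list[str]:
    """Return the datacenters that can still form a ZooKeeper majority on their own
    when every datacenter is isolated from every other."""
    total = sum(servers_per_dc.values())
    majority = total // 2 + 1
    return [dc for dc, n in servers_per_dc.items() if n >= majority]


print(datacenters_with_quorum({"dc-a": 3, "dc-b": 2}))             # ['dc-a']
print(datacenters_with_quorum({"dc-a": 2, "dc-b": 2}))             # []
print(datacenters_with_quorum({"dc-a": 1, "dc-b": 1, "dc-c": 1}))  # []
```

However the servers are placed, at most one datacenter can hold a majority, so at least one location loses its ZooKeeper quorum during a partition, and an even split leaves none of them with a quorum.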

