Some Kafka cluster nodes are down: the producer can keep producing, but the consumer cannot consume normally

Detailed description of the problem: I deployed a 3-node Kafka cluster (91, 92, 93), managed with ZooKeeper, to verify Kafka's fault tolerance. After killing the Kafka process on node 91, both producing and consuming still worked; after additionally killing the Kafka process on node 92, producing still worked but consuming did not. The consumer client reports the following error:


15:14:49,487 DEBUG AbstractCoordinator:561 - Sending GroupCoordinator request for group test_1 to broker 192.168.1.84:9092 (id: 0 rack: null)
15:14:49,489 DEBUG AbstractCoordinator:572 - Received GroupCoordinator response ClientResponse(receivedTimeMs=1541142889489, latencyMs=1, disconnected=false, requestHeader={api_key=10,api_version=0,correlation_id=239,client_id=consumer-1}, responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}}) for group test_1
15:14:49,489 DEBUG AbstractCoordinator:594 - Group coordinator lookup for group test_1 failed: The group coordinator is not available.
15:14:49,489 DEBUG AbstractCoordinator:215 - Coordinator discovery failed for group test_1, refreshing metadata
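
For reference, error_code=15 in the GroupCoordinator/FindCoordinator response is COORDINATOR_NOT_AVAILABLE: the broker could not find a live leader for the partition of the internal __consumer_offsets topic that hosts this consumer group's coordinator. A quick check (a sketch, reusing the ZooKeeper addresses that appear later in this thread) is to describe that internal topic and look at its ReplicationFactor and Isr columns:

bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181 --topic __consumer_offsets

If the offsets topic was created with a replication factor of 1, killing the broker that holds a group's offsets partition makes the coordinator for that group unavailable, even though the data topics themselves are fully replicated.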





  • How many replicas does your topic have?
    • Given Kafka's own high availability and fault tolerance, with 3 nodes and my topic set to 3 replicas, the cluster should keep working and consuming even if any 2 of the nodes go down. Could you point out where my configuration is wrong?
        • [skyon10@server6 kafka_2.11-2.0.0]$ bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181 --topic spdb-cal
          Topic:spdb-cal  PartitionCount:3        ReplicationFactor:3     Configs:
                  Topic: spdb-cal Partition: 0    Leader: 0       Replicas: 0,1,2 Isr: 0
                  Topic: spdb-cal Partition: 1    Leader: 0       Replicas: 1,2,0 Isr: 0
                  Topic: spdb-cal Partition: 2    Leader: 0       Replicas: 2,0,1 Isr: 0
          [skyon10@server6 kafka_2.11-2.0.0]$ bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181 --topic spdb-cal
          Topic:spdb-cal  PartitionCount:3        ReplicationFactor:3     Configs:
                  Topic: spdb-cal Partition: 0    Leader: 0       Replicas: 0,1,2 Isr: 0,1,2
                  Topic: spdb-cal Partition: 1    Leader: 0       Replicas: 1,2,0 Isr: 0,1,2
                  Topic: spdb-cal Partition: 2    Leader: 0       Replicas: 2,0,1 Isr: 0,1,2
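
          Side note: with the same kafka-topics.sh tool, the --under-replicated-partitions and --unavailable-partitions options of --describe list only the partitions whose ISR has shrunk or that currently have no live leader, which is a quick way to survey the whole cluster after killing brokers:

          bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181 --under-replicated-partitions
          bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181 --unavailable-partitions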
            • # The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
              # For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
              offsets.topic.replication.factor=3
              transaction.state.log.replication.factor=3
              transaction.state.log.min.isr=3

              I changed these settings, but it still doesn't work.
                • I created a new topic, changed the replication settings to 3 in server.properties, and restarted the Kafka cluster:
                  offsets.topic.replication.factor=3
                  transaction.state.log.replication.factor=3
                  transaction.state.log.min.isr=3

                  It still reports the following error; producing works fine but consuming does not:

                  09:25:34,252 DEBUG AbstractCoordinator:651 - [Consumer clientId=consumer-1, groupId=test_3] Sending FindCoordinator request to broker 192.168.1.86:9092 (id: 2 rack: null)
                  09:25:34,256 DEBUG AbstractCoordinator:662 - [Consumer clientId=consumer-1, groupId=test_3] Received FindCoordinator response ClientResponse(receivedTimeMs=1541381134256, latencyMs=3, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=2, clientId=consumer-1, correlationId=372), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)))
                  09:25:34,257 DEBUG AbstractCoordinator:685 - [Consumer clientId=consumer-1, groupId=test_3] Group coordinator lookup failed: The coordinator is not available.
                  09:25:34,258 DEBUG AbstractCoordinator:242 - [Consumer clientId=consumer-1, groupId=test_3] Coordinator discovery failed, refreshing metadata
                    • [skyon10@server5 kafka_2.11-2.0.0]$ bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181 --topic spdb-test
                      Topic:spdb-test PartitionCount:3        ReplicationFactor:3     Configs:
                              Topic: spdb-test        Partition: 0    Leader: 2       Replicas: 1,2,0 Isr: 2,1
                              Topic: spdb-test        Partition: 1    Leader: 2       Replicas: 2,0,1 Isr: 2,1
                              Topic: spdb-test        Partition: 2    Leader: 1       Replicas: 0,1,2 Isr: 2,1
                        • The newly created topic still has the same problem. I have 3 nodes, and the topic has 3 replicas and 3 partitions. During the availability test, if I kill the Kafka process on broker 1, on broker 2, or on both, consumption still works; but as soon as I kill the Kafka process on broker 0, consumption stops and the Java client reports the following error:
                          09:25:34,252 DEBUG AbstractCoordinator:651 - [Consumer clientId=consumer-1, groupId=test_3] Sending FindCoordinator request to broker 192.168.1.86:9092 (id: 2 rack: null)
                          09:25:34,256 DEBUG AbstractCoordinator:662 - [Consumer clientId=consumer-1, groupId=test_3] Received FindCoordinator response ClientResponse(receivedTimeMs=1541381134256, latencyMs=3, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=2, clientId=consumer-1, correlationId=372), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)))
                          09:25:34,257 DEBUG AbstractCoordinator:685 - [Consumer clientId=consumer-1, groupId=test_3] Group coordinator lookup failed: The coordinator is not available.
                          09:25:34,258 DEBUG AbstractCoordinator:242 - [Consumer clientId=consumer-1, groupId=test_3] Coordinator discovery failed, refreshing metadata
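
                          Why broker 0 is special here: Kafka picks a group's coordinator by hashing the group.id onto one of the __consumer_offsets partitions (abs(hash(group.id)) % number of offsets partitions, 50 by default) and using that partition's current leader as the coordinator. If the offsets partition that test_3 maps to has its only replica on broker 0, killing broker 0 leaves the group with no coordinator even though the data topic itself is fully replicated. While the coordinator is reachable, something like the following shows which broker is acting as coordinator for the group (a sketch; check that your kafka-consumer-groups.sh version supports --state):

                          bin/kafka-consumer-groups.sh --bootstrap-server 192.168.1.86:9092 --describe --group test_3 --state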
                            • Sorry to bother you again, but does the following output look wrong?
                              bin/kafka-topics.sh --describe --zookeeper 192.168.1.84:2181,192.168.1.85:2181,192.168.1.86:2181|grep consumer_offsets
                              Topic:__consumer_offsets        PartitionCount:50       ReplicationFactor:1     Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
                                      Topic: __consumer_offsets       Partition: 0    Leader: 2       Replicas: 2     Isr: 2
                                      Topic: __consumer_offsets       Partition: 1    Leader: -1      Replicas: 0     Isr: 0
                                      Topic: __consumer_offsets       Partition: 2    Leader: 1       Replicas: 1     Isr: 1
                                      Topic: __consumer_offsets       Partition: 3    Leader: 2       Replicas: 2     Isr: 2
                                      Topic: __consumer_offsets       Partition: 4    Leader: -1      Replicas: 0     Isr: 0
                                      Topic: __consumer_offsets       Partition: 5    Leader: 1       Replicas: 1     Isr: 1
                                      Topic: __consumer_offsets       Partition: 6    Leader: 2       Replicas: 2     Isr: 2
                                      Topic: __consumer_offsets       Partition: 7    Leader: -1      Replicas: 0     Isr: 0
                                      Topic: __consumer_offsets       Partition: 8    Leader: 1       Replicas: 1     Isr: 1
                                      Topic: __consumer_offsets       Partition: 9    Leader: 2       Replicas: 2     Isr: 2
                                      Topic: __consumer_offsets       Partition: 10   Leader: -1      Replicas: 0     Isr: 0
                                      ...........
                                • In your current situation, the only option is to fix it manually (see the sketch below).
                                  For a brand-new cluster, as long as the default parameters are set before any producing or consuming happens, the automatically created __consumer_offsets topic will pick up those default values.
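
                                  For the manual route, the usual tool is kafka-reassign-partitions.sh with a JSON file that lists the desired replica set for each __consumer_offsets partition. A minimal sketch, assuming brokers 0, 1 and 2 (the file name is made up, and a real file must cover all 50 partitions, not only the two shown):

                                  increase-offsets-rf.json:
                                  {"version":1,"partitions":[
                                    {"topic":"__consumer_offsets","partition":0,"replicas":[0,1,2]},
                                    {"topic":"__consumer_offsets","partition":1,"replicas":[1,2,0]}
                                  ]}

                                  bin/kafka-reassign-partitions.sh --zookeeper 192.168.1.84:2181 --reassignment-json-file increase-offsets-rf.json --execute
                                  bin/kafka-reassign-partitions.sh --zookeeper 192.168.1.84:2181 --reassignment-json-file increase-offsets-rf.json --verify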
                                    • I just made the manual adjustment the way you described, but consumption still fails with the same error as before.

                                      Topic: __consumer_offsets       Partition: 0    Leader: 2       Replicas: 1,2   Isr: 2,1
                                              Topic: __consumer_offsets       Partition: 1    Leader: -1      Replicas: 1,2,0 Isr: 0
                                              Topic: __consumer_offsets       Partition: 2    Leader: 1       Replicas: 1,2   Isr: 1,2
                                              Topic: __consumer_offsets       Partition: 3    Leader: 2       Replicas: 1,2   Isr: 2,1
                                              Topic: __consumer_offsets       Partition: 4    Leader: -1      Replicas: 1,2,0 Isr: 0
                                              Topic: __consumer_offsets       Partition: 5    Leader: 1       Replicas: 1,2   Isr: 1,2
                                              Topic: __consumer_offsets       Partition: 6    Leader: 2       Replicas: 1,2   Isr: 2,1
                                              Topic: __consumer_offsets       Partition: 7    Leader: -1      Replicas: 1,2,0 Isr: 0
                                              Topic: __consumer_offsets       Partition: 8    Leader: 1       Replicas: 1,2   Isr: 1,2
                                              Topic: __consumer_offsets       Partition: 9    Leader: 2       Replicas: 1,2   Isr: 2,1
                                              Topic: __consumer_offsets       Partition: 10   Leader: -1      Replicas: 1,2,0 Isr: 0
                                        • I solved it: I restarted the node whose Kafka process had been killed, and after the manual fix kafka-reassign-partitions --verify reported everything as completed; all the leaders are back to normal. One more question: when building a new cluster or creating a new topic, how can I get all 50 __consumer_offsets partitions replicated onto every node, instead of spread out with one copy per node, so that this situation cannot happen again?
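
                                          On the last question: __consumer_offsets is auto-created the first time any consumer group connects, using the broker settings in force at that moment, so the relevant properties have to be in server.properties on every broker before the first consumer ever runs; changing them afterwards does not alter a topic that already exists. With 3 brokers, a replication factor of 3 already means every offsets partition has a copy on every node. A minimal sketch of the relevant entries (assuming a 3-broker cluster; the transaction settings echo the ones quoted earlier in this thread):

                                          offsets.topic.replication.factor=3
                                          offsets.topic.num.partitions=50
                                          transaction.state.log.replication.factor=3
                                          transaction.state.log.min.isr=2
                                          default.replication.factor=3

                                          Note that transaction.state.log.min.isr=3, as used earlier, would require all three brokers to be alive for transactional producers to commit; 2 leaves room for one broker to fail.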