[2020-03-03 10:59:54,553] DEBUG [Controller 2]: Removing replica 1 from ISR 3,0 for partition [TMP_TO_LMIS_SHANGH,6]. (kafka.controller.KafkaController)
[2020-03-03 10:59:54,554] WARN [Controller 2]: Cannot remove replica 1 from ISR of partition [TMP_TO_LMIS_SHANGH,6] since it is not in the ISR. Leader = 3 ; ISR = List(3, 0) (kafka.controller.KafkaController)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = true) sent to broker 1 is (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=TMP_TO_LMIS_SHANGH,Partition=6,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = true) sent to broker 1 is (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=__consumer_offsets,Partition=17,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] INFO [Replica state machine on controller 2]: Invoking state change to OfflineReplica for replicas [Topic=__consumer_offsets,Partition=17,Replica=1] (kafka.controller.ReplicaStateMachine)
[2020-03-03 10:59:54,554] DEBUG [Controller 2]: Removing replica 1 from ISR 2,0 for partition [__consumer_offsets,17]. (kafka.controller.KafkaController)
You need to start by checking the broker logs on each node.
On machine 27 I see a lot of files like
controller.log.2020-03-25-08
The contents are:
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 0 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 1 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 2 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 3 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] DEBUG [Controller 4]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2020-03-25 08:00:01,455] TRACE [Controller 4]: leader imbalance ratio for broker 4 is 0.000000 (kafka.controller.KafkaController)
[2020-03-25 08:05:01,450] TRACE [Controller 4]: checking need to trigger partition rebalance (kafka.controller.KafkaController)
[2020-03-25 08:05:01,454] DEBUG [Controller 4]: preferred replicas by broker Map(0 -> Map([gtp_data_log,1] -> List(0, 3, 4), [wlpt_to_mdb,2] -> List(0, 2, 3), [JLP_TO_LMIS_CHONGQ1,3] -> List(0, 4, 1), [JLP_TO_LMIS_SHANGH,5] -> List(0, 1, 2), [mdb_Fd_Route_NM,4] -> List(0, 2, 3), [TMP_TO_LMIS_SD,1] -> List(0, 3, 4), [JLP_TO_LMIS_GD,0] -> List(0, 1, 2), [__consumer_offsets,30] -> List(0, 2, 3), [JLP_TO_LMIS_HEN,7] -> List(0, 2, 3), [TMP_TO_LMIS_GD,7] -> List(0, 4, 1), [TMP_TO_LMIS_LZ,9] -> List(0, 2, 3), [gtp_data_log,6] -> List(0, 4, 1), [TMP_TO_LMIS_CHONGQ,4] -> List(0, 4, 1), [TMP_TO_LMIS_HAIN,7] -> List(0, 2, 3), [JTmdb_Fd_Good,2] -> List(0, 3, 4), [JLP_TO_LMIS_FJ,6] -> List(0, 2, 3), [sendemail,2] -> List(0, 4), [JLP_TO_LMIS_SHANGH,0] -> List(0, 4, 1), [mdb_Fd_Route_LZ,0] -> List(0, 2, 3), [__consumer_offsets,10] -> List(0, 2, 3), [JLP_TO_LMIS_FJ1,2] -> List(0, 3, 4), [mdb_Fd_Route_HAIN,2] -> List(0, 1, 2), [JLP_TO_LMIS_HEN,2] -> List(0, 1, 2), [TMP_TO_LMIS_FJ,6] -> List(0, 4, 1), [TMP_TO_LMIS_XM,0] -> List(0, 1, 2), [JLP_TO_LMIS_SD,2] -> List(0, 3, 4), [TMP_TO_LMIS_JIANGX,1] -> List(0, 1, 2), [__consumer_offsets,40] -> List(0, 4, 1), [TMP_TO_LMIS_BEIJ,4] -> List(0, 3, 4), [Parallel_Computing_Stock,0]
The other machines also have controller.log.2020-03- files, but they are not generated every hour, and their contents do not look like the above:
[2020-03-03 10:57:28,711] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)
[2020-03-03 10:57:31,354] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)
[2020-03-03 10:57:31,354] DEBUG [Controller 1]: De-registering IsrChangeNotificationListener (kafka.controller.KafkaController)
[2020-03-03 10:57:31,356] INFO [Partition state machine on Controller 1]: Stopped partition state machine (kafka.controller.PartitionStateMachine)
[2020-03-03 10:57:31,357] INFO [Replica state machine on controller 1]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
[2020-03-03 10:57:31,358] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)
[2020-03-03 10:57:33,325] INFO [Controller 1]: Controller starting up (kafka.controller.KafkaController)
[2020-03-03 10:57:33,342] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)
That looks fairly normal. Only when a replica is being removed from the ISR do some machines show logs like this:
[2020-03-03 10:59:54,553] DEBUG [Controller 2]: Removing replica 1 from ISR 3,0 for partition [TMP_TO_LMIS_SHANGH,6]. (kafka.controller.KafkaController)
[2020-03-03 10:59:54,554] WARN [Controller 2]: Cannot remove replica 1 from ISR of partition [TMP_TO_LMIS_SHANGH,6] since it is not in the ISR. Leader = 3 ; ISR = List(3, 0) (kafka.controller.KafkaController)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = true) sent to broker 1 is (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=TMP_TO_LMIS_SHANGH,Partition=6,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = true) sent to broker 1 is (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] DEBUG The stop replica request (delete = false) sent to broker 1 is [Topic=__consumer_offsets,Partition=17,Replica=1] (kafka.controller.ControllerBrokerRequestBatch)
[2020-03-03 10:59:54,554] INFO [Replica state machine on controller 2]: Invoking state change to OfflineReplica for replicas [Topic=__consumer_offsets,Partition=17,Replica=1] (kafka.controller.ReplicaStateMachine)
[2020-03-03 10:59:54,554] DEBUG [Controller 2]: Removing replica 1 from ISR 2,0 for partition [__consumer_offsets,17]. (kafka.controller.KafkaController)
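The /controller_epoch znode in ZooKeeper records how many times the controller has changed, that is, how many failovers have happened. A large value means the cluster is unstable and the controller keeps being re-elected.
Mine is at 225, but I can't tell where the instability is coming from.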
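One way to narrow down when the controller moved is to scan the controller.log files on each broker for startup and resignation lines and bucket them by hour. Below is a minimal Python sketch, assuming the log line format shown in this thread. The names `controller_events`, `resignations_per_hour`, and `SAMPLE` are made up for illustration and are not part of Kafka.

```python
import re
from collections import Counter

# Matches controller lifecycle lines in the format pasted in this thread, e.g.
# [2020-03-03 10:57:31,358] INFO [Controller 1]: Broker 1 resigned as the controller (...)
EVENT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\] \w+ "
    r"\[Controller (?P<broker>\d+)\]: "
    r"(?P<event>Controller starting up|Controller startup complete"
    r"|Broker \d+ resigned as the controller)"
)

def controller_events(lines):
    """Return (timestamp, broker_id, event) for each start/resign line."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events.append((m.group("ts"), int(m.group("broker")), m.group("event")))
    return events

def resignations_per_hour(events):
    """Count resignation events per hour bucket ('YYYY-MM-DD HH')."""
    counts = Counter()
    for ts, _broker, event in events:
        if event.endswith("resigned as the controller"):
            counts[ts[:13]] += 1
    return counts

# Sample lines copied from the log excerpt posted above (illustration only).
SAMPLE = [
    "[2020-03-03 10:57:28,711] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)",
    "[2020-03-03 10:57:31,354] DEBUG [Controller 1]: Controller resigning, broker id 1 (kafka.controller.KafkaController)",
    "[2020-03-03 10:57:31,358] INFO [Controller 1]: Broker 1 resigned as the controller (kafka.controller.KafkaController)",
    "[2020-03-03 10:57:33,325] INFO [Controller 1]: Controller starting up (kafka.controller.KafkaController)",
    "[2020-03-03 10:57:33,342] INFO [Controller 1]: Controller startup complete (kafka.controller.KafkaController)",
]

if __name__ == "__main__":
    events = controller_events(SAMPLE)
    for ts, broker, event in events:
        print(ts, broker, event)
    print(dict(resignations_per_hour(events)))
```

To run it against real files, pass an open file object instead of `SAMPLE`, since `controller_events` only iterates over lines. Hours with many resignations are the ones worth correlating with ZooKeeper session timeouts or GC pauses in the broker's server.log.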
My partitions should count as fairly balanced:
Topic: mdb_Fd_Route_GD Partition: 0 Leader: 2 Replicas: 2,3,4 Isr: 2,4,3
Topic: mdb_Fd_Route_GD Partition: 1 Leader: 3 Replicas: 3,4,0 Isr: 4,3,0
Topic: mdb_Fd_Route_GD Partition: 2 Leader: 4 Replicas: 4,0,1 Isr: 4,1,0
Topic: mdb_Fd_Route_GD Partition: 3 Leader: 0 Replicas: 0,1,2 Isr: 2,1,0
Topic: mdb_Fd_Route_GD Partition: 4 Leader: 1 Replicas: 1,2,3 Isr: 2,3,1
Topic: mdb_Fd_Route_GD Partition: 5 Leader: 2 Replicas: 2,4,0 Isr: 2,4,0
Topic: mdb_Fd_Route_GD Partition: 6 Leader: 3 Replicas: 3,0,1 Isr: 3,1,0
Topic: mdb_Fd_Route_GD Partition: 7 Leader: 4 Replicas: 4,1,2 Isr: 4,2,1
Topic: mdb_Fd_Route_GD Partition: 8 Leader: 0 Replicas: 0,2,3 Isr: 2,3,0
Topic: mdb_Fd_Route_GD Partition: 9 Leader: 1 Replicas: 1,3,4 Isr: 4,3,1
Most of the topics look like this.
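Rather than eyeballing every topic, the describe output can be checked mechanically. This is a rough Python sketch, assuming the output format pasted above. `parse_describe`, `summarize`, and `SAMPLE` are hypothetical names used only for this illustration. It counts leaders per broker, lists partitions whose leader is not the first (preferred) replica, and lists partitions whose ISR is smaller than the replica set.

```python
import re
from collections import Counter

# One partition row of the describe output; works on flattened text too.
ROW_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+"
    r"Replicas:\s*(?P<replicas>\d+(?:,\d+)*)\s+"
    r"Isr:\s*(?P<isr>(?:\d+(?:,\d+)*)?)",
    re.IGNORECASE,
)

def parse_describe(text):
    """Parse partition rows into dicts with integer broker ids."""
    rows = []
    for m in ROW_RE.finditer(text):
        rows.append({
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "leader": int(m.group("leader")),
            "replicas": [int(x) for x in m.group("replicas").split(",")],
            "isr": [int(x) for x in m.group("isr").split(",") if x],
        })
    return rows

def summarize(rows):
    """Return (leaders per broker, non-preferred leaders, shrunken ISRs)."""
    leaders = Counter(r["leader"] for r in rows)
    # The first entry in Replicas is the preferred leader.
    not_preferred = [(r["topic"], r["partition"])
                     for r in rows if r["leader"] != r["replicas"][0]]
    # ISR order differs from Replicas order, so compare as sets.
    under_replicated = [(r["topic"], r["partition"])
                        for r in rows if set(r["isr"]) != set(r["replicas"])]
    return leaders, not_preferred, under_replicated

# Rows copied from the describe output posted above (illustration only).
SAMPLE = """
Topic: mdb_Fd_Route_GD Partition: 0 Leader: 2 Replicas: 2,3,4 Isr: 2,4,3
Topic: mdb_Fd_Route_GD Partition: 1 Leader: 3 Replicas: 3,4,0 Isr: 4,3,0
Topic: mdb_Fd_Route_GD Partition: 2 Leader: 4 Replicas: 4,0,1 Isr: 4,1,0
Topic: mdb_Fd_Route_GD Partition: 3 Leader: 0 Replicas: 0,1,2 Isr: 2,1,0
Topic: mdb_Fd_Route_GD Partition: 4 Leader: 1 Replicas: 1,2,3 Isr: 2,3,1
Topic: mdb_Fd_Route_GD Partition: 5 Leader: 2 Replicas: 2,4,0 Isr: 2,4,0
Topic: mdb_Fd_Route_GD Partition: 6 Leader: 3 Replicas: 3,0,1 Isr: 3,1,0
Topic: mdb_Fd_Route_GD Partition: 7 Leader: 4 Replicas: 4,1,2 Isr: 4,2,1
Topic: mdb_Fd_Route_GD Partition: 8 Leader: 0 Replicas: 0,2,3 Isr: 2,3,0
Topic: mdb_Fd_Route_GD Partition: 9 Leader: 1 Replicas: 1,3,4 Isr: 4,3,1
"""

if __name__ == "__main__":
    leaders, not_preferred, under_replicated = summarize(parse_describe(SAMPLE))
    print("leaders per broker:", dict(leaders))
    print("leader is not preferred replica:", not_preferred)
    print("ISR smaller than replica set:", under_replicated)
```

For the topic above, every broker leads two partitions and both lists come back empty. That matches the "leader imbalance ratio ... is 0.000000" lines in the controller log, which, as far as I understand, report the share of a broker's preferred partitions that it does not currently lead. A balanced layout at one moment does not rule out controller instability, so the ISR lists are worth re-checking around the times the controller changed.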