Did a controller restart corrupt Kafka metadata?

Posted: 2021-03-18   Last updated: 2021-03-18 16:31:17   2,402 views

At 3 PM I took controller node 1 offline, and node 2 was elected as the new controller. I then noticed that the metadata for partition 1 of topic A was wrong: its three replicas were originally on 2, 3, 4 (with 2 as the leader). But the logs show that the new controller 2 told broker 6 that the replicas of topic A partition 1 are 3, 2, 6, while no metadata change was propagated to 2, 3, 4. As a result, broker 6 added a replica out of nowhere and kept failing to fetch from the leader.

The logs are below. Could someone help me figure out why this happened? It looks as if taking the controller offline corrupted the metadata.

Logs from the broker 6 machine:

(The logs at 2 PM were still normal)
[2021-03-16 14:12:39.979] TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, replicas=[2, 3, 4], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 1 epoch 29 with correlation id 0 (state.change.logger)

(These are the logs after the controller restart)
[2021-03-16 15:00:46.600] TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 0 (state.change.logger)

One more thing that puzzles me about this log: the controller epoch in the request is 30, but the controllerEpoch inside UpdateMetadataPartitionState is 29, the same as in the 2 PM log. Could something be wrong there?

Logs from the broker 2 machine:
[2021-03-16 15:07:26,164] WARN [ReplicaManager broker=2] Leader 2 failed to record follower 6's position 11679162 since the replica is not recognized to be one of the assigned replicas 3,4,2 for partition A-1. Empty records will be returned for this partition. (kafka.server.ReplicaManager)


By default in Kafka, once a topic and its partitions (including replicas) are created, the brokers they are assigned to do not change on their own.
So, has any operation or test been done that could have affected this? For example (a quick check for both is sketched after the list):

  • another Kafka cluster mistakenly connected to this ZooKeeper
  • a duplicate broker.id
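One way to check both of these, assuming Kafka's bundled zookeeper-shell.sh and a ZooKeeper at zk:2181 (adjust the address for your cluster):

# List the broker ids currently registered in this ZooKeeper
bin/zookeeper-shell.sh zk:2181 ls /brokers/ids

# Show which host/port currently owns broker id 6, to confirm it belongs to this cluster
bin/zookeeper-shell.sh zk:2181 get /brokers/ids/6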
-> 半兽人 3 years ago

No, nothing like that. I only restarted the controller at the time, and then saw this problem in the logs.

半兽人 -> 3 years ago

That broker 6 is a mystery; it's beyond my understanding.
In the end, I still stand by the points above.

PS: I suggest you clean up the dirty data.

-> 半兽人 3 years ago

I just checked the logs again. After the controller node was restarted at 3 PM, the replica state machine started up, and the log printed when startup completed shows the replicas as 3, 2, 6. Can I take that to mean the metadata in ZooKeeper was already 3, 2, 6? What I don't understand is where the earlier 2, 3, 4 assignment came from. The cluster had been running for a long time without problems, and neither kafka-manager nor our monitoring metrics showed under-replicated partitions or any other errors. Also, at 2 PM we restarted an ordinary broker and the cluster stayed healthy. The anomaly only appeared after restarting the controller at 3 PM.

Also, how exactly do I clean up the dirty data you mentioned? Which data counts as dirty?
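One way to confirm what assignment ZooKeeper actually holds for topic A is to read the topic znode directly, e.g. with Kafka's zookeeper-shell.sh (a sketch, assuming a ZooKeeper at zk:2181):

# The controller reads the replica assignment from this znode on startup;
# for partition 1 it should show something like "1":[3,2,6]
bin/zookeeper-shell.sh zk:2181 get /brokers/topics/A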

半兽人 -> 3 years ago

Check topic A's partition assignment with this command:

bin/kafka-topics.sh --describe --topic A --bootstrap-server localhost:9092

and see what it shows.

Is the topic otherwise working normally?

-> 半兽人 3 years ago

It's normal now. Since this is a production cluster, we did an emergency recovery at the time; the metadata is currently 3, 2, 6.

-> 半兽人 3 years ago

And the monitoring shows this topic was still producing and consuming normally after the controller restart at 3 PM, with followers 3 and 4 syncing fine. Only replica 6 could not sync; the log reported: Leader 2 failed to record follower 6's position 11679162 since the replica is not recognized to be one of the assigned replicas 3,4,2 for partition A-1. Empty records will be returned for this partition. (kafka.server.ReplicaManager)

It was only after another broker was restarted later that the cluster started showing under-replicated partitions.

半兽人 -> 3 years ago

How did you recover it?
Leader 2 ignored follower 6 because 6 is not among partition A-1's assigned replicas (3, 4, 2).
What I'm curious about is broker 6: how could it have data for A-1 that shouldn't exist there? Did it have any before?

-> 半兽人 3 years ago

It was after broker 3 was restarted at around 3:10 PM that partition A-1's metadata began to change, which is how I learned 6 was a follower. There was nothing before that; the directories on broker 6 were all created around 3 PM. Here are broker 6's logs:

[2021-03-16 15:00:46.829] INFO Created log for partition A-1 in /data4/kafka-data/A-1 with properties {compression.type -> producer, message.downconversion.enable -> true, min.insync.replicas -> 2, segment.jitter.ms -> 0, cleanup.policy -> [delete], flush.ms -> 9223372036854775807, segment.bytes -> 1073741824, retention.ms -> 172800000, flush.messages -> 9223372036854775807, message.format.version -> 2.0-IV1, file.delete.delay.ms -> 60000, max.compaction.lag.ms -> 9223372036854775807, max.message.bytes -> 4194304, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> true, retention.bytes -> -1, delete.retention.ms -> 86400000, segment.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760}. (kafka.log.LogManager)

(These are the logs after broker 3 was restarted at around 3:10 PM; you can see the leaderEpoch changed)

[2021-03-16 15:11:04.092]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 4], zkVersion=26, replicas=[3, 2, 6], offlineReplicas=[3]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 23 (state.change.logger)
[2021-03-16 15:11:10.331]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 4], zkVersion=26, replicas=[3, 2, 6], offlineReplicas=[3]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 29 (state.change.logger)
[2021-03-16 15:11:15.076]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 4, 6], zkVersion=27, replicas=[3, 2, 6], offlineReplicas=[3]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 34 (state.change.logger)
[2021-03-16 15:11:15.236]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 4, 6], zkVersion=27, replicas=[3, 2, 6], offlineReplicas=[3]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 35 (state.change.logger)
[2021-03-16 15:11:23.389]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 6], zkVersion=28, replicas=[3, 2, 6], offlineReplicas=[3]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 38 (state.change.logger)
[2021-03-16 15:11:32.727]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 6], zkVersion=28, replicas=[3, 2, 6], offlineReplicas=[3]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 40 (state.change.logger)
[2021-03-16 15:11:34.809]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 6], zkVersion=28, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 41 (state.change.logger)
[2021-03-16 15:11:34.884]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 6], zkVersion=28, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 42 (state.change.logger)
[2021-03-16 15:11:55.426]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=2, leaderEpoch=14, isr=[2, 6, 3], zkVersion=29, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 46 (state.change.logger)
[2021-03-16 15:15:54.197]  TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=30, leader=3, leaderEpoch=15, isr=[2, 6, 3], zkVersion=30, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 50 (state.change.logger)
-> 半兽人 3 years ago

What actually puzzles me now is this: at these two points in time the leaderEpoch and zkVersion are identical, so how did only the replicas change? Isn't that suspicious?

[2021-03-16 14:12:39.979] metis-main2-sg2-kafka TRACE [Broker id=10040] Cached leader info UpdateMetadataPartitionState(topicName='metis_mobile_live_stat', partitionIndex=1, controllerEpoch=29, leader=10045, leaderEpoch=13, isr=[10045, 10042, 10043], zkVersion=25, replicas=[10045, 10042, 10043], offlineReplicas=[]) for partition metis_mobile_live_stat-1 in response to UpdateMetadata request sent by controller 10041 epoch 29 with correlation id 0 (state.change.logger)
[2021-03-16 15:00:46.600] metis-main2-sg2-kafka TRACE [Broker id=10040] Cached leader info UpdateMetadataPartitionState(topicName='metis_mobile_live_stat', partitionIndex=1, controllerEpoch=29, leader=10045, leaderEpoch=13, isr=[10045, 10042, 10043], zkVersion=25, replicas=[10042, 10045, 10040], offlineReplicas=[]) for partition metis_mobile_live_stat-1 in response to UpdateMetadata request sent by controller 10045 epoch 30 with correlation id 0 (state.change.logger)
-> 3 years ago

I pasted the wrong logs; here are the correct ones:

[2021-03-16 14:12:39.979] TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, replicas=[2, 3, 4], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 1 epoch 29 with correlation id 0 (state.change.logger)
[2021-03-16 15:00:46.600] TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 0 (state.change.logger)
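Regarding the zkVersion question above: as far as I understand, the replicas list and the leader/isr/zkVersion fields come from different znodes, which may be why zkVersion stayed the same while replicas changed. Comparing both directly could help pin down when each piece was last written (a sketch, assuming zookeeper-shell.sh and a ZooKeeper at zk:2181):

# Replica assignment (the replicas=[...] part) lives here
bin/zookeeper-shell.sh zk:2181 get /brokers/topics/A

# Leader, isr and the zkVersion shown in the log come from the per-partition state znode
bin/zookeeper-shell.sh zk:2181 get /brokers/topics/A/partitions/1/state

# stat shows that znode's dataVersion and last-modified time (mtime)
bin/zookeeper-shell.sh zk:2181 stat /brokers/topics/A/partitions/1/state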
半兽人 -> 3 years ago

Partition replicas do not change after they are created, so this is beyond me.
1. Use the command I gave above for getting the topic's partition assignment, take a snapshot yourself, then run the same check a few days later and see whether anything has changed (see the sketch below).
2. As for the logs, they are all INFO level, with no ERROR-level entries. Some of the output does look odd, but if it is only being cited as evidence and does not affect the use of the topic, it can be ignored.
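A minimal way to take that snapshot and compare later (assuming the broker at localhost:9092; the file names are just examples):

# Snapshot topic A's assignment today
bin/kafka-topics.sh --describe --topic A --bootstrap-server localhost:9092 > topicA-$(date +%F).txt

# A few days later, take another snapshot and diff the two files
bin/kafka-topics.sh --describe --topic A --bootstrap-server localhost:9092 > topicA-$(date +%F).txt
diff topicA-2021-03-18.txt topicA-2021-03-21.txt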

-> 半兽人 3 years ago

A new finding: the PartitionReassignmentRateAndTimeMs metric in our monitoring actually has a value. Yet when broker 2 became controller, its log said there were no partitions being reassigned (this is read from /admin/reassign_partitions):
[2021-03-16 15:00:46,246] INFO [Controller id=2] Partitions being reassigned: Map() (kafka.controller.KafkaController)

We did not call any API to trigger a reassignment at the time either, so I wonder whether some internal Kafka mechanism could have caused this.
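For what it's worth, in 2.1.x a reassignment in progress should also be visible in the /admin/reassign_partitions znode itself, so one way to cross-check whenever that metric fires might be (assuming zookeeper-shell.sh and a ZooKeeper at zk:2181):

# This znode exists only while a reassignment is in flight;
# "Node does not exist" means none is running
bin/zookeeper-shell.sh zk:2181 get /admin/reassign_partitions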

I'm digging into this because I want to understand the root cause; otherwise I won't be able to continue the rolling upgrade, and I don't dare restart the other machines.
Any suggestions on how to handle restarting the Kafka brokers one by one for the upgrade?

半兽人 -> 3 years ago

Which Kafka version? If it's a major version upgrade, you need to read the upgrade notes.
If you're not using the newer features (e.g. Streams, Connect), the old version is actually quite stable (because it does less).
The upgrade doc hasn't been updated in a while; I'll find time to update it: https://www.orchome.com/505

-> 半兽人 3 years ago

Upgrading from 2.1.1 to 2.6.1. The official docs just say to do a rolling upgrade, and the procedure they describe is basically the same for every version.
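For reference, the documented flow for this kind of upgrade is roughly a two-pass rolling restart; the sketch below assumes the standard config names and should be checked against the official 2.6 upgrade notes for your exact setup:

# Pass 1: install the 2.6.1 binaries but pin the protocol to the old version
# in server.properties, then restart brokers one at a time:
#   inter.broker.protocol.version=2.1

# Before moving to the next broker, confirm nothing is under-replicated
bin/kafka-topics.sh --describe --under-replicated-partitions --bootstrap-server localhost:9092

# Pass 2: once every broker runs 2.6.1, bump the protocol and do a second rolling restart:
#   inter.broker.protocol.version=2.6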
