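Try restarting node 3 first; that should bring it back. I can only analyze this if there is an error message.
P.S. I suspect that during the earlier data migration a replication/sync problem left this node outside the cluster's control.
Verified: moving partition 7 to brokers 0 and 4 does fix it. But how do I find the actual cause now?
The server.log on my Kafka broker 3 doesn't show any errors either. Consumption just doesn't work.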
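For reference, a move like this is normally done with Kafka's kafka-reassign-partitions.sh tool, which reads a JSON reassignment plan. The sketch below only builds that plan; it does not talk to Kafka. It assumes the topic is defaultKJLog (the name seen in the log excerpt) and that the intended replica list is [0, 4], with broker 0 first and therefore the preferred leader. The helper name build_reassignment_plan is hypothetical.

```python
import json

def build_reassignment_plan(topic, moves):
    """Build the JSON plan read by kafka-reassign-partitions.sh.

    moves: dict mapping partition number -> list of target broker ids.
    The first broker in each list is the preferred leader.
    """
    plan = {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": list(brokers)}
            for p, brokers in sorted(moves.items())
        ],
    }
    return json.dumps(plan, indent=2)

# Assumption: move partition 7 of defaultKJLog onto brokers 0 and 4.
plan_json = build_reassignment_plan("defaultKJLog", {7: [0, 4]})
print(plan_json)
```

Save the output to a file and pass it to kafka-reassign-partitions.sh via --reassignment-json-file, first with --execute and later with --verify. Whether the tool connects with --zookeeper or --bootstrap-server depends on the Kafka version in use.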
[2020-01-17 02:46:31,901] INFO [ProducerStateManager partition=defaultKJLog-7] Writing producer snapshot at offset 16220758524 (kafka.log.ProducerStateManager)
[2020-01-17 02:46:31,902] INFO [Log partition=defaultKJLog-7, dir=/data/kafka-logs] Rolled new log segment at offset 16220758524 in 2 ms. (kafka.log.Log)
[2020-01-17 02:52:42,610] INFO [GroupMetadataManager brokerId=3] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2020-01-17 02:53:11,877] INFO [Log partition=defaultKJLog-7, dir=/data/kafka-logs] Found deletable segments with base offsets [14059201260] due to retention time 604800000ms breach (kafka.log.Log)
[2020-01-17 02:53:11,877] INFO [Log partition=defaultKJLog-7, dir=/data/kafka-logs] Scheduling log segment [baseOffset 14059201260, size 1073723120] for deletion. (kafka.log.Log)
[2020-01-17 02:53:11,877] INFO [Log partition=defaultKJLog-7, dir=/data/kafka-logs] Incrementing log start offset to 14063800336 (kafka.log.Log)
[2020-01-17 02:53:11,930] INFO Cleared earliest 0 entries from epoch cache based on passed offset 14063800336 leaving 12 in EpochFile for partition defaultKJLog-7 (kafka.server.epoch.LeaderEpochFileCache)
[2020-01-17 02:54:11,877] INFO [Log partition=defaultKJLog-7, dir=/data/kafka-logs] Deleting segment 14059201260 (kafka.log.Log)
[2020-01-17 02:54:12,005] INFO Deleted log /data/kafka-logs/defaultKJLog-7/00000000014059201260.log.deleted. (kafka.log.LogSegment)
[2020-01-17 02:54:12,010] INFO Deleted offset index /data/kafka-logs/defaultKJLog-7/00000000014059201260.index.deleted. (kafka.log.LogSegment)
[2020-01-17 02:54:12,011] INFO Deleted time index /data/kafka-logs/defaultKJLog-7/00000000014059201260.timeindex.deleted. (kafka.log.LogSegment)
[2020-01-17 03:02:42,610] INFO [GroupMetadataManager brokerId=3] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2020-01-17 03:04:42,948] INFO [ProducerStateManager partition=defaultKJLog-7] Writing producer snapshot at offset 16225374460 (kafka.log.ProducerStateManager)
[2020-01-17 03:04:42,950] INFO [Log partition=defaultKJLog-7, dir=/data/kafka-logs] Rolled new log segment at offset 16225374460 in 2 ms. (kafka.log.Log)
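But partitions 1, 5 and 9 also have replicas on broker 3; broker 3 just isn't their leader. If broker 3 is the problem, couldn't I simply shut that node down? I stopped it and the problem was still there.
None of your other topics are on broker 3; only this problematic partition is on node 3. If the test above now works, the problem can be narrowed down to node 3.
To find out what is wrong with node 3, you have to look at its log output.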
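The argument above turns on leadership. By default consumers fetch from the partition leader, so broker 3 being only a follower for partitions 1, 5 and 9 would not block consumption of those partitions, while partition 7, led by broker 3, would be affected. Below is a hypothetical sketch that splits a broker's partitions into the ones it leads and the ones it only follows, given a partition-to-(leader, replicas) map such as one copied by hand from `kafka-topics.sh --describe`. The replica lists are made up for illustration. Only the facts that broker 3 leads partition 7 and follows 1, 5 and 9 come from the thread.

```python
def split_by_role(assignment, broker_id):
    """Return (led, followed) partition lists for broker_id.

    assignment: dict partition -> (leader_id, [replica ids]).
    """
    led, followed = [], []
    for partition, (leader, replicas) in sorted(assignment.items()):
        if broker_id not in replicas:
            continue  # broker holds no copy of this partition
        (led if leader == broker_id else followed).append(partition)
    return led, followed

# Hypothetical layout; only "3 leads 7, 3 follows 1/5/9" is from the thread.
assignment = {
    1: (0, [0, 3]),
    2: (0, [0, 4]),
    5: (4, [4, 3]),
    7: (3, [3, 0]),
    9: (0, [0, 3]),
}
led, followed = split_by_role(assignment, 3)
print("led by 3:", led, "followed by 3:", followed)
```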