[root@prd-kafka-01 opt]# /usr/hdp/2.6.4.0-91/kafka/bin/kafka-reassign-partitions.sh --zookeeper 172.19.38.217:2181 --reassignment-json-file expand-cluster-ods-be-reassignment.json --verify
Status of partition reassignment:
Reassignment of partition [ods_be_monitor_item_detail,8] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,9] completed successfully
Reassignment of partition [ods_be_monitor_item_detail,6] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,14] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,5] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,11] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,13] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,3] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,2] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,4] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,1] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,10] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,12] is still in progress
Reassignment of partition [ods_be_monitor_item_detail,0] completed successfully
Reassignment of partition [ods_be_monitor_item_detail,7] is still in progress
It has been running for a full day now and still shows as not finished.
Someone else I know took three days for this...
The migration time is determined by the gap between the rate at which new messages keep arriving and the speed at which the data is copied.
Could it break off partway through? Mine has been running for two days and still isn't done. Is there any way to estimate how long it will take?
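There is no exact answer, but a rough way to reason about it: the reassignment only finishes when the new replicas catch up to the leader, so what matters is the copy throughput minus the incoming write rate. A back-of-the-envelope sketch, where both numbers are illustrative assumptions you would measure on your own cluster:

```shell
# Back-of-the-envelope ETA for one partition's reassignment.
# Both values below are made-up assumptions: measure the real copy
# throughput and producer traffic on your own brokers.
remaining_gb=160            # data still to copy to the new replica
net_rate_gb_per_hour=4      # copy speed minus incoming-write growth
echo "$(( remaining_gb / net_rate_gb_per_hour )) hours"
```

If the net rate is zero or negative, the new replica never catches up, which is why throttling producers or reassigning fewer partitions at a time shortens the migration.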
[root@prd-kafka-01 opt]# /usr/hdp/2.6.4.0-91/kafka/bin/kafka-topics.sh --describe --zookeeper 172.19.38.217:2181 --topic ods_be_monitor_item_detail
Topic:ods_be_monitor_item_detail	PartitionCount:15	ReplicationFactor:3	Configs:retention.ms=172800000
	Topic: ods_be_monitor_item_detail	Partition: 0	Leader: 1006	Replicas: 1006,1005,1008	Isr: 1006,1005,1008
	Topic: ods_be_monitor_item_detail	Partition: 1	Leader: 1008	Replicas: 1008,1006,1009,1005	Isr: 1008,1006,1005
	Topic: ods_be_monitor_item_detail	Partition: 2	Leader: 1005	Replicas: 1005,1006,1008,1010,1009	Isr: 1005,1008,1006
	Topic: ods_be_monitor_item_detail	Partition: 3	Leader: 1006	Replicas: 1005,1006,1008,1010,1009	Isr: 1006,1008,1005
	Topic: ods_be_monitor_item_detail	Partition: 4	Leader: 1008	Replicas: 1005,1010,1006,1008	Isr: 1008,1006,1005
	Topic: ods_be_monitor_item_detail	Partition: 5	Leader: 1005	Replicas: 1006,1008,1009,1005	Isr: 1005,1008,1006
	Topic: ods_be_monitor_item_detail	Partition: 6	Leader: 1006	Replicas: 1005,1006,1008,1010,1009	Isr: 1006,1008,1005
	Topic: ods_be_monitor_item_detail	Partition: 7	Leader: 1008	Replicas: 1005,1006,1008,1010,1009	Isr: 1008,1006,1005
	Topic: ods_be_monitor_item_detail	Partition: 8	Leader: 1005	Replicas: 1010,1005,1006,1008	Isr: 1005,1008,1006
	Topic: ods_be_monitor_item_detail	Partition: 9	Leader: 1006	Replicas: 1005,1006,1008	Isr: 1006,1005,1008
	Topic: ods_be_monitor_item_detail	Partition: 10	Leader: 1008	Replicas: 1005,1006,1008,1010,1009	Isr: 1008,1006,1005
	Topic: ods_be_monitor_item_detail	Partition: 11	Leader: 1005	Replicas: 1008,1010,1005,1006	Isr: 1005,1008,1006
	Topic: ods_be_monitor_item_detail	Partition: 12	Leader: 1006	Replicas: 1009,1005,1006,1008	Isr: 1006,1008,1005
	Topic: ods_be_monitor_item_detail	Partition: 13	Leader: 1008	Replicas: 1010,1006,1008,1005	Isr: 1008,1006,1005
	Topic: ods_be_monitor_item_detail	Partition: 14	Leader: 1005	Replicas: 1005,1008,1009,1006	Isr: 1005,1008,1006
Is this normal?
Why are you rebalancing at all? Is there some problem with your cluster? With 15 partitions of 160 GB each, a rebalance costs a lot of time and resources and will seriously hurt Kafka's throughput and efficiency.
We added two nodes to expand the cluster, so the data needs to be rebalanced across them.
That shouldn't cause problems. Keep an eye on the offsets the partition replicas have synced to. With three replicas, this really is a very large amount of data to move.
OK, I'll keep watching it for a day or two. Many thanks.
Hi, one more question: how exactly do I observe the partition sync offsets?
Mine has been running for almost five days now and still isn't done. I'm worried something has gone wrong.
Writes are now failing constantly with NotLeaderForPartitionError.
[2020-08-14 17:39:01,727] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 572 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-12 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
[2020-08-14 17:39:01,877] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 578 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-2 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
[2020-08-14 17:39:01,928] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 583 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-7 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
[2020-08-14 17:39:02,079] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 589 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-12 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
[2020-08-14 17:39:02,129] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 594 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-2 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
[2020-08-14 17:40:06,193] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 601 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-7 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
[2020-08-14 17:40:06,295] INFO [KafkaApi-1009] Closing connection due to error during produce request with correlation id 608 from client id producer-1 with ack=0
Topic and partition to exceptions: pshop_sell_status_topic-12 -> org.apache.kafka.common.errors.NotLeaderForPartitionException (kafka.server.KafkaApis)
Those are INFO-level logs; you can ignore them.
Go onto the machines and look at the physical files to see the offset position the replicas have synced to. (Typing from my phone.)
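Concretely, the synced offset position can be read from the replication-offset-checkpoint file that each broker keeps in every log.dirs directory. A sketch, parsing a fabricated sample (the offsets below are made up; on a real broker you would point awk at the actual file, e.g. /kafka-logs/replication-offset-checkpoint):

```shell
# replication-offset-checkpoint format: a version line, an entry-count
# line, then one "<topic> <partition> <offset>" line per partition.
# Comparing this offset on the new replica against the leader's
# log-end offset shows how far the reassignment still has to go.
ckpt=$(mktemp)
cat > "$ckpt" <<'EOF'
0
2
ods_be_monitor_item_detail 8 123456789
ods_be_monitor_item_detail 9 987654321
EOF
awk 'NR > 2 { print $1 "-" $2 ": " $3 }' "$ckpt"
rm -f "$ckpt"
```

Watching that per-partition offset grow over time also gives you the copy rate, and from that a rough completion estimate.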
[2020-08-14 18:21:58,656] ERROR [KafkaApi-1009] Error when handling request {controller_id=1005,controller_epoch=22,partition_states=[{topic=monitor_shop_selltime_status_v3,partition=2,controller_epoch=22,leader=1005,leader_epoch=2,isr=[1005,1006,1008],zk_version=13,replicas=[1005,1006,1008,1010,1009]}],live_leaders=[{id=1005,host=kafka1.sh-internal.com,port=6667}]} (kafka.server.KafkaApis)
java.io.IOException: Malformed line in offset checkpoint file: pshop sell status topic 7 0'
	at kafka.server.OffsetCheckpoint.malformedLineException$1(OffsetCheckpoint.scala:81)
	at kafka.server.OffsetCheckpoint.liftedTree2$1(OffsetCheckpoint.scala:104)
Now I'm getting this error. I deleted the earlier partition-reassignment task, but data still can't be written.
Did you touch the reassignment?
No, but someone created a topic like this for me:
pshop sell status topic (the name contains spaces)
The recovery-point-offset-checkpoint and replication-offset-checkpoint files keep ending up with this pshop sell status topic entry in them. Deleting the files and restarting doesn't help either. How do I fix this now? It's a production cluster and this is urgent. Please reply.
It's caused by the malformed name. How could a topic with spaces in it even be created successfully?
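The parse failure itself is mechanical: the checkpoint parser splits each data line on whitespace and expects exactly three fields (topic, partition, offset). A topic name containing spaces yields extra fields, which is exactly the "Malformed line" in the stack trace above. A minimal demonstration:

```shell
# The checkpoint line written for partition 7 of the broken topic:
line='pshop sell status topic 7 0'
# Word-split it the way a whitespace-based parser would:
set -- $line
echo "$# fields"    # 6 fields, where the parser expects exactly 3
```

So every broker restart re-reads the checkpoint, hits this line, and fails, which is why deleting the files only helps until the broken topic writes itself back in.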
Which Kafka version is this? I'm not sure whether your offsets are stored in ZooKeeper or in Kafka's own __consumer_offsets topic. Either way, the broken topic's entries need to be deleted from there.
[root@prd-kafka-01 kafka]# find ./libs/ -name *kafka_* | head -1 | grep -o '\kafka[^\n]*'
kafka_2.11-0.10.1.2.6.4.0-91.jar
What should I do now? I've tried many approaches and none of them has fixed it.
Could you give me a way to contact you?
Here's my suggestion. This is a production Kafka cluster, and you need to manually clean up the broken topic, but doing that in production is a high-risk operation, and on top of that you're still in the middle of a data migration.
1. Stand up a new Kafka cluster and point the business traffic at it.
2. Once the traffic has been moved off, you can safely repair the old cluster.
I've already stopped the reassignment via ZooKeeper.
There were five nodes originally; one of them is still broken. It keeps recovering data and its port never starts listening.
[2020-08-15 00:17:37,087] INFO Recovering unflushed segment 307489432 in log raw_shop_business_detail-2. (kafka.log.Log)
[2020-08-15 00:17:37,709] INFO Recovering unflushed segment 76574100245 in log ods_eleme_monitor_item_detail-7. (kafka.log.Log)
[2020-08-15 00:17:39,330] INFO Recovering unflushed segment 76634617423 in log ods_eleme_monitor_item_detail-11. (kafka.log.Log)
[2020-08-15 00:17:39,730] INFO Recovering unflushed segment 76885515852 in log ods_eleme_monitor_item_detail-0. (kafka.log.Log)
[2020-08-15 00:17:44,113] INFO Recovering unflushed segment 307582694 in log raw_shop_business_detail-2. (kafka.log.Log)
[2020-08-15 00:17:46,124] INFO Recovering unflushed segment 76635155355 in log ods_eleme_monitor_item_detail-11. (kafka.log.Log)
[2020-08-15 00:17:48,664] INFO Recovering unflushed segment 76574648701 in log ods_eleme_monitor_item_detail-7. (kafka.log.Log)
[2020-08-15 00:17:48,928] INFO Recovering unflushed segment 76886065583 in log ods_eleme_monitor_item_detail-0. (kafka.log.Log)
[2020-08-15 00:17:50,801] INFO Recovering unflushed segment 307675968 in log raw_shop_business_detail-2. (kafka.log.Log)
[2020-08-15 00:17:52,978] INFO Recovering unflushed segment 76635693417 in log ods_eleme_monitor_item_detail-11. (kafka.log.Log)
[2020-08-15 00:17:57,688] INFO Recovering unflushed segment 76886609961 in log ods_eleme_monitor_item_detail-0. (kafka.log.Log)
[2020-08-15 00:17:57,825] INFO Recovering unflushed segment 307770638 in log raw_shop_business_detail-2. (kafka.log.Log)
[2020-08-15 00:17:59,792] INFO Recovering unflushed segment 76636232706 in log ods_eleme_monitor_item_detail-11. (kafka.log.Log)
[2020-08-15 00:17:59,811] INFO Recovering unflushed segment 76575198496 in log ods_eleme_monitor_item_detail-7. (kafka.log.Log)
Point the broken node's
log.dirs
at a new directory (you can keep the old data around), and let that broker re-sync its data from the current leaders. If all your topics have more than one replica, you can afford to be fairly aggressive about this.
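The log.dirs change can be sketched as follows. The server.properties path is an assumption for an HDP install, and the edit is demonstrated on a temporary copy so nothing here touches a live broker:

```shell
# On the broken broker the real file would be something like
# /usr/hdp/2.6.4.0-91/kafka/config/server.properties; we demonstrate
# the edit on a throwaway copy with fabricated contents.
props=$(mktemp)
printf 'broker.id=1010\nlog.dirs=/kafka-logs\n' > "$props"

# Point log.dirs at a fresh, empty directory so the broker rebuilds
# its replicas from the current partition leaders on restart.
sed -i 's|^log.dirs=.*|log.dirs=/kafka-logs-new|' "$props"
grep '^log.dirs=' "$props"    # log.dirs=/kafka-logs-new
rm -f "$props"
```

Make sure the new directory exists and is owned by the kafka user before restarting the broker, and keep the old directory until the resync finishes in case you need to roll back.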
In the end I stood up a brand-new cluster.