Our production Kafka cluster hit this bug. The symptoms: the cluster looks healthy and the ISR is complete, yet the partition leader can neither be written to nor consumed from, the followers fail to replicate from it, and the affected node's memory usage eventually climbs very high. Restarting that one node restores normal operation. After checking all logs, the only anomaly is the following fetch error on the follower; nothing else shows any error.
[2019-09-02 10:03:30,051] WARN [ReplicaFetcher replicaId=8, leaderId=2, fetcherId=0]
Error in response for fetch request (type=FetchRequest, replicaId=8, maxWait=500, minBytes=1, maxBytes=10485760,
fetchData={
PortrayAys-10=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[4]),
MMS-Metric-1=(offset=749142728, logStartOffset=749142728, maxBytes=1048576, currentLeaderEpoch=Optional[0]),
dialtest-2=(offset=15316723, logStartOffset=15316723, maxBytes=1048576, currentLeaderEpoch=Optional[4]),
DetectoCPU-3=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[2]),
OneMinBL-17=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[4]),
MetricRoute-0=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[2]),
MetricBaseData-25=(offset=649178, logStartOffset=649178, maxBytes=1048576, currentLeaderEpoch=Optional[0]),
Argus-RawData-8=(offset=20386963624, logStartOffset=20386403279, maxBytes=1048576, currentLeaderEpoch=Optional[0]),
MetricBaseData-15=(offset=648652, logStartOffset=648652, maxBytes=1048576, currentLeaderEpoch=Optional[0]),
PortrayAys-20=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[2]),
NewProxyBaseData-5=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[2]),
Detector-To-ES-8=(offset=31787842, logStartOffset=31787842, maxBytes=1048576, currentLeaderEpoch=Optional[4]),
AIAnomaly-2=(offset=5628, logStartOffset=5628, maxBytes=1048576, currentLeaderEpoch=Optional[2]),
AIPortray-22=(offset=0, logStartOffset=0, maxBytes=1048576, currentLeaderEpoch=Optional[4])},
isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=2013330601, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)
Fixed in 2.2.0 and 2.1.1.
Reference: https://issues.apache.org/jira/browse/KAFKA-7697
I plan to upgrade directly from 2.1.0 to one of these two versions (2.2.0 or 2.1.1). What impact would that upgrade have? How is the compatibility between these versions?
They are compatible. See:
https://www.orchome.com/505
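For reference, the official rolling-upgrade procedure pins the inter-broker protocol and message-format versions in `server.properties` before swapping the binaries. A sketch, assuming you are coming from 2.1.x (adjust the version strings to your own cluster):

```properties
# server.properties - set BEFORE replacing the broker binaries, so that
# upgraded brokers keep speaking the old protocol during the rolling restart
inter.broker.protocol.version=2.1
log.message.format.version=2.1

# After every broker is on the new binaries, bump inter.broker.protocol.version
# to the new version and do one more rolling restart.
```

Upgrading the brokers one at a time with these settings in place is what makes the upgrade safe for live clients.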
kafka_2.11-0.8.2.1
kafka_2.10-0.10.2.0
Are these two versions compatible with each other? We started with kafka_2.11-0.8.2.1 and later moved to kafka_2.10-0.10.2.0, but the consumer side still depends on kafka_2.11-0.8.2.1, and offset commits sometimes fail. Could the failures be caused by this mismatch?
Upgrade the client as well.
I suspect the offset failures have some other cause, so use elimination: swap in the matching client first and see whether the problem still occurs.
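Concretely, "swap in the matching client" means aligning the consumer's build dependency with the broker version. Assuming a Maven build (the coordinates below are the standard Apache artifacts for the 0.10.2.0 brokers mentioned above):

```xml
<!-- pom.xml: match the client artifact to the 0.10.2.0 brokers -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.10.2.0</version>
</dependency>
```

The `_2.10` suffix is the Scala version the broker-side jar was built with; for pure clients it mainly matters that the Kafka version itself matches.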
Hi, in one partition of the default topic
__consumer_offsets
there are entries like the following. Can I conclude that the messages at offset=728 and 729 were consumed normally? Thanks!
No. Check the log level - is it WARN?
I extracted this with a command; strictly speaking it isn't a log but the data file of
__consumer_offsets
So how do I judge whether consumption succeeded? The server side shows no errors, but the consumer gets an error when committing offsets after consuming.
In early versions (through 0.8), consumer offsets were stored in ZooKeeper; from 0.9 on, offsets are stored by default in the
__consumer_offsets
topic. For the offset-commit error, please post the error message.
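As an aside, the commit records in __consumer_offsets can be read with the console consumer and the built-in offsets formatter. A command sketch (broker address is a placeholder; the formatter's package moved over time, so on 0.10.x use `kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter` instead):

```
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic __consumer_offsets --from-beginning \
  --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter"
```

Each decoded record is one committed offset for a (group, topic, partition), which is the format shown in the dumps in this thread.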
<2019-09-12 18:12:22,542>[TRACE] kafka.consumer.ZookeeperConsumerConnector - [ems_SH-L08013-1568281779941-bdab7903], OffsetMap: Map([ems-otchs-topic,0] -> [OffsetMetadata[728,NO_METADATA],CommitTime -1,ExpirationTime -1])
<2019-09-12 18:12:22,543>[DEBUG] kafka.consumer.ZookeeperConsumerConnector - [ems_SH-L08013-1568281779941-bdab7903], Connected to offset manager 30.79.78.25:34548.
<2019-09-12 18:12:22,543>[TRACE] kafka.network.RequestOrResponseSend - 75 bytes written.
<2019-09-12 18:12:22,543>[ERROR] kafka.consumer.ZookeeperConsumerConnector - [ems_SH-L08013-1568281779941-bdab7903], Error while committing offsets.
java.io.EOFException
    at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
    at kafka.network.BlockingChannel.readCompletely(BlockingChannel.scala:129)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:120)
    at kafka.consumer.ZookeeperConsumerConnector.liftedTree2$1(ZookeeperConsumerConnector.scala:355)
    at kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:352)
    at kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:332)
    at kafka.javaapi.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:108)
    at com.paic.mercury.esb.kafka.KafkaConsumer.commitOffsets(KafkaConsumer.java:38)
    at com.paic.mercury.esb.kafka.consumer.KafkaMessageFetcherPool.commitOffsets(KafkaMessageFetcherPool.java:47)
    at com.paic.mercury.esb.kafka.consumer.KafkaMessageConsumer.doWork(KafkaMessageConsumer.java:94)
    at com.paic.mercury.esb.kafka.consumer.KafkaMessageConsumer.run(KafkaMessageConsumer.java:44)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Only the consumer side has errors; the server side has none.
The Kafka client and server versions should match.
[ems,ems-topic-out,0]::[OffsetMetadata[1225,NO_METADATA],CommitTime 1568284274716,ExpirationTime 1568370674716]
[ems,ems-topic-out,0]::NULL
[ems,ems-topic-out,0]::[OffsetMetadata[1227,NO_METADATA],CommitTime 1568620352633,ExpirationTime 1568706752633]
Does this NULL mean the commit failed, or that a commit was made without an offset?
The Kafka versions are now the same on both sides. We then noticed the ZooKeeper versions differ; not sure whether that matters. I've changed the consumer's ZK dependency to match the server's and will keep observing.
Yes, a version issue. The officially recommended ZooKeeper version is 3.4.9.
Hi, could you explain what these two parameters do? The explanations I found online aren't very clear.
offsets.channel.backoff.ms (default 1000): the backoff time before reconnecting the offsets channel or retrying a failed offset fetch/commit request.
offsets.channel.socket.timeout.ms (default 10000): the socket timeout when reading the response to an offset fetch/commit request; this timeout is also used by the ConsumerMetadata request that locates the offset manager.
1. The backoff before retrying the connection.
2. The timeout when reading offset fetch/commit responses.
If you have a new question, please submit it as a new question.
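In an old-consumer configuration these two settings would look like the following (defaults shown; note they apply only to the old Scala consumer, as used in the stack trace above, not to the new Java consumer):

```properties
# consumer.properties (old Scala consumer)
# backoff before reconnecting the offsets channel or retrying a failed
# offset fetch/commit request
offsets.channel.backoff.ms=1000
# socket timeout when reading an offset fetch/commit response
offsets.channel.socket.timeout.ms=10000
```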