Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
When an asynchronous send is retried, a NullPointerException can occur:
java.lang.NullPointerException: null
at com.alibaba.rocketmq.client.latency.MQFaultStrategy.selectOneMessageQueue(MQFaultStrategy.java:91) ~[classes/:na]
at com.alibaba.rocketmq.client.impl.producer.DefaultMQProducerImpl.selectOneMessageQueue(DefaultMQProducerImpl.java:404) ~[classes/:na]
at com.alibaba.rocketmq.client.impl.MQClientAPIImpl.onExceptionImpl(MQClientAPIImpl.java:385) ~[classes/:na]
at com.alibaba.rocketmq.client.impl.MQClientAPIImpl.access$100(MQClientAPIImpl.java:72) ~[classes/:na]
at com.alibaba.rocketmq.client.impl.MQClientAPIImpl$1.operationComplete(MQClientAPIImpl.java:356) ~[classes/:na]
at com.alibaba.rocketmq.remoting.netty.ResponseFuture.executeInvokeCallback(ResponseFuture.java:58) ~[classes/:na]
at com.alibaba.rocketmq.remoting.netty.NettyRemotingAbstract.scanResponseTable(NettyRemotingAbstract.java:255) ~[classes/:na]
at com.alibaba.rocketmq.remoting.netty.NettyRemotingClient$5.run(NettyRemotingClient.java:165) [classes/:na]
at java.util.TimerThread.mainLoop(Timer.java:555) [na:1.7.0_80]
at java.util.TimerThread.run(Timer.java:505) [na:1.7.0_80]
The problem: selectOneMessageQueue in MQFaultStrategy receives the topicPublishInfo that was passed down through sendKernelImpl, and that value may be null, which causes the NPE.
There are several places where sendKernelImpl is called with a null TopicPublishInfo, for example:
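The failure mode described above can be shown in isolation. The following is a minimal sketch, not RocketMQ code: `PublishInfo`, `selectQueue`, and `throwsNpe` are hypothetical stand-ins, assuming only that the selection step dereferences its publish-info argument without a null check.

```java
import java.util.Arrays;
import java.util.List;

public class NullPublishInfoDemo {
    // Hypothetical stand-in for TopicPublishInfo: holds only a queue list.
    static final class PublishInfo {
        final List<String> queues;

        PublishInfo(List<String> queues) {
            this.queues = queues;
        }
    }

    // Simplified stand-in for the selection step. It dereferences its
    // argument without a null check, so a null info throws an NPE.
    static String selectQueue(PublishInfo info, int attempt) {
        return info.queues.get(attempt % info.queues.size());
    }

    // Returns true if selecting with the given info throws an NPE.
    static boolean throwsNpe(PublishInfo info) {
        try {
            selectQueue(info, 0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        PublishInfo info = new PublishInfo(Arrays.asList("q0", "q1"));
        System.out.println(selectQueue(info, 3)); // prints "q1"
        System.out.println(throwsNpe(null));      // prints "true"
    }
}
```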
private SendResult sendSelectImpl(
        Message msg,
        MessageQueueSelector selector,
        Object arg,
        final CommunicationMode communicationMode,
        final SendCallback sendCallback,
        final long timeout
) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
    this.makeSureStateOK();
    Validators.checkMessage(msg, this.defaultMQProducer);

    TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
    if (topicPublishInfo != null && topicPublishInfo.ok()) {
        MessageQueue mq = null;
        try {
            mq = selector.select(topicPublishInfo.getMessageQueueList(), msg, arg);
        } catch (Throwable e) {
            throw new MQClientException("select message queue throws exception.", e);
        }

        if (mq != null) {
            // Here the TopicPublishInfo argument is passed as null, which risks an NPE on retry.
            return this.sendKernelImpl(msg, mq, communicationMode, sendCallback, null, timeout);
        } else {
            throw new MQClientException("select message queue return null.", null);
        }
    }

    throw new MQClientException("No route info for this topic, " + msg.getTopic(), null);
}
Although I found the bug in 3.5.8, the same issue exists in 4.0 because the relevant code is the same.
This NPE makes the retry fail and can even prevent the onException callback from being called.
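One possible way to avoid both symptoms is to guard the retry path against missing route information and to make sure the failure callback still fires. The sketch below illustrates that idea under stated assumptions and is not the fix that was applied to RocketMQ: `FailureCallback`, `pickRetryQueue`, and `onSendFailure` are hypothetical names, and queues are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GuardedRetryDemo {
    // Hypothetical callback, loosely modeled on an onException-style hook.
    interface FailureCallback {
        void onException(Throwable e);
    }

    // Picks the queue for a retry. When no queue list is available (the
    // null case from this report), returns null instead of throwing.
    static String pickRetryQueue(List<String> queues, int attempt) {
        if (queues == null || queues.isEmpty()) {
            return null;
        }
        return queues.get(Math.floorMod(attempt, queues.size()));
    }

    // Handles a failed async send. Returns the queue to retry on, or null
    // when no retry is possible; in that case the callback receives the
    // original error.
    static String onSendFailure(List<String> queues, int attempt, int maxRetries,
                                Throwable cause, FailureCallback callback) {
        String next = null;
        try {
            if (attempt < maxRetries) {
                next = pickRetryQueue(queues, attempt);
            }
        } catch (RuntimeException e) {
            // Defensive: a selection failure must not swallow the callback.
            next = null;
        }
        if (next == null) {
            callback.onException(cause);
        }
        return next;
    }

    public static void main(String[] args) {
        List<Throwable> seen = new ArrayList<>();
        Exception cause = new Exception("send timed out");

        // No route info: no retry queue, but the callback is still invoked.
        System.out.println(onSendFailure(null, 0, 2, cause, seen::add)); // prints "null"
        System.out.println(seen.size());                                 // prints "1"

        // Route info present: a retry queue is chosen and the callback is not invoked.
        System.out.println(onSendFailure(Arrays.asList("q0", "q1"), 1, 2, cause, seen::add)); // prints "q1"
    }
}
```

The design choice here is to treat missing route information as "retry not possible" rather than as an error in its own right, so the caller always sees the original send failure.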