Uploaded image for project: 'Qpid Dispatch'
  1. Qpid Dispatch
  2. DISPATCH-1019

Messaging instability in networks with many clients / addresses.



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Routing Engine
    • Labels:


      After DISPATCH-966 has been fixed I am still experiencing problems in a network with many clients / addresses. I am running a three node fully connected mesh of dispatch routers with 10000 attached clients all with two address messaging at around 100 msg/sec.

      In the logs I am seeing the following errors:

      2018-05-29 14:31:05.145732 +0000 ERROR (error) Invalid message: Insufficient Data to Determine Tag
      2018-05-29 14:31:05.145748 +0000 ERROR (error) Invalid message: Can't convert message field body

      Which, in turn, lead to python errors like:

      2018-05-29 14:31:05.145971 +0000 ROUTER_MA (trace) RCVD: MAU(id=None pv=1 area=0 mobile_seq=0)
      2018-05-29 14:31:05.146130 +0000 ROUTER (error) Exception in control message processing
      {{ Traceback (most recent call last):}}
      {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, in handleControlMessage}}
      {{ self.mobile_address_engine.handle_mau(msg, now)}}
      {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, in handle_mau}}
      {{ node = self.node_tracker.router_node(msg.id)}}
      {{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in router_node}}
      {{ return self.nodes[node_id]}}
      {{ KeyError: None}}
      2018-05-29 14:31:05.146175 +0000 ROUTER (error) Control message error: opcode=MAU body=None

      I have tracked down the cause of the "Insufficient Data to Determine Tag" message to the following: During the call to qd_parse of the MAU message the qd_iterator_t reaches the end of the buffer list before it should. Specifically the call to qd_iterator_advance here https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151, "fails" to move forward a certain number of bytes (e.g. 31) even though the iterator->view_pointer->remaining value has plenty of bytes left (e.g. 84802). The "fail" is because we reach the end of the buffer list before we should (here https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).

      What I have not been able to figure out yet is why this happens because it is not consistent. Many large MAU message are parsed correctly only sometimes not. I am able to reproduce these errors every time I run my tests. There may be a time component involved because the more logging I to the router code, the less often the errors seem to occur.




            • Assignee:
              mmeulemans Marcel Meulemans
            • Votes:
              0 Vote for this issue
              2 Start watching this issue


              • Created: