Qpid / QPID-4233

Windows C++ client does not reconnect when port is blocked then re-opened (e.g. by firewall)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14, 0.16, Future
    • Fix Version/s: 0.28, qpid-cpp-0.34
    • Component/s: C++ Client
    • Labels: None
    • Environment: Windows 7 32bit

    Description

      Looking at the Qpid client code, it does not seem to handle reconnection if the TCP port is blocked and then unblocked.

      We are using the Windows client with SSL and SASL, having applied this patch to the 0.14 codebase: https://issues.apache.org/jira/browse/QPID-3914

      I then noticed this JIRA, which sounded like the same issue I was experiencing: https://issues.apache.org/jira/browse/QPID-3759. I applied its patch, but I am still getting the same issue.

      This is a sample of how we are creating the connection:
      // IP1:port1 and IP2:port2 are the same, as we currently have only one server to connect to.
      m_pConnection = new qpid::messaging::Connection("amqp:ssl:<IP1>:<port1>,<IP2>:<port2>", "");
      m_pConnection->setOption("transport", "ssl");
      m_pConnection->setOption("sasl_mechanisms", "EXTERNAL");
      m_pConnection->setOption("ssl-cert-filename", m_strSslCertFileName.c_str());
      m_pConnection->setOption("ssl-cert-filenamepass", m_strSslCertFileNamePassword.c_str());
      m_pConnection->setOption("host-cert-filename", m_strHostCertFileName.c_str());
      m_pConnection->setOption("heartbeat", 30); // 30 seconds, defaults to 0 which is no heartbeats
      m_pConnection->setOption("reconnect", true); // defaults to false
      m_pConnection->setOption("reconnect-interval", 30); // 30 seconds, default is 60 seconds
      m_pConnection->open();

      We then create 3 sessions from the connection, e.g.:
      m_SessionResponse = m_pConnection->createSession("Response");
      Using one of these sessions, we create both a receiver and a sender.
      For each of the other 2 sessions, we create a receiver only.

      I expect these receivers and the sender to remain active for the lifetime of the program. For each receiver, we call receiver.fetch(Duration::SECOND * 10) in a loop on its own thread.

      We start the application and it connects and runs OK. Then we block the port using Windows Firewall to simulate a network issue. From this point, the call to fetch(Duration::SECOND * 10) never returns, even though it has a 10 second timeout. A call to qpid::messaging::Sender::send, on the other hand, returns normally with no exception thrown.

      I am not sure exactly what should happen in this scenario. These are my thoughts; please advise or correct:
      1) At worst the fetch should throw an exception so the calling application knows there is a problem.
      2) Possibly the send should also throw an exception, again so the calling application knows there is a problem.
      3) If "reconnect" is enabled then we should try to reconnect (to the same IP:port).
      4) If multiple IPs are specified we should failover to the next IP on reconnect.
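
Expectations 3 and 4 can be illustrated with a small sketch. This is not how the Qpid client implements reconnection; FailoverList, reconnect and tryConnect are hypothetical names used only to show the intended behaviour: retry the address list in order, wrapping around, so that a single-address list retries the same IP:port and a multi-address list fails over to the next one.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the qpid API: cycles through the
// broker addresses given in the connection URL.
class FailoverList {
public:
    explicit FailoverList(std::vector<std::string> addresses)
        : addresses_(std::move(addresses)), next_(0) {
        if (addresses_.empty())
            throw std::invalid_argument("at least one broker address is required");
    }

    // Returns the address to try next and advances, wrapping at the end.
    const std::string& next() {
        const std::string& addr = addresses_[next_];
        next_ = (next_ + 1) % addresses_.size();
        return addr;
    }

private:
    std::vector<std::string> addresses_;
    std::size_t next_;
};

// Tries each address in turn until tryConnect succeeds or maxAttempts is
// used up. Returns the address that connected, or an empty string.
// tryConnect is an assumption: it stands in for whatever opens the transport.
inline std::string reconnect(FailoverList& brokers,
                             const std::function<bool(const std::string&)>& tryConnect,
                             int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        const std::string& addr = brokers.next();
        if (tryConnect(addr))
            return addr;
        // A real client would wait for the reconnect interval here.
    }
    return std::string();
}
```

With a list of two addresses where only the second accepts connections, reconnect returns the second address after two attempts. When every attempt fails it returns an empty string, which is the point at which the application would expect an exception from fetch or send.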

      I can see in the Qpid log that the heartbeats are timing out with the message "Traffic timeout". This could possibly be used to trigger the reconnect.
      I also noticed when debugging that void TCPConnector::eof(AsynchIO&) is called well before the heartbeat timeout, so maybe this could be used instead.
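
The idea of using the heartbeat timeout as a reconnect trigger could look roughly like the sketch below. It is an illustration only, not Qpid code: TrafficWatchdog is a hypothetical class, and the tolerance of two silent heartbeat intervals is an assumption made for the example. Time is passed in explicitly so that the logic is easy to follow and to test.

```cpp
#include <chrono>

// Hypothetical helper, not qpid code: records when traffic was last seen
// and reports a timeout once the link has been silent for too long.
class TrafficWatchdog {
public:
    using Seconds = std::chrono::seconds;

    // heartbeat: the heartbeat interval set on the connection.
    // missedBeats: silent intervals tolerated before declaring a timeout
    // (the default of 2 is an assumption made for this sketch).
    explicit TrafficWatchdog(Seconds heartbeat, int missedBeats = 2)
        : limit_(heartbeat * missedBeats), lastTraffic_(0) {}

    // Call whenever any frame (data or heartbeat) arrives from the broker.
    void onTraffic(Seconds now) { lastTraffic_ = now; }

    // True once no traffic has been seen for the whole tolerance window.
    bool timedOut(Seconds now) const { return now - lastTraffic_ >= limit_; }

private:
    Seconds limit_;
    Seconds lastTraffic_;
};
```

With a 30 second heartbeat, as in the connection options above, this watchdog reports a timeout 60 seconds after the last traffic. A client could respond to that by closing the transport and starting a reconnect attempt, or by failing any pending fetch with an exception.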

          People

            Assignee: Unassigned
            Reporter: Richard Sheath (richard.sheath)