Qpid Dispatch: DISPATCH-2173

30-Mesh Behaving Badly


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Router Node
    • Labels:
      None

      Description

      While testing scale-up of full-mesh networks, I encountered some Bad Behavior at 30 nodes (435 connections).
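The 435 figure is the full-mesh link count n(n-1)/2 for n = 30. A minimal Python sketch of that arithmetic (the function name `full_mesh_links` is illustrative only, not part of any router tooling) shows how quickly the connection count grows relative to the 15- and 20-router runs mentioned below:

```python
def full_mesh_links(n: int) -> int:
    """Number of router-to-router connections in a full mesh of n routers."""
    # Each of the n routers connects to the other n - 1, and every link is
    # shared by two routers, so divide by 2.
    return n * (n - 1) // 2


for n in (15, 20, 30):
    print(n, full_mesh_links(n))
```

Going from 20 to 30 routers raises the link count from 190 to 435: a 50% increase in routers more than doubles the number of connections.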

      On my first try, 15 of the routers died.

      On my second try, no nodes died – but the network never converged. It consumed all available CPU (32 cores) for three minutes, and the 30 routers printed a combined total of more than 1000 radius calculations to their logs by the time I became wrathful and cast them all into the Bitbucket of Woe.


      For reference, those radius calculations are how I decide that the network has converged, i.e. every router has settled down, agreed on the topology, and stopped talking about it. The last thing each router prints to its log is a radius calculation, and then it goes quiet. A router may print several of these along the way, but once the total number of such prints across all routers stops changing, the network has converged.


      For 15 or 20 routers, the total number of such prints was only 20 or 40 or so. When this test exceeded that by 25x, I decided it was never going to quit.
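The convergence check described above can be scripted. The following is a minimal sketch under stated assumptions, not the tooling actually used in this test: it assumes every radius-calculation log line contains the substring `radius` (hypothetical; check the real router log format), and the names `count_marker_lines`, `wait_for_convergence`, and the `give_up_above` threshold are hypothetical. It polls the combined count across all router logs, reports convergence once the total stays unchanged for a few consecutive polls, and gives up early if the total blows far past what smaller meshes produced (for example 25x a baseline of 40, i.e. 1000).

```python
import time
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple

# ASSUMPTION: every radius-calculation log line contains this substring.
# Verify against the actual router log format before relying on it.
RADIUS_MARKER = "radius"


def count_marker_lines(log_paths: Iterable[Path], marker: str = RADIUS_MARKER) -> int:
    """Total number of lines containing `marker` across all router log files."""
    total = 0
    for path in log_paths:
        with open(path, "r", errors="replace") as f:
            total += sum(1 for line in f if marker in line)
    return total


def wait_for_convergence(
    poll: Callable[[], int],
    stable_polls: int = 3,
    give_up_above: Optional[int] = None,
    max_polls: int = 60,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Poll a running total until it stops changing.

    Returns ("converged", total) once the total is unchanged for
    `stable_polls` consecutive comparisons, ("runaway", total) if it
    exceeds `give_up_above`, or ("timeout", total) after `max_polls` polls.
    """
    previous: Optional[int] = None
    unchanged = 0
    current = 0
    for _ in range(max_polls):
        current = poll()
        # Hypothetical runaway rule: far more prints than smaller meshes produced.
        if give_up_above is not None and current > give_up_above:
            return ("runaway", current)
        if current == previous:
            unchanged += 1
            if unchanged >= stable_polls:
                return ("converged", current)
        else:
            unchanged = 0  # still printing: restart the stability window
        previous = current
        sleep(interval_s)
    return ("timeout", current)
```

A possible invocation, assuming the router logs sit in a hypothetical `router-logs` directory, would pass `lambda: count_marker_lines(Path("router-logs").glob("*.log"))` as `poll` with `give_up_above=25 * 40`. Requiring several unchanged polls in a row, rather than one, avoids declaring convergence during a brief lull between bursts of topology updates.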


      ...Now looking at the logs to see if I can figure out what was happening...



            People

            • Assignee: michael goulish (michaelgoulish)
            • Reporter: michael goulish (michaelgoulish)
            • Votes: 0
            • Watchers: 1
