Accumulo / ACCUMULO-4028

ServerClient getConnection is inefficient

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.5, 1.5.4, 1.6.4, 1.7.0
    • Fix Version/s: 1.6.5, 1.7.1, 1.8.0
    • Component/s: client
    • Labels:
      None
    • Environment:

      Large production environment.

      Description

      Several bulk load FATE operations were taking a long time, but actual bulk load statistics were quite good.

      The master bulk load threads were stuck in LoadFiles, specifically trying to get a connection to a random tablet server.

      The method to get a random connection looks at all the tablet server locks in zookeeper. On a large cluster (say, one with more than 1000 nodes), this is a lot of lookups in zookeeper. And this is done for every file to be bulk loaded.

      Normally, these lookups would be cached in zooCache, and subsequent lookups would all be served from local memory. But the cache is a singleton in the master, so other activities, especially those that make RPC calls to zookeeper while holding the lock, will delay these lookups.

      The master already has a list of the active tablet servers. It can pick one at random and create a new connection to it, potentially making thousands fewer calls to zooCache for each file to be loaded.
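
A minimal sketch of the idea, not the actual patch: pick a random server from an in-memory set of live tablet servers instead of walking every lock node in zookeeper. The class name `RandomTabletServerPicker`, the `pick` method, and the "host:port" strings are hypothetical and stand in for whatever the master really tracks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: chooses a tablet server from a set the caller already
// holds in memory, so no zookeeper or zooCache lookups are needed per file.
public class RandomTabletServerPicker {
    private final Random random;

    public RandomTabletServerPicker(Random random) {
        this.random = random;
    }

    // Returns a uniformly random member of liveServers, or empty if there are none.
    public Optional<String> pick(Set<String> liveServers) {
        if (liveServers.isEmpty()) {
            return Optional.empty();
        }
        // Copy to a list so an element can be selected by index.
        List<String> snapshot = new ArrayList<>(liveServers);
        return Optional.of(snapshot.get(random.nextInt(snapshot.size())));
    }
}
```

The cost per file is one copy of the live set plus one random index, regardless of how busy the shared cache is. A caller would still need to handle the chosen server having died since the set was last refreshed, for example by picking again.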

              People

              • Assignee:
                ecn Eric C. Newton
                Reporter:
                ecn Eric C. Newton
              • Votes:
                0
                Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m