  1. Accumulo
  2. ACCUMULO-3268

HoldTimeoutException is poorly propagated to clients

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: client
    • Labels: None

    Description

      A 6-node cluster was running randomwalk when the MultiTable module failed. A BatchWriter in o.a.a.test.randomwalk.multitable.Write was adding a new Mutation to a table. The call to addMutations failed with a MutationsRejectedException whose only information was that an exception had occurred on the server.

      In actuality, adding this mutation triggered a flush, and the client tried to ship the mutation to a tabletserver. The tabletserver hosting the tablet for that mutation was under load but still responsive. The hold time was exceeded on this tserver, but all the client sees is that some exception occurred on the server.

      If the client knew that commits were being held, it could back off (sleep) and retry the mutations added since the last flush. Right now, the client can't really do anything. Additionally, being unable to retrieve the mutations that were buffered since the last flush is sub-par, but that can be worked around.
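
      The back-off-and-retry behavior described above could look roughly like the following sketch. It is deliberately independent of the Accumulo client API: CommitsHeldException, MutationSink, and writeWithBackoff are hypothetical names invented for illustration, and the sketch assumes the client keeps its own copy of the mutations added since the last flush.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types are part of the Accumulo API. */
public class HoldBackoffSketch {

    /** Hypothetical client-visible signal that the server is holding commits. */
    static class CommitsHeldException extends Exception {
        CommitsHeldException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for shipping buffered mutations to a tablet server. */
    interface MutationSink {
        void write(List<String> mutations) throws CommitsHeldException;
    }

    /**
     * Resubmits the same batch, sleeping with exponential backoff whenever the
     * sink reports held commits. Returns the number of attempts used and
     * rethrows the last exception once maxAttempts is exhausted.
     */
    static int writeWithBackoff(MutationSink sink, List<String> sinceLastFlush,
                                int maxAttempts, long initialSleepMs)
            throws CommitsHeldException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long sleepMs = initialSleepMs;
        for (int attempt = 1; ; attempt++) {
            try {
                sink.write(sinceLastFlush);
                return attempt;
            } catch (CommitsHeldException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the held-commit condition
                }
                Thread.sleep(sleepMs); // back off before retrying
                sleepMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The client-side copy of mutations added since the last flush.
        List<String> buffered = Arrays.asList("row1", "row2");
        List<String> stored = new ArrayList<>();
        int[] calls = {0};

        // Simulated server: holds commits for the first two attempts.
        MutationSink flaky = batch -> {
            if (++calls[0] < 3) {
                throw new CommitsHeldException("commits held");
            }
            stored.addAll(batch);
        };

        int attempts = writeWithBackoff(flaky, buffered, 5, 10);
        System.out.println("attempts=" + attempts + " stored=" + stored);
        // prints "attempts=3 stored=[row1, row2]"
    }
}
```

      The retry is only possible because the caller still holds the batch, which is the workaround the description alludes to; a distinct exception type is what lets the caller tell a held commit apart from any other server-side failure.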

            People

              Assignee: Unassigned
              Reporter: Josh Elser (elserj)