HBase / HBASE-8150

server should not produce RAITE for already-opening region in 0.94 (because master retry logic handles this case poorly)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.94.6
    • Fix Version/s: 0.94.7
    • Component/s: None
    • Labels: None

      Description

      In 0.94, the AssignmentManager (AM) sets the region plan to point to the same server when it retries an assignment that failed with RegionAlreadyInTransitionException (RAITE).

      LOG.warn("Failed assignment of "
                  + state.getRegion().getRegionNameAsString()
                  + " to "
                  + plan.getDestination()
                  + ", trying to assign "
                  + (regionAlreadyInTransitionException ? "to the same region server"
                      + " because of RegionAlreadyInTransitionException;" : "elsewhere instead; ")
                  + "retry=" + i, t);
      

      However, the loop that retries the assignment has no wait time. If the region is being marked as failed to open, which may take some time, the master can easily exhaust its retries in less than half a second, going to the same server every time and getting the same exception (unfortunately I no longer have the logs). The region is then stuck.

      Do you think this is worth fixing (for example, by not using the same server here after a few retries, or by adding timed backoff in such cases)?
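
      The two options above (leaving the same server after a few retries, and timed backoff) can be combined. What follows is only a minimal sketch of that idea, not the attached patch and not the AssignmentManager code: the class AssignRetrySketch, the Opener interface, the Result enum and every parameter name are hypothetical and exist only for illustration.

```java
import java.util.List;

/** Hypothetical sketch; none of these names come from the HBase code base. */
final class AssignRetrySketch {
    /** Outcome of one simulated attempt to open a region on a server. */
    enum Result { OPENED, ALREADY_IN_TRANSITION, FAILED }

    /** Stand-in for the call that asks a region server to open the region. */
    interface Opener {
        Result open(String server, int attempt);
    }

    /** Exponential backoff: baseMs * 2^attempt, capped at maxMs. */
    static long backoffMs(long baseMs, long maxMs, int attempt) {
        // Cap the shift so the intermediate value stays bounded for small baseMs.
        long delay = baseMs << Math.min(attempt, 20);
        return Math.min(delay, maxMs);
    }

    /**
     * Stays on the same server while it reports "already in transition",
     * sleeping between attempts, and moves to the next server once
     * maxSameServerRetries is used up or on any other failure.
     * Returns the server that opened the region, or null if the retries
     * ran out or the thread was interrupted.
     */
    static String assign(List<String> servers, Opener opener, int maxRetries,
                         int maxSameServerRetries, long baseMs, long maxMs) {
        int serverIdx = 0;
        int sameServerRetries = 0;
        for (int i = 0; i < maxRetries; i++) {
            String server = servers.get(serverIdx);
            Result r = opener.open(server, i);
            if (r == Result.OPENED) {
                return server;
            }
            if (r == Result.ALREADY_IN_TRANSITION
                    && sameServerRetries < maxSameServerRetries) {
                // Give the server time to finish its in-flight transition.
                try {
                    Thread.sleep(backoffMs(baseMs, maxMs, sameServerRetries));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
                sameServerRetries++;
            } else {
                // Stop insisting on this server and try the next candidate.
                serverIdx = (serverIdx + 1) % servers.size();
                sameServerRetries = 0;
            }
        }
        return null;
    }
}
```

      With maxSameServerRetries = 2, a server that keeps answering ALREADY_IN_TRANSITION is tried three times (with two sleeps in between) before the loop moves on, so the retry budget is no longer burned in a fraction of a second against one server.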

        Attachments

        1. HBASE-8150-v0-094.patch
          4 kB
          Sergey Shelukhin

            People

            • Assignee: Sergey Shelukhin (sershe)
            • Reporter: Sergey Shelukhin (sershe)
            • Votes: 0
            • Watchers: 7

              Dates

              • Created:
                Updated:
                Resolved: