  1. Hadoop YARN
  2. YARN-2063

ZKRMStateStore: Better handling of operation failures

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      Today, when a ZK operation fails, we handle connection-loss and operation-timeout the same way, even though the two failures call for different recovery. This could be improved in a few ways:

      1. Add special handling for other error codes
      2. Connection-loss: Nullify zkClient, so a new connection is established
      3. Operation-timeout: Retry a few times with exponential delay?
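
The three proposals above could be sketched roughly as follows. This is a minimal illustration, not the ZKRMStateStore implementation: `ZkRetrySketch`, `Failure`, `ZkOpException`, `Client`, and `ZkOp` are hypothetical stand-ins for the real ZooKeeper types, and treating every other error code as non-retryable is an assumption made only to keep the example short.

```java
import java.util.function.Supplier;

/** Sketch only: stand-in types, not the real ZooKeeper or ZKRMStateStore API. */
class ZkRetrySketch {
    /** Hypothetical failure kinds mirroring the three cases in the description. */
    enum Failure { CONNECTION_LOSS, OPERATION_TIMEOUT, OTHER }

    /** Hypothetical exception; unchecked here for brevity (an assumption). */
    static class ZkOpException extends RuntimeException {
        final Failure failure;
        ZkOpException(Failure failure) {
            super(failure.name());
            this.failure = failure;
        }
    }

    /** Hypothetical client handle; a real store would hold a ZooKeeper client. */
    interface Client { }

    /** An operation executed against the current client. */
    interface ZkOp<T> { T run(Client client); }

    private final Supplier<Client> connector;
    private final int maxRetries;
    private final long baseDelayMs;
    private Client client;      // null means "connect before the next attempt"
    int connectCount = 0;       // exposed only so the sketch can be inspected

    ZkRetrySketch(Supplier<Client> connector, int maxRetries, long baseDelayMs) {
        this.connector = connector;
        this.maxRetries = maxRetries;
        this.baseDelayMs = baseDelayMs;
    }

    <T> T runWithRetries(ZkOp<T> op) {
        int attempt = 0;
        while (true) {
            if (client == null) {
                client = connector.get();
                connectCount++;
            }
            try {
                return op.run(client);
            } catch (ZkOpException e) {
                // Item 1: other error codes get their own handling (here: fail fast).
                if (e.failure == Failure.OTHER || attempt >= maxRetries) {
                    throw e;
                }
                if (e.failure == Failure.CONNECTION_LOSS) {
                    // Item 2: drop the client so a new connection is established.
                    client = null;
                } else {
                    // Item 3: back off exponentially before retrying a timeout.
                    sleep(baseDelayMs << Math.min(attempt, 10));
                }
                attempt++;
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

In this sketch, connection loss and timeout share one retry budget (`maxRetries`); whether they should be counted separately is left open, as is the question mark on item 3.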

            People

              Assignee: kasha Karthik Kambatla
              Reporter: kasha Karthik Kambatla
              Votes: 0
              Watchers: 4