Ambari / AMBARI-17901

Make HDFS operations resilient to namenode safemode

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: ambari-server
    • Labels: None

    Description

      HdfsResourceJar and HdfsResourceWebHDFS (WebHDFSUtil) are the classes that carry out the HDFS operations. All retryable operations (e.g. SETPERMISSION) should be guarded with retry logic that retries the operation until a given timeout expires before giving up.

      Determining which HDFS operations are retryable might be as easy as looking at the returned status/error code or the type of the exception (e.g. "RetriableException"), though it needs to be verified that this is consistent across both the WebHDFS and the hdfsresource jar code paths.
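
      As a rough illustration of the exception-type check, the sketch below inspects a WebHDFS error body, which WebHDFS returns as a JSON object with a "RemoteException" member carrying the exception name. The function name is_retryable_webhdfs_error and the RETRYABLE_EXCEPTIONS set are hypothetical; the issue names only "RetriableException", and including SafeModeException is an assumption that would need the same verification mentioned above.

```python
import json

# Assumption: which exception names count as retryable. The issue only names
# "RetriableException"; "SafeModeException" is added here for illustration.
RETRYABLE_EXCEPTIONS = {"RetriableException", "SafeModeException"}


def is_retryable_webhdfs_error(body):
    """Return True if a WebHDFS error body names a retryable exception.

    `body` is the raw response text. Anything that is not a JSON object with
    a "RemoteException" member is treated as not retryable.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (e.g. an HTML error page or an empty response).
        return False
    if not isinstance(payload, dict):
        return False
    remote = payload.get("RemoteException")
    if not isinstance(remote, dict):
        return False
    name = remote.get("exception")
    return isinstance(name, str) and name in RETRYABLE_EXCEPTIONS
```

      A comparable check for the hdfsresource jar path would have to rely on whatever that process reports (exit code or output), which this sketch does not cover.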

      This problem came up in https://issues.apache.org/jira/browse/AMBARI-17182 when starting all services after enabling HA.
      The retry count and timeout should be clarified, as it may sometimes take a long time for the namenode to exit safemode.
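
      A minimal sketch of the kind of retry guard described in this issue, assuming a deadline-based loop with a fixed wait between attempts. The names RetryableError and retry_until_timeout and the default values (300 s timeout, 10 s interval) are hypothetical placeholders and are not taken from Ambari; the actual values are exactly what this issue asks to have clarified.

```python
import time


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. namenode in safemode)."""


def retry_until_timeout(operation, timeout_s=300.0, interval_s=10.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call `operation` until it succeeds or the timeout would be exceeded.

    Only RetryableError triggers another attempt; any other exception
    propagates immediately. `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return operation()
        except RetryableError:
            # Give up if the next attempt would start after the deadline.
            if clock() + interval_s > deadline:
                raise
            sleep(interval_s)
```

      A caller would wrap a single HDFS operation, for example retry_until_timeout(lambda: set_permission(path, mode)), where set_permission is a stand-in for the real call and raises RetryableError when the failure is classified as retryable.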

            People

              Assignee: smagyari Magyari Sandor Szilard
              Reporter: smagyari Magyari Sandor Szilard
              Votes: 0
              Watchers: 4