Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-7798

Checkpointing failure caused by shared KerberosAuthenticator

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: security
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We have observed in our real cluster occasional checkpointing failure. The standby NameNode was not able to upload image to the active NameNode.

      After some digging, the root cause appears to be a shared KerberosAuthenticator in URLConnectionFactory. The authenticator is designed as a use-once instance, and is not stateless. It has attributes such as HttpURLConnection and URL. When multiple threads are calling URLConnectionFactory#openConnection(...), the shared authenticator is going to have race condition, resulting in a failed image uploading.

      Therefore for the first step, without breaking the current API, I propose we create a new KerberosAuthenticator instance for each connection, to make checkpointing work. We may consider making Authenticator design and implementation stateless afterwards, as ConnectionConfigurator does.

        Attachments

        1. HDFS-7798.01.patch
          1 kB
          Chengbing Liu

          Activity

            People

            • Assignee:
              chengbing.liu Chengbing Liu
              Reporter:
              chengbing.liu Chengbing Liu
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: