  1. Hadoop Common
  2. HADOOP-19092 ABFS phase 4: post Hadoop 3.4.0 features
  3. HADOOP-17915

ABFS AbfsDelegationTokenManager to generate canonicalServiceName if DT plugin doesn't

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.3.1
    • Fix Version/s: None
    • Component/s: fs/azure

    Description

      Currently in AbfsDelegationTokenManager, a CustomDelegationTokenManager only provides a canonical service name if it
      implements BoundDTExtension and its getCanonicalServiceName() method.

      If this doesn't hold, AbfsDelegationTokenManager returns null, which causes AzureBlobFileSystem.getCanonicalServiceName()
      to call super.getCanonicalServiceName(), which resolves the IP address of the abfs endpoint and then the FQDN of that IP address.

      If a storage account is served over more than one endpoint, the DT will only have a valid service name for one of the possible
      endpoints, so it will only work if all processes get the same IP address when they look up the storage account address.

      Fix

      1. DT plugins SHOULD generate the canonical service name
      2. If they don't, and DTs are enabled: AbfsDelegationTokenManager to create a default one
      3. AzureBlobFileSystem.getCanonicalServiceName() MUST NOT call the superclass.
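
The precedence in the three points above can be sketched as follows. This is a minimal illustration, not the actual ABFS code: `ServiceNameSource` and `ServiceNameResolver` are hypothetical names standing in for the DT plugin and the token manager, and the default name is passed in as a plain string.

```java
/** Hypothetical stand-in for a DT plugin; not the real ABFS plugin interface. */
interface ServiceNameSource {
  /** Returns the plugin's canonical service name, or null if it does not supply one. */
  String getCanonicalServiceName();
}

/** Hypothetical helper showing the proposed order of precedence. */
final class ServiceNameResolver {
  private ServiceNameResolver() {
  }

  static String resolve(boolean dtEnabled, ServiceNameSource plugin, String defaultName) {
    if (!dtEnabled) {
      // No delegation tokens: no service name at all.
      return null;
    }
    // Point 1: prefer whatever the plugin supplies.
    String fromPlugin = plugin == null ? null : plugin.getCanonicalServiceName();
    // Point 2: otherwise fall back to the default. There is no superclass call
    // anywhere on this path, so no DNS lookup takes place (point 3).
    return fromPlugin != null ? fromPlugin : defaultName;
  }
}
```

With DTs enabled the result is never null, so the filesystem has no reason to fall back to the superclass lookup.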

      The default canonical service name of a store will be "abfs://" + FsURI.getHost() + "/", so all containers in the same storage account have the same service name.

      abfs://bucket@stevel-testing.dfs.core.windows.net/path
      

      maps to

      abfs://stevel-testing.dfs.core.windows.net/ 
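
A minimal sketch of this mapping using only `java.net.URI`, assuming the filesystem URI has the container in the user-info position as in the example. The class name `DefaultServiceName` is hypothetical, and the `abfs://` prefix is hardcoded as the description states.

```java
import java.net.URI;

/** Hypothetical helper: derives the proposed default service name from a filesystem URI. */
final class DefaultServiceName {
  private DefaultServiceName() {
  }

  static String of(URI fsUri) {
    // getHost() drops the "container@" user-info part and the path.
    String host = fsUri.getHost();
    if (host == null) {
      throw new IllegalArgumentException("No host in filesystem URI: " + fsUri);
    }
    return "abfs://" + host + "/";
  }

  public static void main(String[] args) {
    URI fsUri = URI.create("abfs://bucket@stevel-testing.dfs.core.windows.net/path");
    // prints abfs://stevel-testing.dfs.core.windows.net/
    System.out.println(of(fsUri));
  }
}
```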
      

      This means that only one DT will be created per storage account; applications will not need to list every container that deployed processes will interact with. Today's behaviour, based on a reverse DNS lookup of the storage account, is possibly slightly broader, in that all storage accounts which map to the same IP address share a DT. The proposed scheme will still be much broader than that of S3A, where every bucket has its own unique service name, so apps need to list all target filesystems at launch time (easy for MR, a source of trouble in Spark).
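
To illustrate the difference in scope, the sketch below counts distinct service names for two containers in one account. Everything here is illustrative: the account `example.dfs.core.windows.net` is made up, and `perContainer` is a hypothetical naming scheme included only for contrast with the proposed per-account default; it is not how S3A or ABFS is implemented.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical comparison of per-account and per-container service names. */
final class ServiceNameScope {
  private ServiceNameScope() {
  }

  /** Proposed default: one name per storage account host. */
  static String perAccount(URI fsUri) {
    return "abfs://" + fsUri.getHost() + "/";
  }

  /** For contrast only: a name that keeps the container, analogous to one name per bucket. */
  static String perContainer(URI fsUri) {
    return "abfs://" + fsUri.getRawAuthority() + "/";
  }

  /** One DT is needed per distinct service name. */
  static Set<String> serviceNames(List<URI> filesystems, Function<URI, String> naming) {
    Set<String> names = new TreeSet<>();
    for (URI fs : filesystems) {
      names.add(naming.apply(fs));
    }
    return names;
  }

  public static void main(String[] args) {
    List<URI> targets = Arrays.asList(
        URI.create("abfs://data@example.dfs.core.windows.net/in"),
        URI.create("abfs://logs@example.dfs.core.windows.net/out"));
    // prints 1: both containers share the account-level name
    System.out.println(serviceNames(targets, ServiceNameScope::perAccount).size());
    // prints 2: each container would need its own token
    System.out.println(serviceNames(targets, ServiceNameScope::perContainer).size());
  }
}
```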

      Fix: straightforward.

      Test

      • no DTs: service name == null
      • DTs: will match proposed pattern, even if extension returns null.

            People

              stevel@apache.org Steve Loughran
              stevel@apache.org Steve Loughran

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m