Hadoop HDFS / HDFS-1623 High Availability Framework for HDFS NN / HDFS-2683

Authority-based lookup of proxy provider fails if path becomes canonicalized

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: HA branch (HDFS-1623)
    • Fix Version/s: HA branch (HDFS-1623)
    • Component/s: ha, hdfs-client
    • Labels:
      None

      Description

      When testing MapReduce on top of an HA cluster, we ran into the following bug: some uses of HDFS paths go through a canonicalization step, which ensures that the authority component of the URI includes a port number. As a result, our hdfs://logical-nn-uri/foo path turned into hdfs://logical-nn-uri:8020/foo. The code that looks up the failover proxy provider uses the full authority, so it then failed to find the associated config. We should compare only the hostname portion of the URI when looking up proxy providers.
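The proposed fix can be illustrated with a minimal sketch. This is not the code from the attached patch: the class name, the config-key prefix, and the provider class name below are hypothetical, and a plain Map stands in for the Hadoop configuration. It only shows that keying the lookup on the host, rather than on the full authority, gives the same result before and after canonicalization.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class ProxyProviderLookup {
    // Hypothetical config-key prefix, used only for this illustration.
    static final String KEY_PREFIX = "example.failover.proxy.provider.";

    /** Builds the lookup key from the host only, ignoring any port in the authority. */
    static String lookupKey(URI nameNodeUri) {
        return KEY_PREFIX + nameNodeUri.getHost();
    }

    /** Returns the configured provider class name, or null if none is configured. */
    static String findProvider(Map<String, String> conf, URI nameNodeUri) {
        return conf.get(lookupKey(nameNodeUri));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Hypothetical provider class name registered for the logical name.
        conf.put(KEY_PREFIX + "logical-nn-uri", "com.example.MyFailoverProxyProvider");

        URI original = URI.create("hdfs://logical-nn-uri/foo");
        URI canonicalized = URI.create("hdfs://logical-nn-uri:8020/foo");

        // Both forms resolve to the same provider because the port is ignored.
        System.out.println(findProvider(conf, original));
        System.out.println(findProvider(conf, canonicalized));
    }
}
```

A lookup keyed on the full authority would instead build a key ending in logical-nn-uri:8020 for the canonicalized path and miss the configured entry.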

      1. hdfs-2683.txt
        7 kB
        Todd Lipcon
      2. hdfs-2683.txt
        6 kB
        Todd Lipcon

        Activity

        Todd Lipcon created issue -
        Todd Lipcon made changes -
        Field Original Value New Value
        Attachment hdfs-2683.txt [ 12507459 ]
        Eli Collins added a comment -

        Why not reject logical names with port numbers instead of just warning? Patch lgtm.

        Eli Collins added a comment -

        I.e., remove the canonicalization step.

        Todd Lipcon added a comment -

        The issue is that the canonicalization happens in lots of places at high levels, e.g. in FileContext's AbstractFileSystem code. It's also used for cache keys, etc. We don't want to separately cache an FS instance for hdfs://nn/ and hdfs://nn:8020/.
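As a rough illustration of why canonicalization matters for cache keys, the sketch below fills in a default port when the URI has none, so that both spellings compare equal. It is an assumption-laden example, not Hadoop's actual canonicalization code: the class and method names are hypothetical, and 8020 is taken from the example in the description.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AuthorityCanonicalizer {
    /** Returns the URI unchanged if it has an explicit port, else adds defaultPort. */
    static URI canonicalize(URI uri, int defaultPort) {
        if (uri.getPort() != -1) {
            return uri; // an explicit port is already present
        }
        try {
            return new URI(uri.getScheme(), uri.getUserInfo(), uri.getHost(),
                    defaultPort, uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot canonicalize " + uri, e);
        }
    }

    public static void main(String[] args) {
        // 8020 is assumed to be the default port, as in the issue description.
        URI a = canonicalize(URI.create("hdfs://nn/"), 8020);
        URI b = canonicalize(URI.create("hdfs://nn:8020/"), 8020);
        // Equal URIs can share one cache entry instead of two.
        System.out.println(a.equals(b)); // prints true
    }
}
```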

        Todd Lipcon added a comment -

        Though it does make sense to just throw an exception if a non-default port is used with a logical URI. I'll make that change.

        Todd Lipcon added a comment -

        Changed to throw an IOException when a non-default port is specified.
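The check described in this comment might look roughly like the following. This is a sketch under stated assumptions, not the committed patch: the class name, method name, and error message are hypothetical, and 8020 is assumed to be the default port based on the example in the description.

```java
import java.io.IOException;
import java.net.URI;

final class LogicalUriPortCheck {
    // Assumed default port, taken from the example in the issue description.
    static final int DEFAULT_PORT = 8020;

    /** Accepts a logical URI with no port or the default port; rejects any other port. */
    static void checkPort(URI logicalUri) throws IOException {
        int port = logicalUri.getPort(); // -1 when the URI has no explicit port
        if (port != -1 && port != DEFAULT_PORT) {
            throw new IOException("Port " + port + " specified in URI " + logicalUri
                    + " but host '" + logicalUri.getHost()
                    + "' is a logical name and only the default port is accepted");
        }
    }

    public static void main(String[] args) throws IOException {
        checkPort(URI.create("hdfs://logical-nn-uri/foo"));      // accepted: no port
        checkPort(URI.create("hdfs://logical-nn-uri:8020/foo")); // accepted: default port
        try {
            checkPort(URI.create("hdfs://logical-nn-uri:9000/foo"));
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Accepting the default port keeps canonicalized paths working, while a non-default port on a logical name fails fast instead of only producing a warning.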

        Todd Lipcon made changes -
        Attachment hdfs-2683.txt [ 12507461 ]
        Eli Collins added a comment -

        +1

        Todd Lipcon added a comment -

        Thanks, committed to the branch.

        Todd Lipcon made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s HA branch (HDFS-1623) [ 12317568 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            3
