Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Many scripts and projects, such as Pig, Hive, and Elephant Bird, refer to HDFS URIs as hdfs://namenodehostport/ or hdfs:/// . In a federated namespace this causes problems, because the supported scheme for federation is viewfs:// . We would have to force all users to change their scripts and programs before they could access a federated cluster.

      It would be great if there were a way to map the viewfs scheme to the hdfs scheme without exposing it to users. I am opening this JIRA to get input from people who have thought about this in their clusters.

      In our clusters we ended up creating another class, HDFSCompatibleViewFileSystem, which hijacks both hdfs.fs.impl and viewfs.fs.impl and passes filesystem calls down to ViewFileSystem. Is there any suggested approach other than this?
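
      To make the scheme mapping concrete, here is a minimal sketch of the URI rewrite such a wrapper would need to perform before delegating. It is not the HDFSCompatibleViewFileSystem class itself (that code is not attached here) and it uses only java.net.URI, not the Hadoop API. The class name SchemeMapper, the method toViewFs, and the mountTable parameter are all hypothetical names chosen for illustration.

```java
import java.net.URI;

final class SchemeMapper {
    /**
     * Rewrites an hdfs URI to the viewfs scheme, keeping only the path.
     * The NameNode host and port are dropped on purpose, since under
     * federation the mount table decides which namespace serves a path.
     * mountTable is a hypothetical parameter: "" yields viewfs:///path.
     * URIs with any other scheme are returned unchanged.
     */
    static URI toViewFs(URI uri, String mountTable) {
        if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
            return uri;
        }
        String rawPath = uri.getRawPath();
        if (rawPath == null || rawPath.isEmpty()) {
            rawPath = "/"; // an hdfs URI with no path maps to the root
        }
        return URI.create("viewfs://" + mountTable + rawPath);
    }

    public static void main(String[] args) {
        // prints viewfs:///user/lohit/data
        System.out.println(toViewFs(URI.create("hdfs://namenode:8020/user/lohit/data"), ""));
        // prints viewfs://clusterX/user/lohit/data
        System.out.println(toViewFs(URI.create("hdfs:///user/lohit/data"), "clusterX"));
    }
}
```

      A real wrapper would also have to decide what to do when two NameNodes expose the same path, which this sketch ignores.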

          Activity

          Steve Loughran added a comment -

          Lohit, I can see why just changing the default FS parameter isn't enough: if you want to support inputs across clusters then you may need explicit source and dest URLs. Though if your code is using hdfs:/// that's very much a "default" reference that is deprecated in 2.x (what if the default FS isn't HDFS?)

          Did you try just subverting `hdfs.fs.impl` to point to the viewfs implementation, or didn't that work due to viewfs being fussy about the URIs and/or you also wanting to use HDFS locally?

          Otherwise, the general best practise is "don't hard-code URIs in your scripts", though that is easier said than done if you have a lot of existing scripts.

          Can you stick up your new FS for us to look at?

          Andrew Wang added a comment -

          Hey Lohit,

          Why not use the default FS config parameter (fs.default.name) instead of using fully-qualified URLs? Then you just need to change the client config when switching clusters.
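
          As a rough illustration of why unqualified paths survive a cluster switch while fully-qualified ones do not, the sketch below resolves a path against a configurable default FS URI. It uses only java.net.URI and does not read any Hadoop configuration; the class DefaultFsResolver, the method qualify, and the authorities nn1:8020 and clusterX are hypothetical. In real Hadoop 2.x configs the key is spelled fs.defaultFS, with fs.default.name kept as a deprecated alias.

```java
import java.net.URI;

final class DefaultFsResolver {
    /**
     * Qualifies a path against the default filesystem URI.
     * Paths that already carry a scheme are returned unchanged, which is
     * exactly why hard-coded hdfs:// URIs ignore a changed default FS.
     */
    static URI qualify(URI defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // fully qualified: the default FS is not consulted
        }
        return defaultFs.resolve(p);
    }

    public static void main(String[] args) {
        URI oldDefault = URI.create("hdfs://nn1:8020/");
        URI newDefault = URI.create("viewfs://clusterX/");

        // The same unqualified path follows whichever default is configured.
        System.out.println(qualify(oldDefault, "/user/lohit/input")); // hdfs://nn1:8020/user/lohit/input
        System.out.println(qualify(newDefault, "/user/lohit/input")); // viewfs://clusterX/user/lohit/input

        // A hard-coded URI stays pinned to the old NameNode.
        System.out.println(qualify(newDefault, "hdfs://nn1:8020/data")); // hdfs://nn1:8020/data
    }
}
```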

          Omkar Vinit Joshi added a comment -

          I am not sure if it is related, but take a look at YARN-1203 (http / https), where we now force the AM to state up front which scheme it supports; YARN then uses that scheme for communication, or falls back to the cluster default. I think something similar can be done here as well.


            People

            • Assignee:
              Unassigned
              Reporter:
              Lohit Vijayarenu
            • Votes:
              0
              Watchers:
              16
