HDFS-13108 (sub-task of HDFS-13074: Ozone File System)

Ozone: OzoneFileSystem: Simplified url schema for Ozone File System


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HDFS-7240
    • Fix Version/s: HDFS-7240
    • Component/s: ozone
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A. Current state
       
      1. The datanode host, volume and bucket must be defined in the defaultFS (e.g. o3://datanode:9864/test/bucket1)
      2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the keys from bucket1)

      It works very well, but there are some limitations.
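
      For context, here is a minimal sketch of how the current scheme is used from the Hadoop FileSystem API (the host, volume and bucket names are the ones from the example above; this is only an illustration, not the actual OzoneFileSystem client code):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CurrentStateDemo {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // today the volume (test) and the bucket (bucket1) are part of the defaultFS
          conf.set("fs.defaultFS", "o3://datanode:9864/test/bucket1");
          FileSystem fs = FileSystem.get(conf);
          // equivalent of 'dfs -ls /': the root lists the keys of bucket1
          for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
          }
        }
      }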

      B. Problem one 

      The current code doesn't support fully qualified locations. For example, 'dfs -ls o3://datanode:9864/test/bucket1/dir1' does not work.

      C. Problem two

      I tried to fix the previous problem, but it's not trivial. The biggest problem is the Path.makeQualified call, which transforms an unqualified URL into a qualified one. It is part of Path.java, so it is common to all the Hadoop file systems.

      In the current implementation it qualifies a URL by keeping the scheme (e.g. o3://) and the authority (e.g. datanode:9864) from the defaultFS and using the relative path as the end of the qualified URL. For example:

      makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) returns o3://datanode:9864/dir1/file, which is obviously wrong (the correct result would be o3://datanode:9864/test/bucket1/dir1/file). I tried to work around it with a custom makeQualified in the Ozone code, and it worked from the command line, but it couldn't work with Spark, which uses the Hadoop API and the original makeQualified path.
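
      To make the behaviour concrete, here is a minimal sketch of the call described above, assuming Hadoop's Path.makeQualified(URI, Path) and a working directory of /:

      import java.net.URI;
      import org.apache.hadoop.fs.Path;

      public class MakeQualifiedDemo {
        public static void main(String[] args) {
          URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
          Path workingDir = new Path("/");
          // makeQualified keeps only the scheme and the authority from defaultUri;
          // the /test/bucket1 part of the defaultFS is silently dropped
          Path qualified = new Path("dir1/file").makeQualified(defaultUri, workingDir);
          System.out.println(qualified); // prints o3://datanode:9864/dir1/file
        }
      }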

      D. Solution

      We should support makeQualified calls, so we can use any path in the defaultFS.
       
      I propose to use a simplified schema such as o3://bucket.volume/

      This is similar to the s3a  format where the pattern is s3a://bucket.region/ 

      We don't need to set the hostname of the datanode (or the KSM in case of service discovery), but it would be configurable with additional Hadoop configuration values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (as far as I know, this is how s3a works today).

      We also need to define restrictions for the volume names (in our case they should no longer include dots).
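
      A minimal sketch of how such a simplified URL could be split into bucket/volume and mapped to the proposed address key (the fs.o3.bucket.<bucketname>.<volumename>.address key is only the suggestion from above, not an existing configuration property):

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;

      public class SimplifiedSchemaSketch {
        public static void main(String[] args) {
          URI uri = URI.create("o3://bucket1.test/dir1/file");
          // authority is "bucket1.test"; since volume names would not contain dots,
          // the volume is everything after the last dot and the bucket everything before it
          String authority = uri.getAuthority();
          int lastDot = authority.lastIndexOf('.');
          String bucket = authority.substring(0, lastDot);
          String volume = authority.substring(lastDot + 1);

          Configuration conf = new Configuration();
          // hypothetical key from the proposal; only needed when there is no service discovery
          String address = conf.get(
              "fs.o3.bucket." + bucket + "." + volume + ".address",
              "http://datanode:9864");
          System.out.println(volume + "/" + bucket + "/dir1/file served by " + address);
        }
      }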

      PS: some Spark output:

      2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      2018-02-03 18:43:05 INFO  Client:54 - Uploading resource file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip -> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__2440448967844904444.zip

      My defaultFS was o3://datanode:9864/test/bucket1, but Spark qualified the name of the home directory without the test/bucket1 prefix.

       

      Attachments

        1. HDFS-13108-HDFS-7240.001.patch
          12 kB
          Marton Elek
        2. HDFS-13108-HDFS-7240.002.patch
          12 kB
          Marton Elek
        3. HDFS-13108-HDFS-7240.003.patch
          12 kB
          Marton Elek
        4. HDFS-13108-HDFS-7240.005.patch
          15 kB
          Marton Elek
        5. HDFS-13108-HDFS-7240.006.patch
          15 kB
          Marton Elek
        6. HDFS-13108-HDFS-7240.007.patch
          15 kB
          Marton Elek

          People

            Assignee: elek (Marton Elek)
            Reporter: elek (Marton Elek)
            Votes: 0
            Watchers: 8
