HDFS-13108 (sub-task of HDFS-13074: Ozone File System)

Ozone: OzoneFileSystem: Simplified url schema for Ozone File System


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: HDFS-7240
    • Fix Version/s: HDFS-7240
    • Component/s: ozone
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A. Current state
       
      1. The datanode host, volume and bucket must be defined in the defaultFS (e.g. o3://datanode:9864/test/bucket1)
      2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the keys from bucket1)

      It works very well, but there are some limitations.
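
      For context, here is a minimal sketch of how the current scheme is used from the Hadoop FileSystem API (the host, volume and bucket names are the ones from the example above; this is only an illustration, not the actual OzoneFileSystem client code):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CurrentStateDemo {
        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // today the volume (test) and the bucket (bucket1) are part of the defaultFS
          conf.set("fs.defaultFS", "o3://datanode:9864/test/bucket1");
          FileSystem fs = FileSystem.get(conf);
          // equivalent of 'dfs -ls /': the root lists the keys of bucket1
          for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
          }
        }
      }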

      B. Problem one 

      The current code doesn't support fully qualified locations. For example, 'dfs -ls o3://datanode:9864/test/bucket1/dir1' does not work.

      C. Problem two

      I tried to fix the previous problem, but it's not trivial. The biggest problem is the Path.makeQualified call, which transforms an unqualified URL into a qualified one. It is part of Path.java, so it is common to all the Hadoop file systems.

      In the current implementation it qualifies a URL by keeping the scheme (e.g. o3://) and the authority (e.g. datanode:9864) from the defaultFS and using the relative path as the end of the qualified URL. For example:

      makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) returns o3://datanode:9864/dir1/file, which is obviously wrong (the correct result would be o3://datanode:9864/test/bucket1/dir1/file). I tried to work around it with a custom makeQualified in the Ozone code, and it worked from the command line, but it couldn't work with Spark, which uses the Hadoop API and the original makeQualified path.
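
      To make the behaviour concrete, here is a minimal sketch of the call described above, assuming Hadoop's Path.makeQualified(URI, Path) and a working directory of /:

      import java.net.URI;
      import org.apache.hadoop.fs.Path;

      public class MakeQualifiedDemo {
        public static void main(String[] args) {
          URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
          Path workingDir = new Path("/");
          // makeQualified keeps only the scheme and the authority from defaultUri;
          // the /test/bucket1 part of the defaultFS is silently dropped
          Path qualified = new Path("dir1/file").makeQualified(defaultUri, workingDir);
          System.out.println(qualified); // prints o3://datanode:9864/dir1/file
        }
      }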

      D. Solution

      We should support makeQualified calls, so we can use any path in the defaultFS.
       
      I propose to use a simplified schema such as o3://bucket.volume/

      This is similar to the s3a  format where the pattern is s3a://bucket.region/ 

      We don't need to set the hostname of the datanode (or the KSM in case of service discovery), but it would be configurable with additional Hadoop configuration values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (as far as I know, this is how s3a works today).

      We also need to define restrictions for the volume names (in our case they should no longer include dots).
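
      A minimal sketch of how such a simplified URL could be split into bucket/volume and mapped to the proposed address key (the fs.o3.bucket.<bucketname>.<volumename>.address key is only the suggestion from above, not an existing configuration property):

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;

      public class SimplifiedSchemaSketch {
        public static void main(String[] args) {
          URI uri = URI.create("o3://bucket1.test/dir1/file");
          // authority is "bucket1.test"; since volume names would not contain dots,
          // the volume is everything after the last dot and the bucket everything before it
          String authority = uri.getAuthority();
          int lastDot = authority.lastIndexOf('.');
          String bucket = authority.substring(0, lastDot);
          String volume = authority.substring(lastDot + 1);

          Configuration conf = new Configuration();
          // hypothetical key from the proposal; only needed when there is no service discovery
          String address = conf.get(
              "fs.o3.bucket." + bucket + "." + volume + ".address",
              "http://datanode:9864");
          System.out.println(volume + "/" + bucket + "/dir1/file served by " + address);
        }
      }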

      PS: some Spark output:

      2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      2018-02-03 18:43:05 INFO  Client:54 - Uploading resource file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip -> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__2440448967844904444.zip

      My defaultFS was o3://datanode:9864/test/bucket1, but Spark qualified the name of the home directory without the test/bucket1 prefix.

       

      Attachments

        1. HDFS-13108-HDFS-7240.001.patch
          12 kB
          Marton Elek
        2. HDFS-13108-HDFS-7240.002.patch
          12 kB
          Marton Elek
        3. HDFS-13108-HDFS-7240.003.patch
          12 kB
          Marton Elek
        4. HDFS-13108-HDFS-7240.005.patch
          15 kB
          Marton Elek
        5. HDFS-13108-HDFS-7240.006.patch
          15 kB
          Marton Elek
        6. HDFS-13108-HDFS-7240.007.patch
          15 kB
          Marton Elek

          People

            Assignee: elek (Marton Elek)
            Reporter: elek (Marton Elek)
            Votes: 0
            Watchers: 8
