Add Apache Ozone support in Hive hook



    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: atlas-core


      Apache Ozone is the new object store for Hadoop - https://hadoop.apache.org/ozone/

      Apache Atlas needs new entity types to support creation of Ozone entities. The Hive hook should also be updated to create lineage between Ozone entities and Hive tables (for EXTERNAL TABLE).

      Approach:

      1. Moved BaseHiveEvent.getPathEntity() into AtlasPathExtractorUtil.java.
      2. Created PathExtractorContext.java to wrap most of the arguments.
      3. AtlasPathExtractorUtil.getPathEntity() accepts a Path and a PathExtractorContext and returns an AtlasEntityWithExtInfo.
      4. Added a condition in AtlasPathExtractorUtil.getPathEntity() to handle Ozone paths, i.e. paths that start with "ofs://" or "o3fs://".
      5. Added unit tests for AtlasPathExtractorUtil.getPathEntity() in AtlasPathExtractorUtilTest.java.
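
The Ozone check in step 4 can be sketched roughly as follows. This is only an illustration of the prefix test described above: the class name `OzonePathCheckSketch` and the method `isOzonePath` are hypothetical and are not taken from the Atlas code base.

```java
// Illustrative sketch only; names here are hypothetical, not Atlas APIs.
final class OzonePathCheckSketch {
    // The two Ozone prefixes named in the approach.
    static final String OFS_PREFIX  = "ofs://";
    static final String O3FS_PREFIX = "o3fs://";

    // Returns true when the path string starts with one of the Ozone prefixes.
    static boolean isOzonePath(String path) {
        return path != null
                && (path.startsWith(OFS_PREFIX) || path.startsWith(O3FS_PREFIX));
    }

    public static void main(String[] args) {
        String[] samples = {
            "o3fs://bucket1.volume1.ozone1/sale1/q1/sales",
            "hdfs://namenode:8020/warehouse/sales"
        };
        for (String s : samples) {
            System.out.println(s + " -> ozone path: " + isOzonePath(s));
        }
    }
}
```

A plain prefix test keeps the Ozone branch cheap to evaluate before any URI parsing happens; whether the actual change compares prefixes case-sensitively is not stated in this ticket.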


      Examples:

      -> CREATE EXTERNAL TABLE sales (id int) row format delimited fields terminated by ' ' stored as textfile location 'o3fs://bucket1.volume1.ozone1/sale1/q1/sales';

      Type          Name             Qualified Name
      ozone_key     /sale1/q1/sales  o3fs://bucket1.volume1.ozone1/sale1/q1/sales@cm
      ozone_bucket  bucket1          o3fs://volume1.bucket1@cm
      ozone_volume  volume1          o3fs://volume1@cm



      -> create EXTERNAL table stocks (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/stocks.txt' into table stocks;

      Type          Name         Qualified Name
      ozone_key     /stocks.txt  o3fs://bucket1.volume1.ozone1/stocks.txt@cm
      ozone_bucket  bucket1      o3fs://volume1.bucket1@cm
      ozone_volume  volume1      o3fs://volume1@cm



      -> create EXTERNAL table stocks_q1 (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt' into table stocks_q1;

      Type          Name                     Qualified Name
      ozone_key     /quarter1/stocks_q1.txt  o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt@cm
      ozone_bucket  bucket1                  o3fs://volume1.bucket1@cm
      ozone_volume  volume1                  o3fs://volume1@cm
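
The three examples above follow one pattern, which the sketch below reproduces for `o3fs://` locations only (the ticket shows no `ofs://` example, so that scheme is left out). It assumes the URI authority is laid out as `bucket.volume.serviceId` and that the `@cm` suffix is a namespace or cluster name supplied by the caller. The class `OzoneQualifiedNameSketch`, the `Row` holder, and the method `fromO3fs` are hypothetical names for illustration, not the Atlas implementation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names here are hypothetical, not Atlas APIs.
final class OzoneQualifiedNameSketch {
    // One row of the example tables: entity type, name, qualified name.
    static final class Row {
        final String type;
        final String name;
        final String qualifiedName;

        Row(String type, String name, String qualifiedName) {
            this.type = type;
            this.name = name;
            this.qualifiedName = qualifiedName;
        }
    }

    // Derives key, bucket and volume rows from an o3fs location.
    // Assumption: authority is "bucket.volume.serviceId".
    static List<Row> fromO3fs(String location, String namespace) {
        URI uri = URI.create(location);
        if (!"o3fs".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("not an o3fs location: " + location);
        }
        String[] parts = uri.getAuthority().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected bucket.volume in: " + location);
        }
        String bucket = parts[0];
        String volume = parts[1];
        String suffix = "@" + namespace;

        List<Row> rows = new ArrayList<>();
        // Key: name is the URI path, qualified name is the full location.
        rows.add(new Row("ozone_key", uri.getPath(), location + suffix));
        // Bucket: the examples put the volume before the bucket.
        rows.add(new Row("ozone_bucket", bucket, "o3fs://" + volume + "." + bucket + suffix));
        rows.add(new Row("ozone_volume", volume, "o3fs://" + volume + suffix));
        return rows;
    }

    public static void main(String[] args) {
        for (Row r : fromO3fs("o3fs://bucket1.volume1.ozone1/sale1/q1/sales", "cm")) {
            System.out.println(r.type + "  " + r.name + "  " + r.qualifiedName);
        }
    }
}
```

Note that the bucket and volume qualified names do not include the service id (`ozone1`), so all three examples resolve to the same bucket and volume entities.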


      Note: The approach has been updated in ATLAS-3879



