ATLAS-3836: Add Apache Ozone support in Hive hook

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: atlas-core

    Description

      Apache Ozone is the new object store for Hadoop - https://hadoop.apache.org/ozone/

      Apache Atlas needs new entity types to support the creation of Ozone entities. The Hive hook should also be updated to create lineage between Ozone entities and Hive tables (for EXTERNAL TABLE).

      Approach:

      1. Refactored BaseHiveEvent.getPathEntity(): the logic moved to AtlasPathExtractorUtil.java.
      2. Created PathExtractorContext.java to wrap most of the arguments.
      3. AtlasPathExtractorUtil.getPathEntity() accepts a Path and a PathExtractorContext and returns an AtlasEntityWithExtInfo.
      4. Added a specific condition in AtlasPathExtractorUtil.getPathEntity() to handle Ozone paths, i.e. paths that start with "ofs://" or "o3fs://".
      5. Added unit tests for AtlasPathExtractorUtil.getPathEntity() in AtlasPathExtractorUtilTest.java.
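
The scheme check in step 4 and the argument wrapper in step 2 can be sketched roughly as follows. This is a minimal illustration, not the Atlas implementation: the class OzonePathCheckSketch, its nested Context class, the convertPathToLowerCase option, and the hdfs://namenode URI are all hypothetical names chosen for the example. Only the "ofs" and "o3fs" schemes and the "@cm" suffix come from this issue.

```java
import java.net.URI;
import java.util.Locale;

public class OzonePathCheckSketch {

    /** Hypothetical stand-in for PathExtractorContext: one object instead of many parameters. */
    static final class Context {
        final String metadataNamespace;        // suffix after '@' in qualified names, e.g. "cm"
        final boolean convertPathToLowerCase;  // assumed option, for illustration only

        Context(String metadataNamespace, boolean convertPathToLowerCase) {
            this.metadataNamespace = metadataNamespace;
            this.convertPathToLowerCase = convertPathToLowerCase;
        }
    }

    /** True when the path uses one of the two Ozone schemes named in the issue. */
    static boolean isOzonePath(URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return false; // relative path, no scheme to inspect
        }
        String s = scheme.toLowerCase(Locale.ROOT);
        return s.equals("ofs") || s.equals("o3fs");
    }

    /** Builds "<path>@<namespace>", optionally lower-casing the path first. */
    static String qualifiedName(URI path, Context ctx) {
        String p = path.toString();
        if (ctx.convertPathToLowerCase) {
            p = p.toLowerCase(Locale.ROOT);
        }
        return p + "@" + ctx.metadataNamespace;
    }

    public static void main(String[] args) {
        Context ctx = new Context("cm", false);
        URI ozone = URI.create("o3fs://bucket1.volume1.ozone1/sale1/q1/sales");
        URI hdfs = URI.create("hdfs://namenode/tmp/sales"); // hypothetical non-Ozone path

        System.out.println(isOzonePath(ozone));        // true
        System.out.println(isOzonePath(hdfs));         // false
        System.out.println(qualifiedName(ozone, ctx)); // o3fs://bucket1.volume1.ozone1/sale1/q1/sales@cm
    }
}
```

Bundling the arguments in one context object keeps the extractor's signature stable when further options are added, which is presumably the motivation for step 2.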

       

      Examples:

      -> CREATE EXTERNAL TABLE sales (id int) row format delimited fields terminated by ' ' stored as textfile location 'o3fs://bucket1.volume1.ozone1/sale1/q1/sales';

      Type          Name             Qualified Name
      ozone_key     /sale1/q1/sales  o3fs://bucket1.volume1.ozone1/sale1/q1/sales@cm
      ozone_bucket  bucket1          o3fs://volume1.bucket1@cm
      ozone_volume  volume1          o3fs://volume1@cm

      -> create EXTERNAL table stocks (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/stocks.txt' into table stocks;

      Type          Name         Qualified Name
      ozone_key     /stocks.txt  o3fs://bucket1.volume1.ozone1/stocks.txt@cm
      ozone_bucket  bucket1      o3fs://volume1.bucket1@cm
      ozone_volume  volume1      o3fs://volume1@cm

      -> create EXTERNAL table stocks_q1 (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt' into table stocks_q1;

      Type          Name                     Qualified Name
      ozone_key     /quarter1/stocks_q1.txt  o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt@cm
      ozone_bucket  bucket1                  o3fs://volume1.bucket1@cm
      ozone_volume  volume1                  o3fs://volume1@cm
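
The naming pattern in the examples above can be reproduced with a short sketch. It assumes the o3fs authority has the form bucket.volume.host, which is how the three example locations read, and it treats "cm" as a namespace suffix passed in by the caller. The class OzoneNamesSketch and its members are hypothetical and are not taken from the Atlas code base. The ofs:// layout is not shown in this issue, so the sketch does not attempt it.

```java
import java.net.URI;

public class OzoneNamesSketch {

    /** Names and qualified names for the three entities derived from one o3fs location. */
    static final class Names {
        final String keyName, keyQualifiedName;
        final String bucketName, bucketQualifiedName;
        final String volumeName, volumeQualifiedName;

        Names(String keyName, String keyQn, String bucketName, String bucketQn,
              String volumeName, String volumeQn) {
            this.keyName = keyName;
            this.keyQualifiedName = keyQn;
            this.bucketName = bucketName;
            this.bucketQualifiedName = bucketQn;
            this.volumeName = volumeName;
            this.volumeQualifiedName = volumeQn;
        }
    }

    /** Derives the key, bucket and volume names from an o3fs location, following the examples. */
    static Names fromO3fs(String location, String namespace) {
        URI uri = URI.create(location);
        if (!"o3fs".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("not an o3fs location: " + location);
        }
        // Assumed authority layout: <bucket>.<volume>.<host>
        String[] parts = uri.getAuthority().split("\\.", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected bucket.volume in: " + location);
        }
        String bucket = parts[0];
        String volume = parts[1];
        String suffix = "@" + namespace;

        return new Names(
                uri.getPath(), location + suffix,                   // ozone_key
                bucket, "o3fs://" + volume + "." + bucket + suffix,  // ozone_bucket
                volume, "o3fs://" + volume + suffix);                // ozone_volume
    }

    public static void main(String[] args) {
        Names n = fromO3fs("o3fs://bucket1.volume1.ozone1/sale1/q1/sales", "cm");
        System.out.println(n.keyName + "  " + n.keyQualifiedName);
        System.out.println(n.bucketName + "  " + n.bucketQualifiedName);
        System.out.println(n.volumeName + "  " + n.volumeQualifiedName);
    }
}
```

Note that the bucket qualified name puts the volume first (volume1.bucket1), the reverse of the order in the o3fs authority, as in the tables above.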

       

      Note: The approach has been updated in ATLAS-3879

       

      Attachments

        1. Ozone_volume.png (154 kB, Nikhil P Bonte)
        2. Hive_table_lineage_load_in_path.png (193 kB, Nikhil P Bonte)
        3. Hive_table_lineage.png (157 kB, Nikhil P Bonte)
        4. Ozone_bucket.png (164 kB, Nikhil P Bonte)
        5. Ozone_key.png (183 kB, Nikhil P Bonte)

            People

              nbonte Nikhil P Bonte
              sarath Sarath Subramanian