Atlas / ATLAS-3836

Add Apache Ozone support in hive hook

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: atlas-core
    • Labels:

      Description

      Apache Ozone is the new object store for Hadoop - https://hadoop.apache.org/ozone/

      Apache Atlas needs new entity types to support the creation of Ozone entities. The Hive hook should also be updated to create lineage between Ozone entities and Hive tables (for EXTERNAL TABLE).

      Approach:

      1. Refactored BaseHiveEvent.getPathEntity(): the logic moved to AtlasPathExtractorUtil.java.
      2. Created PathExtractorContext.java to wrap most of the arguments.
      3. AtlasPathExtractorUtil.getPathEntity() accepts a Path and a PathExtractorContext and returns an AtlasEntityWithExtInfo.
      4. Added a specific condition in AtlasPathExtractorUtil.getPathEntity() to handle Ozone paths, i.e. paths that start with "ofs://" or "o3fs://".
      5. Added unit tests for AtlasPathExtractorUtil.getPathEntity() in AtlasPathExtractorUtilTest.java.
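
      The Ozone check from step 4 can be illustrated with a minimal, standalone sketch. It is not the code in AtlasPathExtractorUtil: the class OzonePathSketch and its methods are hypothetical names, and the split assumes the o3fs authority has the form <bucket>.<volume>.<rest>, as in the example paths in this issue. Only o3fs paths are split, since the issue gives no ofs example.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not the Atlas implementation.
class OzonePathSketch {

    // The condition described in step 4: the path starts with "ofs://" or "o3fs://".
    static boolean isOzonePath(String path) {
        return path.startsWith("ofs://") || path.startsWith("o3fs://");
    }

    // Splits an o3fs path into {volume, bucket, key}.
    // Assumption: the authority looks like <bucket>.<volume>.<rest>.
    static String[] splitO3fsPath(String path) {
        URI uri = URI.create(path);
        if (!"o3fs".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Not an o3fs path: " + path);
        }
        String[] parts = uri.getAuthority().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected <bucket>.<volume> in: " + path);
        }
        // parts[0] is the bucket, parts[1] is the volume, the URI path is the key.
        return new String[] { parts[1], parts[0], uri.getPath() };
    }

    public static void main(String[] args) {
        String path = "o3fs://bucket1.volume1.ozone1/sale1/q1/sales";
        if (isOzonePath(path)) {
            String[] p = splitO3fsPath(path);
            System.out.println("volume=" + p[0] + " bucket=" + p[1] + " key=" + p[2]);
        }
    }
}
```

      A non-Ozone path such as an hdfs:// location fails the isOzonePath check and would continue through the existing path handling.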

       

      Examples:

      -> CREATE EXTERNAL TABLE sales (id int) row format delimited fields terminated by ' ' stored as textfile location 'o3fs://bucket1.volume1.ozone1/sale1/q1/sales';
      Type          Name             Qualified Name
      ozone_key     /sale1/q1/sales  o3fs://bucket1.volume1.ozone1/sale1/q1/sales@cm
      ozone_bucket  bucket1          o3fs://volume1.bucket1@cm
      ozone_volume  volume1          o3fs://volume1@cm

       

       

      -> create EXTERNAL table stocks (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/stocks.txt' into table stocks;
      
      Type          Name         Qualified Name
      ozone_key     /stocks.txt  o3fs://bucket1.volume1.ozone1/stocks.txt@cm
      ozone_bucket  bucket1      o3fs://volume1.bucket1@cm
      ozone_volume  volume1      o3fs://volume1@cm

       

       

      -> create EXTERNAL table stocks_q1 (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt' into table stocks_q1;
      Type          Name                     Qualified Name
      ozone_key     /quarter1/stocks_q1.txt  o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt@cm
      ozone_bucket  bucket1                  o3fs://volume1.bucket1@cm
      ozone_volume  volume1                  o3fs://volume1@cm
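
      The qualified names in the examples above follow a regular pattern, which the sketch below reproduces. It is an illustration inferred from those examples only, not the Atlas implementation: the class OzoneQualifiedNameSketch, its methods, and the "namespace" parameter (the "cm" suffix after "@", assumed here to identify the cluster or namespace) are hypothetical.

```java
// Hypothetical helper for illustration only; not the Atlas implementation.
class OzoneQualifiedNameSketch {

    // The examples in this issue all use the o3fs scheme.
    static final String SCHEME = "o3fs://";

    // Volume: <scheme><volume>@<namespace>
    static String volumeQualifiedName(String volume, String namespace) {
        return SCHEME + volume + "@" + namespace;
    }

    // Bucket: <scheme><volume>.<bucket>@<namespace> (volume first, as in the examples)
    static String bucketQualifiedName(String volume, String bucket, String namespace) {
        return SCHEME + volume + "." + bucket + "@" + namespace;
    }

    // Key: the full path as given, followed by @<namespace>
    static String keyQualifiedName(String fullPath, String namespace) {
        return fullPath + "@" + namespace;
    }

    public static void main(String[] args) {
        String ns = "cm"; // assumed namespace suffix, taken from the examples
        System.out.println(volumeQualifiedName("volume1", ns));
        System.out.println(bucketQualifiedName("volume1", "bucket1", ns));
        System.out.println(keyQualifiedName("o3fs://bucket1.volume1.ozone1/stocks.txt", ns));
    }
}
```

      Note that the bucket qualified name lists the volume before the bucket, which is the reverse of the order in the path's authority.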

       

      Note: The approach has since been updated in ATLAS-3879.

       

        Attachments

        1. Ozone_volume.png
          154 kB
          Nikhil Bonte
        2. Hive_table_lineage_load_in_path.png
          193 kB
          Nikhil Bonte
        3. Hive_table_lineage.png
          157 kB
          Nikhil Bonte
        4. Ozone_bucket.png
          164 kB
          Nikhil Bonte
        5. Ozone_key.png
          183 kB
          Nikhil Bonte

              People

              • Assignee: Nikhil Bonte (nikhilbonte)
              • Reporter: Sarath Subramanian (sarath)
