ATLAS-3836

Add Apache Ozone support in hive hook


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: atlas-core

    Description

      Apache Ozone is the new object store for Hadoop - https://hadoop.apache.org/ozone/

      Apache Atlas needs new entity types to support the creation of Ozone entities. The Hive hook should also be updated to create lineage between Ozone entities and Hive tables (for EXTERNAL TABLE).

      Approach:

      1. Moved BaseHiveEvent.getPathEntity() into AtlasPathExtractorUtil.java.
      2. Created PathExtractorContext.java to wrap most of the arguments.
      3. AtlasPathExtractorUtil.getPathEntity() accepts a Path and a PathExtractorContext and returns an AtlasEntityWithExtInfo.
      4. Added a condition in AtlasPathExtractorUtil.getPathEntity() to handle Ozone paths, i.e. paths that start with "ofs://" or "o3fs://".
      5. Added unit tests for AtlasPathExtractorUtil.getPathEntity() in AtlasPathExtractorUtilTest.java.
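
      The Ozone check described in the approach could look roughly like the sketch below. This is only an illustration: the class OzonePathCheck and the method isOzonePath are hypothetical names, not the actual code in AtlasPathExtractorUtil, and the sketch assumes a plain case-sensitive prefix match on the two schemes named above.

```java
// Hypothetical helper for illustration; not the AtlasPathExtractorUtil code.
final class OzonePathCheck {
    // The two Ozone schemes named in the approach.
    static final String OFS_PREFIX  = "ofs://";
    static final String O3FS_PREFIX = "o3fs://";

    private OzonePathCheck() {}

    /** Returns true when the path string starts with "ofs://" or "o3fs://". */
    static boolean isOzonePath(String path) {
        if (path == null) {
            return false;
        }
        // Assumption: exact, case-sensitive prefix match.
        return path.startsWith(OFS_PREFIX) || path.startsWith(O3FS_PREFIX);
    }

    public static void main(String[] args) {
        // Path taken from the examples in this issue.
        System.out.println(isOzonePath("o3fs://bucket1.volume1.ozone1/sale1/q1/sales"));
    }
}
```

      A caller would branch on this check before falling back to the handling for other path types.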

       

      Examples:

      -> CREATE EXTERNAL TABLE sales (id int) row format delimited fields terminated by ' ' stored as textfile location 'o3fs://bucket1.volume1.ozone1/sale1/q1/sales';

      Type          Name             Qualified Name
      ozone_key     /sale1/q1/sales  o3fs://bucket1.volume1.ozone1/sale1/q1/sales@cm
      ozone_bucket  bucket1          o3fs://volume1.bucket1@cm
      ozone_volume  volume1          o3fs://volume1@cm

       

       

      -> create EXTERNAL table stocks (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/stocks.txt' into table stocks;
      
      Type          Name         Qualified Name
      ozone_key     /stocks.txt  o3fs://bucket1.volume1.ozone1/stocks.txt@cm
      ozone_bucket  bucket1      o3fs://volume1.bucket1@cm
      ozone_volume  volume1      o3fs://volume1@cm

       

       

      -> create EXTERNAL table stocks_q1 (id int) row format delimited fields terminated by ' ' stored as textfile;
      -> load data inpath 'o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt' into table stocks_q1;
      Type          Name                     Qualified Name
      ozone_key     /quarter1/stocks_q1.txt  o3fs://bucket1.volume1.ozone1/quarter1/stocks_q1.txt@cm
      ozone_bucket  bucket1                  o3fs://volume1.bucket1@cm
      ozone_volume  volume1                  o3fs://volume1@cm
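
      The names and qualified names in the examples above follow a visible pattern: the o3fs authority is read as bucket.volume.host, and each qualified name ends with "@cm". The sketch below reproduces that pattern for o3fs paths only. It is an illustration under assumptions: the class OzoneNames and its members are hypothetical, the "cm" suffix is treated as a caller-supplied namespace string, and the "ofs://" layout is not covered because this issue shows no example of it.

```java
import java.net.URI;

// Hypothetical illustration of the naming pattern; not the Atlas implementation.
final class OzoneNames {
    final String keyName;
    final String keyQualifiedName;
    final String bucketName;
    final String bucketQualifiedName;
    final String volumeName;
    final String volumeQualifiedName;

    private OzoneNames(String scheme, String volume, String bucket,
                       String keyPath, String fullPath, String namespace) {
        String prefix = scheme + "://";
        this.keyName = keyPath;                                   // e.g. /sale1/q1/sales
        this.keyQualifiedName = fullPath + "@" + namespace;       // full path + "@" + namespace
        this.bucketName = bucket;
        this.bucketQualifiedName = prefix + volume + "." + bucket + "@" + namespace;
        this.volumeName = volume;
        this.volumeQualifiedName = prefix + volume + "@" + namespace;
    }

    /** Parses an o3fs path whose authority has the form bucket.volume.host. */
    static OzoneNames fromO3fsPath(String path, String namespace) {
        URI uri = URI.create(path);
        if (!"o3fs".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("not an o3fs path: " + path);
        }
        String[] parts = uri.getAuthority().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected bucket.volume in authority: " + path);
        }
        // parts[0] is the bucket, parts[1] is the volume.
        return new OzoneNames(uri.getScheme(), parts[1], parts[0], uri.getPath(), path, namespace);
    }

    public static void main(String[] args) {
        OzoneNames n = fromO3fsPath("o3fs://bucket1.volume1.ozone1/sale1/q1/sales", "cm");
        System.out.println(n.keyQualifiedName);
        System.out.println(n.bucketQualifiedName);
        System.out.println(n.volumeQualifiedName);
    }
}
```

      Note that the bucket qualified name places the volume before the bucket (volume1.bucket1), the reverse of the order in the path authority, which matches the tables above.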

       

      Note: The approach has been updated in ATLAS-3879

       

      Attachments

        1. Ozone_key.png (183 kB, Nikhil P Bonte)
        2. Ozone_bucket.png (164 kB, Nikhil P Bonte)
        3. Hive_table_lineage.png (157 kB, Nikhil P Bonte)
        4. Hive_table_lineage_load_in_path.png (193 kB, Nikhil P Bonte)
        5. Ozone_volume.png (154 kB, Nikhil P Bonte)

          People

            nbonte Nikhil P Bonte
            sarath Sarath Subramanian