  1. Atlas
  2. ATLAS-1948

Importing a hive_table that is a CTAS of a table in a different database throws an exception due to export order.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 0.8.1, 1.0.0
    • Component/s: atlas-core
    • Labels: None

    Description

      1. Created 2 databases, db1 and db2, in cluster1.
      2. Created 2 tables:
         1. db1.t1
         2. db2.t2 as select * from db1.t1
      3. Exported db1.t1 into a zip file.
      4. Imported the zip file into cluster2 with the transforms option:

      {
        "options": {
         "transforms": "{ \"hive_column\": { \"qualifiedName\": [ \"replace:cl1:cl2\" ]} }"
        }
      }
      

      5. Import fails with:

      {"errorCode":"ATLAS-500-00-001","errorMessage":"org.apache.atlas.exception.AtlasBaseException: ObjectId is not valid AtlasObjectId{guid='51c77c1e-265e-46ab-bbb5-5316cf80a53c', typeName='hive_column', uniqueAttributes={}}"}
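      The transforms option in step 4 is itself a JSON document embedded as a string, which is why its quotes are escaped. A minimal sketch of building that body, and of what the replace:cl1:cl2 rule is assumed to do (plain substring replacement; the apply_replace helper and the sample qualifiedName are hypothetical illustrations, not Atlas code):

```python
import json

# Transform spec from the report: rewrite qualifiedName on hive_column entities.
transforms = {"hive_column": {"qualifiedName": ["replace:cl1:cl2"]}}

# The import options carry the transforms as a JSON *string*, hence the escaped quotes.
request_body = {"options": {"transforms": json.dumps(transforms)}}


def apply_replace(spec, value):
    """Apply a 'replace:<from>:<to>' spec as a substring replacement (assumed semantics)."""
    action, old, new = spec.split(":")
    if action != "replace":
        raise ValueError(f"unsupported transform: {spec}")
    return value.replace(old, new)


# Hypothetical qualifiedName; the report does not show actual attribute values.
sample = "db1.t1.id@cl1"
result = apply_replace(transforms["hive_column"]["qualifiedName"][0], sample)

print(json.dumps(request_body, indent=2))
print(result)
```

      Serializing the inner document with json.dumps avoids hand-escaping the quotes.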
      

      Only db1.t1 is imported into Atlas, without any lineage.

      The exception stack trace is attached.

      After this, exporting db2.t2 and importing it completes successfully.
      That is, the first import, whether of db1.t1 or db2.t2, fails with the exception; the next import succeeds.

      The exception does not occur, and both tables are imported successfully, if they are in a single database. The export order when the tables are in the same db is:
      1. table1
      2. db
      3. table2
      4. hive_process
      5. hive_column_lineage

      If the tables are in different dbs, the order is:
      1. table1
      2. db1
      3. hive_process
      4. hive_column_lineage
      5. ctas table
      6. db2

      This order is possibly what causes the issue.

      When cluster2 starts importing, it imports table1 and db1. When it reaches hive_column_lineage, it finds that a column referenced by hive_column_lineage is not yet in cluster2, because the ctas table comes after hive_column_lineage in the import order, and it throws "ObjectId is not valid AtlasObjectId{guid='51c77c1e-265e-46ab-bbb5-5316cf80a53c', typeName='hive_column', uniqueAttributes={}}".
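      The effect of the two export orders can be illustrated with a toy model. This is only a sketch under stated assumptions: it checks nothing but column references (the only failure the report shows), it assumes a table's columns become available when the table is imported, and the column names are hypothetical. It is not how Atlas implements import.

```python
def simulate_import(export_order):
    """Import entities in order; return (entity, missing_columns) at the first
    unresolved column reference, or None if every reference resolves."""
    imported_columns = set()
    for name, defines, references in export_order:
        missing = [col for col in references if col not in imported_columns]
        if missing:
            return name, missing
        imported_columns.update(defines)
    return None


# Each entity: (name, columns it defines, columns it references).
# Same database: both tables precede hive_column_lineage.
same_db_order = [
    ("table1", ["db.t1.col"], []),
    ("db", [], []),
    ("table2", ["db.t2.col"], []),
    ("hive_process", [], []),
    ("hive_column_lineage", [], ["db.t1.col", "db.t2.col"]),
]

# Different databases: the ctas table follows hive_column_lineage.
diff_db_order = [
    ("table1", ["db1.t1.col"], []),
    ("db1", [], []),
    ("hive_process", [], []),
    ("hive_column_lineage", [], ["db1.t1.col", "db2.t2.col"]),
    ("ctas table", ["db2.t2.col"], []),
    ("db2", [], []),
]

print(simulate_import(same_db_order))
print(simulate_import(diff_db_order))
```

      In this model the same-db order resolves every reference, while the different-db order stops at hive_column_lineage because the ctas table's column has not been imported yet.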

      Thanks ayubkhan for the analysis.

      Attachments

        1. ImportTransformsErrorOnCTASonDiffDB.txt
          43 kB
          Sharmadha S
        2. db1tb1.zip
          7 kB
          Sharmadha S
        3. ATLAS-1948-Residual-import.patch
          29 kB
          Ashutosh Mestry

          People

            Assignee: Ashutosh Mestry (amestry)
            Reporter: Sharmadha S (sharmadhas)