Atlas / ATLAS-2975

Hive hook generates duplicate column_lineage entities


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 0.8.3, 1.1.0
    • Fix Version/s: 0.8.4, 1.2.0, 2.0.0
    • Component/s: atlas-intg
    • Labels: None

    Description

      The Hive hook is expected to create one column-lineage entity for each column in the output table. However, when multiple partitions are involved, the hook might generate multiple column-lineage entities for each output column: one entity per partition. This can result in a large number of duplicate column-lineage entities, depending on the number of partitions. Such duplicate entities should be avoided.

      Here is a sample HiveQL script to reproduce this issue:

      CREATE TABLE visitors(name STRING, dob DATE) PARTITIONED BY (yob INT);
      CREATE TABLE visitors_log(name STRING, dob DATE);
      
      INSERT INTO TABLE visitors_log VALUES('John',  '1980-08-08'),
                                           ('Jack',  '1980-09-09'),
                                           ('Kevin', '1990-10-10'),
                                           ('Ken',   '1990-11-11'),
                                           ('Larry', '1995-12-12');
      
      SET hive.exec.dynamic.partition.mode=nonstrict;
      
      INSERT INTO TABLE visitors PARTITION(yob) SELECT name, dob, YEAR(dob) yob FROM visitors_log;
      

      In the above case, columns visitors.name and visitors.dob will each have 3 column-lineage entities, one for each of the partitions 1980, 1990 and 1995.
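The duplication described above can be illustrated with a small model. This is only a sketch and not the Atlas Hive hook's actual code: the `ColumnWrite` record and both functions are hypothetical names introduced for illustration. It assumes that the hook sees one write per output column per partition, and it contrasts keying lineage by (column, partition) with keying it by output column alone.

```python
from collections import namedtuple

# Hypothetical record: one write of an output column into one partition.
ColumnWrite = namedtuple("ColumnWrite", ["table", "column", "partition", "inputs"])


def lineage_per_partition(writes):
    """Model of the reported behaviour: one entry per (column, partition)."""
    return [(w.table, w.column, w.partition) for w in writes]


def lineage_per_column(writes):
    """Model of the expected behaviour: one entry per output column,
    with the input columns merged across partitions."""
    merged = {}
    for w in writes:
        merged.setdefault((w.table, w.column), set()).update(w.inputs)
    return merged


# The repro case: visitors.name and visitors.dob are written into
# the partitions yob=1980, yob=1990 and yob=1995.
writes = [
    ColumnWrite("visitors", col, yob, ("visitors_log." + col,))
    for yob in (1980, 1990, 1995)
    for col in ("name", "dob")
]

duplicated = lineage_per_partition(writes)
deduplicated = lineage_per_column(writes)

print(len(duplicated))    # 6: three entries for each of the two columns
print(len(deduplicated))  # 2: one entry for each output column
```

In this model the number of per-partition entries grows with the partition count, while the per-column result stays at one entry per output column regardless of how many partitions the insert touches.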

          People

            Assignee: Madhan Neethiraj
            Reporter: Madhan Neethiraj
            Votes: 0
            Watchers: 2
