Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.0.0, 0.8.3, 1.1.0
- None
Description
The Hive hook is expected to create one column-lineage entity for each column in the output table. However, when multiple partitions are involved, the Hive hook might generate multiple column-lineage entities for each output column: one entity per partition. This can result in a large number of duplicate column-lineage entities, depending on the number of partitions. Such duplicate entities should be avoided.
Here is a sample HiveQL script to reproduce this issue:
CREATE TABLE visitors(name STRING, dob DATE) PARTITIONED BY (yob INT);
CREATE TABLE visitors_log(name STRING, dob DATE);
INSERT INTO TABLE visitors_log VALUES('John', '1980-08-08'), ('Jack', '1980-09-09'), ('Kevin', '1990-10-10'), ('Ken', '1990-11-11'), ('Larry', '1995-12-12');
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE visitors PARTITION(yob) SELECT name, dob, YEAR(dob) yob FROM visitors_log;
In the above case, columns visitors.name and visitors.dob will each have 3 input lineage entities, one for each of the partitions 1980, 1990 and 1995.
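The duplication and one possible way to avoid it can be illustrated with a small Python sketch. This is not the Hive hook's actual code or the fix that was applied; PartitionOutput, naive_lineage and deduplicated_lineage are hypothetical names introduced only for illustration, and the sketch assumes that lineage is generated once per written partition and that keying on the qualified output column name is enough to collapse the duplicates.

```python
from typing import Iterable, NamedTuple


class PartitionOutput(NamedTuple):
    """Hypothetical stand-in for one partition written by the query."""
    table: str
    partition: str
    columns: tuple


def naive_lineage(outputs: Iterable[PartitionOutput]) -> list:
    """One lineage record per (partition, column): reproduces the duplication."""
    return [f"{out.table}.{col}" for out in outputs for col in out.columns]


def deduplicated_lineage(outputs: Iterable[PartitionOutput]) -> list:
    """One lineage record per output column, however many partitions exist."""
    seen = set()
    result = []
    for out in outputs:
        for col in out.columns:
            key = f"{out.table}.{col}"  # assumed dedup key: table.column
            if key not in seen:
                seen.add(key)
                result.append(key)
    return result


# The three partitions produced by the repro script above.
outputs = [
    PartitionOutput("visitors", "yob=1980", ("name", "dob")),
    PartitionOutput("visitors", "yob=1990", ("name", "dob")),
    PartitionOutput("visitors", "yob=1995", ("name", "dob")),
]

naive = naive_lineage(outputs)
deduped = deduplicated_lineage(outputs)
```

With the three partitions from the repro, the naive approach yields three records for each of visitors.name and visitors.dob, while the keyed approach yields one record per column.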