The Hive hook is expected to create one column-lineage entity for each column in the output table. However, when multiple partitions are involved, the Hive hook might generate multiple column-lineage entities for each output column, one entity per partition. This can result in a large number of duplicate column-lineage entities, depending on the number of partitions. Such duplicate entities should be avoided.
Here is the sample HSQL to repro this issue:
In the above case, columns visitors.name and visitors.dob will each have 3 input lineage entities, one for each of the partitions 1980, 1990 and 1995.
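To illustrate the expected behavior, here is a minimal Python sketch of collapsing per-partition lineage records into a single record per output column. It is not the Hive hook's actual code: the record layout, the function name `dedupe_column_lineage`, and the source table name `raw_visitors` are all assumptions made for this example.

```python
from collections import OrderedDict


def dedupe_column_lineage(records):
    """Collapse per-partition records into one record per (output, input) column pair.

    Hypothetical helper for illustration only; it does not mirror the hook's data model.
    """
    merged = OrderedDict()
    for rec in records:
        # The partition is deliberately left out of the key, so records that
        # differ only by partition fall into the same entry.
        key = (rec["output_column"], rec["input_column"])
        entry = merged.setdefault(
            key,
            {
                "output_column": rec["output_column"],
                "input_column": rec["input_column"],
                "partitions": [],
            },
        )
        if rec["partition"] not in entry["partitions"]:
            entry["partitions"].append(rec["partition"])
    return list(merged.values())


# Simulated hook output: one lineage record per output column per partition.
# "raw_visitors" is an assumed source table name, not taken from the report.
records = [
    {
        "output_column": "visitors." + column,
        "input_column": "raw_visitors." + column,
        "partition": partition,
    }
    for column in ("name", "dob")
    for partition in ("1980", "1990", "1995")
]

deduped = dedupe_column_lineage(records)
print(len(records), "records before,", len(deduped), "after de-duplication")
```

Keying on the column pair alone keeps the partition information available as an attribute of the single entity, instead of multiplying the entities by the number of partitions.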