Pig
  1. Pig
  2. PIG-2023

lineage tracking for casting should compare LoadCaster returned from LoadFunc instead of comparing the FuncSpec

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When lineage of a column is tracked for the purpose of finding the LoadCaster associated with a column, and it finds that a column has two possible sources, it associates a LoadCaster (through a LoadFunc) only if the funcspec for LoadFunc in both cases are the same. But it is possible that the two LoadFunc with different func spec actually use the same LoadCaster (for example the default of Utf8StorageConverter). If the LoadFunc funcspec don't match, the LoadCaster returned by the LoadFunc should also be compred. If they are equal, this LoadCaster should be associated with the column . The LoadCaster implementation would need to override equals().

      For example, in this case the columns in relation u use the same LoadCaster -

      l1 = load 'x' using PigStorage(',') as (a,b);
      l2 = load 'y' using PigStorage(':') as (a,b);
      u = union l1,l2;
      

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Thejas M Nair
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:

              Development