Pig / PIG-2454

Make use of primitive tuples in builtin UDFs and operators

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      With PIG-2359, we introduce efficient primitives-only tuples. This ticket is for converting existing code to make use of them.
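      To illustrate what "primitives-only" buys, here is a minimal sketch assuming a tuple whose schema is fixed to (int, long, double). It is not the PrimitiveTuple code from PIG-2359; the class and method names are hypothetical. Fields live in Java primitive slots instead of an Object[] of boxed values, so typed access allocates no wrapper objects, and boxing happens only when a caller goes through the generic get(int).

```java
// Hypothetical sketch, not Pig's PrimitiveTuple: a tuple with a fixed
// (int, long, double) schema that stores its fields as Java primitives.
final class IntLongDoubleTuple {
    private int f0;
    private long f1;
    private double f2;
    // Tracks which fields have been assigned; unassigned fields read as null,
    // mirroring the fact that tuple fields may be null.
    private final boolean[] isSet = new boolean[3];

    int size() { return 3; }

    // Typed setters and getters: no Integer/Long/Double allocation.
    void setInt(int v) { f0 = v; isSet[0] = true; }
    void setLong(long v) { f1 = v; isSet[1] = true; }
    void setDouble(double v) { f2 = v; isSet[2] = true; }

    int getInt() { return f0; }
    long getLong() { return f1; }
    double getDouble() { return f2; }

    // Generic accessor for callers that only know an Object-based interface.
    // Boxing happens here, on demand, rather than when the value is stored.
    Object get(int i) {
        if (i < 0 || i >= 3) throw new IndexOutOfBoundsException("field " + i);
        if (!isSet[i]) return null;
        switch (i) {
            case 0: return f0;   // boxed to Integer
            case 1: return f1;   // boxed to Long
            default: return f2;  // boxed to Double
        }
    }
}
```

The trade-off is that each such class only fits one schema, which is why a general approach needs either a small family of hand-written classes or some form of per-schema generation.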

      Attachments

      1. BugFix_1.log (5 kB, Alan Gates)
      2. Cross_1.log (4 kB, Alan Gates)
      3. PIG-2454.patch (66 kB, Dmitriy V. Ryaboy)
      4. Types_6.log (3 kB, Alan Gates)


          Activity

          Dmitriy V. Ryaboy added a comment -

          This patch converts a number of internal UDFs and Physical Operators to use primitive tuples.

          It also adds a few minor optimizations for inefficiencies that showed up when I profiled: excessive progress reporting, using tuple.append instead of tuple.set where we could pre-allocate the right number of fields, not memoizing array.size() results (yes, that showed up in the profiler), etc.
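          The three kinds of fixes listed above can be sketched in plain Java. This is a hypothetical illustration, not code from PIG-2454.patch: java.util lists and arrays stand in for Pig tuples, and ProjectionSketch, ProgressCallback and REPORT_INTERVAL are made-up names. The "before" method appends into a list that grows on demand and calls size() on every loop iteration; the "after" method calls size() once, pre-allocates exactly that many slots and fills them by index; processAll reports progress once per interval instead of once per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the patch's code.
final class ProjectionSketch {
    // Hypothetical stand-in for whatever progress reporter an operator holds.
    interface ProgressCallback { void progress(); }

    // Hypothetical throttle: report once every this many records.
    static final int REPORT_INTERVAL = 1000;

    // "Before": append into a list that may resize as it grows,
    // and re-evaluate columns.size() on every iteration.
    static List<Object> projectAppend(List<Object> row, List<Integer> columns) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            out.add(row.get(columns.get(i)));
        }
        return out;
    }

    // "After": memoize size() once, pre-allocate exactly n slots,
    // and set each slot by index instead of appending.
    static Object[] projectPreallocated(List<Object> row, List<Integer> columns) {
        final int n = columns.size();
        Object[] out = new Object[n];
        for (int i = 0; i < n; i++) {
            out[i] = row.get(columns.get(i));
        }
        return out;
    }

    // Throttled progress reporting: one callback per REPORT_INTERVAL records
    // rather than one per record.
    static int processAll(List<List<Object>> rows, List<Integer> columns, ProgressCallback cb) {
        int processed = 0;
        final int n = rows.size();
        for (int r = 0; r < n; r++) {
            projectPreallocated(rows.get(r), columns);
            processed++;
            if (processed % REPORT_INTERVAL == 0) {
                cb.progress();
            }
        }
        return processed;
    }
}
```

Both projection methods produce the same fields in the same order; the difference is only in allocation and per-iteration overhead, which matters when the method runs once per record.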

          Alan Gates added a comment -

          I haven't had time to review this yet, but since it's a very comprehensive change I've started a full run of end-to-end tests. Will post results once they've finished.

          Alan Gates added a comment -

          In my run of the nightly e2e tests, the following tests aborted: Cross_1, Cross_2, Cross_3, Cross_4, Types_6, Types_7, Lineage_2, BugFix_1, BugFix_2 and BugFix_3. Most of the failures were NullPointerExceptions or ClassCastExceptions. I can send you the logs if you'd like. These tests do not fail when I run them against trunk.

          Dmitriy V. Ryaboy added a comment -

          Great, yes, please email the traces or attach them to the ticket.

          Alan Gates added a comment -

          The errors seemed to fall into three categories, so I picked a representative trace from each category (attached as BugFix_1.log, Cross_1.log and Types_6.log). The scripts for these tests are in test/e2e/pig/tests/nightly.conf.

          Jie Li added a comment -

          Is PrimitiveTuple actually used anywhere in Pig? I can't find any code that calls TupleFactory#newTupleForSchema to create a PrimitiveTuple.

          Dmitriy V. Ryaboy added a comment -

          Jon's implementation, which generates a custom tuple class for each schema, is better; we'll go with that instead.


            People

            • Assignee: Dmitriy V. Ryaboy
            • Reporter: Dmitriy V. Ryaboy
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated:
                Resolved:
