Pig / PIG-831

Records and bytes written reported by pig are wrong in a multi-store program

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.3.0
    • Fix Version/s: 0.4.0
    • Component/s: impl
    • Labels: None

      Description

      The stats feature checked in as part of PIG-626, which reports the number of records and bytes written at the end of the query, prints wrong values (often, but not always, 0) when the Pig script being run contains more than one store.

        Activity

        Alan Gates created issue -
        Alan Gates added a comment -

        There are a couple of issues going on here.

        One, PigStats looks through the plan until it finds the first root and then stops. So for multi-store scripts that have multiple roots in their plans, this does not work.

        Two, Hadoop does not return accurate numbers for records written in many cases. I do not know if this is a bug in Hadoop or a bug in the output format Pig uses when doing multiple stores in one job.
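
To illustrate the first issue, here is a minimal Java sketch of the difference between stopping at the first root and collecting every root. The `JobPlan` class, its method names, and the definition of a root as a job whose output no other job consumes are all assumptions made for illustration; this is not the PigStats code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a plan of MapReduce jobs; these are not Pig's classes.
class JobPlan {
    // Job name -> names of the jobs that read its output (insertion-ordered).
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();

    void addJob(String job) {
        consumers.putIfAbsent(job, new ArrayList<>());
    }

    void addDependency(String producer, String consumer) {
        addJob(producer);
        addJob(consumer);
        consumers.get(producer).add(consumer);
    }

    // Assumption: a "root" is a job whose output no other job consumes.
    private boolean isRoot(String job) {
        return consumers.get(job).isEmpty();
    }

    // The behaviour described above: stop as soon as one root is found.
    List<String> firstRootOnly() {
        List<String> roots = new ArrayList<>();
        for (String job : consumers.keySet()) {
            if (isRoot(job)) {
                roots.add(job);
                break; // any later root is silently ignored
            }
        }
        return roots;
    }

    // Alternative: keep scanning and collect every root.
    List<String> allRoots() {
        List<String> roots = new ArrayList<>();
        for (String job : consumers.keySet()) {
            if (isRoot(job)) {
                roots.add(job);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        JobPlan plan = new JobPlan();
        plan.addDependency("load", "storeA");
        plan.addDependency("load", "storeB");
        System.out.println(plan.firstRootOnly()); // prints [storeA]
        System.out.println(plan.allRoots());      // prints [storeA, storeB]
    }
}
```

With two stores fed by one load, the first-root scan sees only `storeA`, so any statistics derived from it would miss `storeB` entirely.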

        Alan Gates added a comment -

        This patch addresses the two problems listed above. It changes the stats patch to collect all root MR jobs instead of just the first it encounters. The second issue (that MR returns bogus results for multi-store scripts) is addressed by having Pig not report records written in this case.
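
A rough Java sketch of the reporting behaviour this comment describes, assuming hypothetical `JobCounters` and `StoreStats` classes that are not part of Pig or Hadoop: bytes are summed over every root job, and the record count is withheld when the script has more than one store. The integration message on this issue mentions bytes as well as records; the sketch follows the wording of the comment and withholds only the record count.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical per-job counters; the names are illustrative only.
class JobCounters {
    final long recordsWritten;
    final long bytesWritten;

    JobCounters(long recordsWritten, long bytesWritten) {
        this.recordsWritten = recordsWritten;
        this.bytesWritten = bytesWritten;
    }
}

// Hypothetical aggregation helper; not the actual patch.
class StoreStats {
    // Sum bytes over every root job rather than only the first one.
    static long totalBytesWritten(List<JobCounters> rootJobs) {
        long total = 0;
        for (JobCounters job : rootJobs) {
            total += job.bytesWritten;
        }
        return total;
    }

    // Return a record count only for single-store scripts; for multi-store
    // scripts return empty instead of a number known to be unreliable.
    static OptionalLong recordsWritten(List<JobCounters> rootJobs, int storeCount) {
        if (storeCount > 1) {
            return OptionalLong.empty();
        }
        long total = 0;
        for (JobCounters job : rootJobs) {
            total += job.recordsWritten;
        }
        return OptionalLong.of(total);
    }
}
```

Returning an empty `OptionalLong` forces the caller to decide how to present a missing value, rather than printing a misleading 0.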

        Alan Gates made changes -
        Field         Original Value    New Value
        Attachment                      PIG-831.patch [ 12409777 ]
        Olga Natkovich added a comment -

        +1 on the patch. Please keep the bug open, since we should at some point correctly report numbers for multiquery.

        Alan Gates made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hudson added a comment -

        Integrated in Pig-trunk #465 (see http://hudson.zones.apache.org/hudson/job/Pig-trunk/465/):
        Turned off reporting of records and bytes written for multi-store
        queries as the returned results are confusing and wrong.

        Alan Gates added a comment -

        Fix checked in 6 June 2009

        Alan Gates made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.4.0 [ 12314042 ]
        Resolution Fixed [ 1 ]
        Alan Gates made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                    Time In Source Status    Execution Times    Last Executer    Last Execution Date
        Open -> Patch Available       2d 2h 37m                1                  Alan Gates       04/Jun/09 22:58
        Patch Available -> Resolved   102d 17h 56m             1                  Alan Gates       15/Sep/09 16:55
        Resolved -> Closed            190d 5h 17m              1                  Alan Gates       24/Mar/10 22:13

          People

          • Assignee: Alan Gates
          • Reporter: Alan Gates
          • Votes: 0
          • Watchers: 0
