Pig
  1. Pig
  2. PIG-3739

The Warning_4 e2e test is broken in trunk

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      This is a regression of PIG-2207.

      The Warning_4 e2e test fails because expected warning messages are not printed to stderr. I confirmed that the following lines are present w/o PIG-2207, whereas they are not w/ PIG-2207.

      2014-02-02 04:48:11,211 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_3 10989 time(s).
      2014-02-02 04:48:11,211 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_4 22 time(s).
      

      Here is the test query-

      register ./lib/java/testudf.jar;
      a = load '/user/pig/tests/data/singlefile/studentnulltab10k' as (name, age: int, gpa: float);
      b = foreach a generate org.apache.pig.test.udf.evalfunc.TestWarningFunc(name, age, gpa);
      store b into '/user/pig/out/cheolsoop-1391202001-cmdline.conf-Warning/Warning_4.out';
      
      1. PIG-3739-4.patch
        8 kB
        Aniket Mokashi
      2. PIG-3739-1.patch
        1 kB
        Aniket Mokashi

        Issue Links

          Activity

          Hide
          Aniket Mokashi added a comment -

          This is expected due to backwards incompatibility of PIG-2207. I will submit a patch to fix the test.

          Show
          Aniket Mokashi added a comment - This is expected due to backwards incompatibility of PIG-2207 . I will submit a patch to fix the test.
          Hide
          Aniket Mokashi added a comment - - edited

          New counters

          Group Name Map Reduce Total
          org.​apache.​pig.​PigWarning SKIP_UDF_CALL_FOR_NULL 3 0 3
          org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_3 10980 0 10980
          org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_4 19 0 19

          Old counters (with pig11)

          Group Name Map Reduce Total
          org.​apache.​pig.​PigWarning UDF_WARNING_3 10989 0 10989
          org.​apache.​pig.​PigWarning UDF_WARNING_4 22 0 22

          PIG-2207 caused the grouping to change. What's causing counter numbers to change? Cheolsoo Park, Daniel Dai comments?

          Show
          Aniket Mokashi added a comment - - edited New counters Group Name Map Reduce Total org.​apache.​pig.​PigWarning SKIP_UDF_CALL_FOR_NULL 3 0 3 org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_3 10980 0 10980 org.​apache.​pig.​test.​udf.​evalfunc.​TestWarningFunc UDF_WARNING_4 19 0 19 Old counters (with pig11) Group Name Map Reduce Total org.​apache.​pig.​PigWarning UDF_WARNING_3 10989 0 10989 org.​apache.​pig.​PigWarning UDF_WARNING_4 22 0 22 PIG-2207 caused the grouping to change. What's causing counter numbers to change? Cheolsoo Park , Daniel Dai comments?
          Hide
          Cheolsoo Park added a comment -

          I think it has to do with PIG-3679 that made POUserFunc skip calling UDF for null tuples (i.e. all the fields are null).

          1. From the counter, there were 3 null tuples.
              SKIP_UDF_CALL_FOR_NULL = 3
            + UDF_WARNING_4          = 19
            -----------------------------
              UDF_WARNING_4          = 22
            
          2. Tuples in the test have 3 fields (name, age: int, gpa: float). If you look at TestWarningFunc.java, UDF_WARNING_3 is incremented per field.
              UDF_WARNING_3 = 10989
            - UDF_WARNING_3 = 10980
            -----------------------------
                            = 9
            

            The difference is 9, so 9 / 3 = 3 tuples are skipped. This matches #1.

          Show
          Cheolsoo Park added a comment - I think it has to do with PIG-3679 that made POUserFunc skip calling UDF for null tuples (i.e. all the fields are null). From the counter, there were 3 null tuples. SKIP_UDF_CALL_FOR_NULL = 3 + UDF_WARNING_4 = 19 ----------------------------- UDF_WARNING_4 = 22 Tuples in the test have 3 fields ( name, age: int, gpa: float ). If you look at TestWarningFunc.java, UDF_WARNING_3 is incremented per field. UDF_WARNING_3 = 10989 - UDF_WARNING_3 = 10980 ----------------------------- = 9 The difference is 9, so 9 / 3 = 3 tuples are skipped. This matches #1.
          Hide
          Daniel Dai added a comment -

          All null tuple is passed to UDF before PIG-3679. Seems only a single item tuple with null value is skipped.

          Show
          Daniel Dai added a comment - All null tuple is passed to UDF before PIG-3679 . Seems only a single item tuple with null value is skipped.
          Hide
          Aniket Mokashi added a comment -

          Thanks Cheolsoo Park for looking into this. This makes sense, so, we need to fix two things - fix the test and fix computeWarningAggregate to print the new counters on stdout.

          Show
          Aniket Mokashi added a comment - Thanks Cheolsoo Park for looking into this. This makes sense, so, we need to fix two things - fix the test and fix computeWarningAggregate to print the new counters on stdout.
          Hide
          Daniel Dai added a comment -

          The goal of PIG-3679 is to bring backward compatibility, we shall not fix the test but need to bring old behavior. Hope this will do:
          Change:

          boolean allNulls = true;
          for (int i = 0; i < t.size(); i++) {
              if (!t.isNull(i)) {
                  allNulls = false;
                  break;
              }
          }
          if (allNulls) {
            // skip
            ...
          }
          

          to something like:

          if (t.size()==1 && t.isNull(i)) {
            //skip
            ...
          } else {
            pass to UDF
          }
          
          Show
          Daniel Dai added a comment - The goal of PIG-3679 is to bring backward compatibility, we shall not fix the test but need to bring old behavior. Hope this will do: Change: boolean allNulls = true ; for ( int i = 0; i < t.size(); i++) { if (!t.isNull(i)) { allNulls = false ; break ; } } if (allNulls) { // skip ... } to something like: if (t.size()==1 && t.isNull(i)) { //skip ... } else { pass to UDF }
          Hide
          Aniket Mokashi added a comment -

          Thanks Daniel Dai, I have attached the suggested fix at PIG-3679.

          Show
          Aniket Mokashi added a comment - Thanks Daniel Dai , I have attached the suggested fix at PIG-3679 .
          Hide
          Aniket Mokashi added a comment -

          PIG-2207 changed group of WARNINGs from PigWarning(enum) to class of udf (string). This breaks computeWarningAggregate and hence no warning messages are displayed on stdout. We can get list of udfs for a job from jcc.getJobMroMap().get(job).UDFs (private->public or getter) and then iterate over udfs and get all the counters (not ideal). Comments?

          Show
          Aniket Mokashi added a comment - PIG-2207 changed group of WARNINGs from PigWarning(enum) to class of udf (string). This breaks computeWarningAggregate and hence no warning messages are displayed on stdout. We can get list of udfs for a job from jcc.getJobMroMap().get(job).UDFs (private->public or getter) and then iterate over udfs and get all the counters (not ideal). Comments?
          Hide
          Daniel Dai added a comment -

          That's fine as long as we skip the standard Hadoop counters.

          Show
          Daniel Dai added a comment - That's fine as long as we skip the standard Hadoop counters.
          Hide
          Aniket Mokashi added a comment -

          I tried PIG-3739-4.patch, but it turned out tricky. By the time, we query for counters, we lose mapping between jobs and MROs (and hence the udfs). Quickest way to fix this would be get the backwards compatible behavior from reverting a part of PIG-2207. Testing it.

          Show
          Aniket Mokashi added a comment - I tried PIG-3739 -4.patch, but it turned out tricky. By the time, we query for counters, we lose mapping between jobs and MROs (and hence the udfs). Quickest way to fix this would be get the backwards compatible behavior from reverting a part of PIG-2207 . Testing it.
          Hide
          Aniket Mokashi added a comment -

          Please review PIG-3739-1.patch that reverts a part of the backwards incompatible behavior in PIG-2207.

          Show
          Aniket Mokashi added a comment - Please review PIG-3739 -1.patch that reverts a part of the backwards incompatible behavior in PIG-2207 .
          Hide
          Cheolsoo Park added a comment -

          +1 to PIG-3739-1.patch. Let's get 0.13 out.

          Btw, I am curious whether it's okay to break backward compatibility of counters. I don't think it's necessary to keep counters backward compatible, but that's a separate discussion.

          Show
          Cheolsoo Park added a comment - +1 to PIG-3739 -1.patch . Let's get 0.13 out. Btw, I am curious whether it's okay to break backward compatibility of counters. I don't think it's necessary to keep counters backward compatible, but that's a separate discussion.
          Hide
          Aniket Mokashi added a comment -

          Committed to trunk and branch 0.13! Thanks Cheolsoo Park and Daniel Dai.

          Show
          Aniket Mokashi added a comment - Committed to trunk and branch 0.13! Thanks Cheolsoo Park and Daniel Dai .

            People

            • Assignee:
              Aniket Mokashi
              Reporter:
              Cheolsoo Park
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development