Pig
  1. Pig
  2. PIG-2207

Support custom counters for aggregating warnings from different udfs

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Pig allows udfs to aggregate warning messages instead of writing out a separate warning message each time. Udfs can do this by logging the warning using EvalFunc.warn(String msg, Enum) call. But the udfs are forced to use PigWarning class if the warning needs to be printed at the end of the pig script .

      For example, with the changes in PIG-2191, some of the builtin udfs are using PigWarning.UDF_WARNING_1 as argument in calls to EvalFunc.warn. This will result in the warning count being printed on STDERR -

      2011-08-05 22:10:29,285 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_1 2 time(s).
      2011-08-05 22:10:29,285 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
      

      But it would be better if a udf such as the LOWER udf could use a custom warning counter, and the STDERR is like -

      2011-08-05 22:10:29,285 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning LOWER_FUNC_INPUT_WARNING 2 time(s).
      2011-08-05 22:10:29,285 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
      

      A new function could be added to support this - (something like) EvalFunc.warn(String warnName, String warnMsg); A specific counter group could be used for udf warnings (see org.apache.hadoop.mapred.Counters), and counters for that group could be done during final warning aggregation in done in MapReduceLauncher.computeWarningAggregate().

      1. PIG-2207-5.patch
        6 kB
        Aniket Mokashi
      2. PIG-2207-1.patch
        2 kB
        Aniket Mokashi

        Issue Links

          Activity

          Hide
          Dmitriy V. Ryaboy added a comment -

          You read my mind.

          I think for starters, we should increment the warning counters in groups based on the UDF. Right now all the UDF_WARNING_1s roll together, but we could split them into MyUDF / UDF_WARNING_1, MyOtherUDF / UDF_WARNING_2

          I also think the warning message should be printed "at least once" – just the counters alone aren't always sufficient. We could get this fairly cheaply by keeping a weakref hashmap of seen warnings, and logging the message every time we see a new warning.

          Show
          Dmitriy V. Ryaboy added a comment - You read my mind. I think for starters, we should increment the warning counters in groups based on the UDF. Right now all the UDF_WARNING_1s roll together, but we could split them into MyUDF / UDF_WARNING_1, MyOtherUDF / UDF_WARNING_2 I also think the warning message should be printed "at least once" – just the counters alone aren't always sufficient. We could get this fairly cheaply by keeping a weakref hashmap of seen warnings, and logging the message every time we see a new warning.
          Hide
          Thejas M Nair added a comment -

          It will not be a good idea for udfs to use separate counters until the next-gen mapreduce is in (MR-279). A large number of counters can cause the the job to be killed, for hadoop 0.20.x (see MAPREDUCE-1943). MR-279 will enable much larger number of counters to be used.

          Show
          Thejas M Nair added a comment - It will not be a good idea for udfs to use separate counters until the next-gen mapreduce is in (MR-279). A large number of counters can cause the the job to be killed, for hadoop 0.20.x (see MAPREDUCE-1943 ). MR-279 will enable much larger number of counters to be used.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Your proposal suggests custom counters via strings, so we have this problem either way. At least making the group match the class that issues the warning constrains the space of possible counters to num_udfs x 4_generic_warnings, whereas the space of possible strings is (for our purposes) unlimited.

          Show
          Dmitriy V. Ryaboy added a comment - Your proposal suggests custom counters via strings, so we have this problem either way. At least making the group match the class that issues the warning constrains the space of possible counters to num_udfs x 4_generic_warnings, whereas the space of possible strings is (for our purposes) unlimited.
          Hide
          Thejas M Nair added a comment -

          I also think the warning message should be printed "at least once" – just the counters alone aren't always sufficient. We could get this fairly cheaply by keeping a weakref hashmap of seen warnings, and logging the message every time we see a new warning.

          This way the single line of warning can be printed in the backend (task) logs, but not in client to the user. It does not provide a means for the backend to client communication.

          Your proposal suggests custom counters via strings, so we have this problem either way. At least making the group match the class that issues the warning constrains the space of possible counters to num_udfs x 4_generic_warnings, whereas the space of possible strings is (for our purposes) unlimited.

          I think the counter name should based on warnName + first warnMsg, in EvalFunc.warn(String warnName, String warnMsg). That way a single warning message also will be displayed on client side, when warning aggregations is turned on. When warning aggregation is turned off, each warnMsg goes into log.

          Show
          Thejas M Nair added a comment - I also think the warning message should be printed "at least once" – just the counters alone aren't always sufficient. We could get this fairly cheaply by keeping a weakref hashmap of seen warnings, and logging the message every time we see a new warning. This way the single line of warning can be printed in the backend (task) logs, but not in client to the user. It does not provide a means for the backend to client communication. Your proposal suggests custom counters via strings, so we have this problem either way. At least making the group match the class that issues the warning constrains the space of possible counters to num_udfs x 4_generic_warnings, whereas the space of possible strings is (for our purposes) unlimited. I think the counter name should based on warnName + first warnMsg, in EvalFunc.warn(String warnName, String warnMsg). That way a single warning message also will be displayed on client side, when warning aggregations is turned on. When warning aggregation is turned off, each warnMsg goes into log.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Right, the client communication is done via counters, and if they want to debug, they can go look in the logs. Other than syntax errors, all of my Pig job debugging starts with looking at task logs anyway.

          I see what you meant with the counter names now – so the counter would be "PigCounters.UDF_WARNING_1-dereferencing_null_in_LOWER" for example. This accomplishes the goal of communicating the error to the client, but still produces many counters, which you just said goes against the recommendation in MAPREDUCE-1943, and also makes the counter name too long – 1943 restricts counter names to 64 chars.

          What is the value of keeping all udf counters in the same group?

          Show
          Dmitriy V. Ryaboy added a comment - Right, the client communication is done via counters, and if they want to debug, they can go look in the logs. Other than syntax errors, all of my Pig job debugging starts with looking at task logs anyway. I see what you meant with the counter names now – so the counter would be "PigCounters.UDF_WARNING_1-dereferencing_null_in_LOWER" for example. This accomplishes the goal of communicating the error to the client, but still produces many counters, which you just said goes against the recommendation in MAPREDUCE-1943 , and also makes the counter name too long – 1943 restricts counter names to 64 chars. What is the value of keeping all udf counters in the same group?
          Hide
          Thejas M Nair added a comment -

          What is the value of keeping all udf counters in the same group?

          On the client side that will help finding which counters are warning counters, so that those can be printed.

          Other than syntax errors, all of my Pig job debugging starts with looking at task logs anyway.

          I was thinking of the case where you don't know that you are supposed to start debugging . I guess, the generic udf warnings counter could might be good enough for that.

          The 64 char limit would be another reason why what I proposed will not work with MAPREDUCE-1943. But even MR-279 might also end up having such limitations in the future. (Unless pig ships with a custom MR application master!)

          Show
          Thejas M Nair added a comment - What is the value of keeping all udf counters in the same group? On the client side that will help finding which counters are warning counters, so that those can be printed. Other than syntax errors, all of my Pig job debugging starts with looking at task logs anyway. I was thinking of the case where you don't know that you are supposed to start debugging . I guess, the generic udf warnings counter could might be good enough for that. The 64 char limit would be another reason why what I proposed will not work with MAPREDUCE-1943 . But even MR-279 might also end up having such limitations in the future. (Unless pig ships with a custom MR application master!)
          Hide
          Thejas M Nair added a comment -

          I think for now, it makes sense to go with what Dmitriy proposed in his first comment .

          Show
          Thejas M Nair added a comment - I think for now, it makes sense to go with what Dmitriy proposed in his first comment .
          Show
          Aniket Mokashi added a comment - RB: https://reviews.apache.org/r/17087/
          Hide
          Aniket Mokashi added a comment -

          Committed to trunk! Thanks Daniel Dai for the review.

          Show
          Aniket Mokashi added a comment - Committed to trunk! Thanks Daniel Dai for the review.
          Hide
          Daniel Dai added a comment -

          Also this is a minor backward incompatibility change, we need to put it in "INCOMPATIBLE CHANGES" section, and properly document it.

          Show
          Daniel Dai added a comment - Also this is a minor backward incompatibility change, we need to put it in "INCOMPATIBLE CHANGES" section, and properly document it.
          Hide
          Aniket Mokashi added a comment -

          Backward's incompatible because someone consuming the counter programmatically would not find it?

          Show
          Aniket Mokashi added a comment - Backward's incompatible because someone consuming the counter programmatically would not find it?
          Hide
          Daniel Dai added a comment -

          Yes, it is exposed in MRJobStats, might affect some ppl.

          Show
          Daniel Dai added a comment - Yes, it is exposed in MRJobStats, might affect some ppl.
          Hide
          Cheolsoo Park added a comment -

          Aniket Mokashi, do you mind taking a quick look at PIG-3739? It fails after this commit, and I am not sure whether the test needs to be modified or this patch introduces a bug. Thanks!

          Show
          Cheolsoo Park added a comment - Aniket Mokashi , do you mind taking a quick look at PIG-3739 ? It fails after this commit, and I am not sure whether the test needs to be modified or this patch introduces a bug. Thanks!
          Hide
          Aniket Mokashi added a comment -

          Cheolsoo Park, its not a bug, we have changed the message as well as the counter. I will patch the e2e test.

          Show
          Aniket Mokashi added a comment - Cheolsoo Park , its not a bug, we have changed the message as well as the counter. I will patch the e2e test.
          Hide
          Cheolsoo Park added a comment -

          Very cool! Thanks a lot!

          Show
          Cheolsoo Park added a comment - Very cool! Thanks a lot!

            People

            • Assignee:
              Aniket Mokashi
              Reporter:
              Thejas M Nair
            • Votes:
              1 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development