Pig / PIG-5019

Pig generates tons of warnings for udf with enabled warnings aggregation


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.17.0, 0.16.1
    • Component/s: internal-udfs
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      For a data set containing 9 lines, the following aggregated warning message is displayed:

      2016-09-01 19:40:33,664 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning UDF_WARNING_1 6 time(s).
      

      but in the container logs I see a separate log message, "Cannot
      extract group for input", for every non-matching value:

      2016-09-01 19:40:28,115 INFO [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: Aliases being processed per job phase (AliasName[line,offset]): M: b[10,4],b[-1,-1],extract_fields[17,17] C:  R: 
      2016-09-01 19:40:28,122 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v1=1&v3=9
      2016-09-01 19:40:28,124 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v2=3&v3=7
      2016-09-01 19:40:28,124 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v1=4&v3=6
      2016-09-01 19:40:28,125 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v2=5&v3=5
      2016-09-01 19:40:28,125 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v1=8&v3=2
      2016-09-01 19:40:28,125 WARN [main] org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger: org.apache.pig.builtin.REGEX_EXTRACT(UDF_WARNING_1): RegexExtract : Cannot extract group for input /v3=9&v2=1
      

      It does not log the warning messages in the task logs.

      The patch for PIG-2207 was committed in Pig 0.13 and is present in all later versions.

      In 0.12 we had a single counter for all UDF warnings, but in 0.13+ we have
      a separate counter and message for every unique warning log line.
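      To make the difference concrete, here is a minimal Java sketch (Java 9+) of the two counter-keying schemes described above. The class and method names are hypothetical and this is not Pig's PigHadoopLogger code; it only shows that keying by message text turns the six non-matching inputs from the task log into six separate counters, whereas a per-type counter collapses them into one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two counter-keying schemes; not Pig's actual code.
public class WarningCounterSketch {

    // 0.12-style (as described in the report): one counter per warning type,
    // regardless of the message text.
    public static Map<String, Long> countPerType(String warningType, List<String> messages) {
        Map<String, Long> counters = new LinkedHashMap<>();
        if (!messages.isEmpty()) {
            counters.put(warningType, (long) messages.size());
        }
        return counters;
    }

    // 0.13+-style (as described in the report): the message text is part of the key,
    // so every distinct input value produces its own counter.
    public static Map<String, Long> countPerMessage(String warningType, List<String> messages) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String msg : messages) {
            counters.merge(warningType + ": " + msg, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        // The six non-matching inputs from the task log in this issue.
        List<String> messages = List.of(
                "Cannot extract group for input /v1=1&v3=9",
                "Cannot extract group for input /v2=3&v3=7",
                "Cannot extract group for input /v1=4&v3=6",
                "Cannot extract group for input /v2=5&v3=5",
                "Cannot extract group for input /v1=8&v3=2",
                "Cannot extract group for input /v3=9&v2=1");

        System.out.println("per type:    " + countPerType("UDF_WARNING_1", messages));
        System.out.println("per message: "
                + countPerMessage("UDF_WARNING_1", messages).size() + " distinct counters");
    }
}
```

      This prints `per type:    {UDF_WARNING_1=6}` followed by `per message: 6 distinct counters`. With per-message keys, the number of counters grows with the number of distinct bad values in the data rather than staying fixed per warning type.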

      The two lines below are unique:
      /v2=3&v3=7
      /v1=4&v3=6

      That's why Pig prints both of them to the console.

      Printing a separate log message for every data line slows down the overall performance as well.
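      A minimal sketch of the behavior this report asks for, assuming that with aggregation enabled the per-record log line should be suppressed entirely and only a counter keyed by warning type should be incremented. AggregatingWarnLogger, its warn(...) signature and the in-memory taskLog list are hypothetical stand-ins for Pig's logger and the task log; this is not the committed PIG-5019 patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Pig's PigHadoopLogger, just the expected behavior.
public class AggregatingWarnLogger {
    private final boolean aggregate;
    // Counters keyed only by warning type, so their number stays bounded.
    private final Map<String, Long> counters = new LinkedHashMap<>();
    // Stand-in for the task log; real code would write to a logging framework.
    private final List<String> taskLog = new ArrayList<>();

    public AggregatingWarnLogger(boolean aggregate) {
        this.aggregate = aggregate;
    }

    public void warn(String source, String msg, String warningType) {
        if (aggregate) {
            // Aggregation on: count the warning, do not emit a per-record log line.
            counters.merge(warningType, 1L, Long::sum);
        } else {
            // Aggregation off: log every occurrence.
            taskLog.add(source + "(" + warningType + "): " + msg);
        }
    }

    public Map<String, Long> counters() {
        return counters;
    }

    public List<String> taskLog() {
        return taskLog;
    }

    public static void main(String[] args) {
        AggregatingWarnLogger logger = new AggregatingWarnLogger(true);
        for (String input : List.of("/v1=1&v3=9", "/v2=3&v3=7", "/v1=4&v3=6",
                                    "/v2=5&v3=5", "/v1=8&v3=2", "/v3=9&v2=1")) {
            logger.warn("org.apache.pig.builtin.REGEX_EXTRACT",
                    "RegexExtract : Cannot extract group for input " + input, "UDF_WARNING_1");
        }
        // Summary in the same shape as the launcher's aggregated message.
        for (Map.Entry<String, Long> e : logger.counters().entrySet()) {
            System.out.println("Encountered Warning " + e.getKey() + " " + e.getValue() + " time(s).");
        }
        System.out.println("task log lines: " + logger.taskLog().size());
    }
}
```

      Run on the six inputs from the task log above, it prints `Encountered Warning UDF_WARNING_1 6 time(s).` and `task log lines: 0`, i.e. only the aggregated summary survives and no per-record string formatting or logging happens in the map task.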

      Attachments

        1. PIG-5019_3.patch
          1 kB
          Rohini Palaniswamy
        2. PIG-5019_2.patch
          1.0 kB
          Murshid Chalaev
        3. PIG-5019.patch
          1 kB
          Murshid Chalaev
        4. test_pig14_udf .pig
          0.3 kB
          Murshid Chalaev
        5. input_example.gz
          0.1 kB
          Murshid Chalaev


            People

              Assignee: Murshid Chalaev
              Reporter: Murshid Chalaev
              Votes: 1
              Watchers: 3
