Pig / PIG-4930

Skewed Join Breaks On Empty Sampled Input When Key is From Map

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.2, 0.16.0
    • Fix Version/s: 0.17.0, 0.16.1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When using a skewed join, if the left relation gets its key from a map and said relation is empty, then the skewed join fails during the sampling phase with:

      org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{tuple}(false) - scope-27 Operator Key: scope-27): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POMapLookUp (Name: POMapLookUp[bytearray] - scope-14 Operator Key: scope-14) children: null at [null[3,17]]]: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
      at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
      at org.apache.hadoop.mapred.Child.main(Child.java:249)

      I think the problem lies in Pig's skewed join implementation in general rather than in map handling specifically, but maps make it easy to demonstrate. I have written an additional test in TestSkewedJoin that reproduces the problem. The join works correctly if we remove "using 'skewed'".
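
The ClassCastException at the bottom of the trace can be illustrated in isolation. The following is only a minimal sketch of that failure class, not Pig's actual POMapLookUp code and not the attached patches: the class `MapLookupSketch` and both methods are hypothetical names. It assumes a lookup step that casts its input to `java.util.Map` without checking the runtime type, and contrasts it with a variant that checks first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Pig.
public class MapLookupSketch {

    // Assumes the input is always a Map. If a String arrives instead,
    // the cast below fails with java.lang.ClassCastException.
    public static Object lookupUnchecked(Object input, String key) {
        @SuppressWarnings("unchecked")
        Map<String, Object> map = (Map<String, Object>) input;
        return map.get(key);
    }

    // Checks the runtime type first and returns null for anything
    // that is not a Map, so no exception is thrown.
    public static Object lookupChecked(Object input, String key) {
        if (!(input instanceof Map)) {
            return null;
        }
        return ((Map<?, ?>) input).get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", "k1");

        // A real map works with either variant.
        System.out.println(lookupUnchecked(row, "id"));
        System.out.println(lookupChecked(row, "id"));

        // A String where a Map is expected: the unchecked variant throws.
        try {
            lookupUnchecked("not a map", "id");
        } catch (ClassCastException e) {
            System.out.println("caught " + e.getClass().getName());
        }

        // The checked variant returns null instead.
        System.out.println(lookupChecked("not a map", "id"));
    }
}
```

Whether a type check like this is the right remedy inside the skewed join's sampling job is a separate question; the sketch only shows why a String reaching a map lookup produces the exception reported above.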

      Attachments

        1. empty_skew.diff (3 kB, William Butler)
        2. PIG-4930.patch (3 kB, Nándor Kollár)
        3. PIG-4930-2.patch (3 kB, Rohini Palaniswamy)

          People

            Assignee: nkollar Nándor Kollár
            Reporter: butlerw William Butler
            Votes: 0
            Watchers: 4
