Pig / PIG-5277

Spark mode is writing nulls among tuples to the output

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spark
    • Labels: None

    Description

      After PIG-3655 was committed, a couple of Spark mode tests (e.g. org.apache.pig.test.TestEvalPipeline.testCogroupAfterDistinct) started failing with:

      java.lang.Error: java.io.IOException: Corrupt data file, expected tuple type byte, but seen 27
      	at org.apache.pig.backend.hadoop.executionengine.HJob$1.hasNext(HJob.java:122)
      	at org.apache.pig.test.TestEvalPipeline.testCogroupAfterDistinct(TestEvalPipeline.java:1052)
      Caused by: java.io.IOException: Corrupt data file, expected tuple type byte, but seen 27
      	at org.apache.pig.impl.io.InterRecordReader.readDataOrEOF(InterRecordReader.java:158)
      	at org.apache.pig.impl.io.InterRecordReader.nextKeyValue(InterRecordReader.java:194)
      	at org.apache.pig.impl.io.InterStorage.getNext(InterStorage.java:79)
      	at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:238)
      	at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:218)
      	at org.apache.pig.backend.hadoop.executionengine.HJob$1.hasNext(HJob.java:115)
      

      This is because InterRecordReader became much stricter after PIG-3655. Previously it simply skipped these bytes, assuming they were garbage at the beginning of the split. Now it expects a proper tuple starting with a tuple type byte, so when it sees these nulls it throws an exception.
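
The difference between the lenient and the strict reading behaviour can be sketched as follows. This is only an illustration and not the actual InterRecordReader code: the class `MarkerReaderSketch`, both method names, and the `TUPLE_MARKER` value are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class MarkerReaderSketch {
    // Hypothetical marker value chosen for this sketch; not Pig's real constant.
    static final int TUPLE_MARKER = 0x01;

    /**
     * Lenient behaviour: skip bytes until the marker is found.
     * Returns the number of skipped bytes, or -1 if EOF is reached first.
     */
    static int skipToMarker(InputStream in) throws IOException {
        int skipped = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == TUPLE_MARKER) {
                return skipped;
            }
            skipped++;
        }
        return -1;
    }

    /**
     * Strict behaviour: the very next byte must be the marker.
     * Returns false on a clean EOF and throws on any other byte.
     */
    static boolean expectMarker(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return false;
        }
        if (b != TUPLE_MARKER) {
            throw new IOException(
                "Corrupt data file, expected tuple type byte, but seen " + b);
        }
        return true;
    }
}
```

With a stream that starts with a stray byte before the marker, `skipToMarker` silently moves past it, while `expectMarker` fails immediately, which matches the symptom in the stack trace above.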

      As far as I can see, this happens because JoinGroupSparkConverter has to return something even when it shouldn't.
      When the POPackage operator returns POStatus.STATUS_NULL, the converter shouldn't return anything, but it can't do better than returning a null, which Spark then writes to the output.
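
One possible direction, shown here only as a sketch and not as the fix adopted in Pig, is to wrap the converter's output in a look-ahead iterator that swallows nulls, so downstream consumers never see them. The class `NullSkippingIterator` and its `drain` helper are hypothetical names introduced for this example; it uses plain `java.util` types and no Pig or Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical wrapper that hides null elements produced by an upstream iterator. */
final class NullSkippingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T lookahead; // next non-null element, or null when nothing is buffered

    NullSkippingIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Advance past nulls until a real element is buffered or the source is exhausted.
        while (lookahead == null && source.hasNext()) {
            lookahead = source.next();
        }
        return lookahead != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = lookahead;
        lookahead = null;
        return result;
    }

    /** Helper for the usage example: drains an iterator into a list. */
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

For example, `NullSkippingIterator.drain(new NullSkippingIterator<>(Arrays.asList("a", null, "b").iterator()))` yields a list containing only `"a"` and `"b"`. The trade-off of this design is that `hasNext()` has to read ahead, so it may pull several elements from the source before answering.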

            People

              szita Ádám Szita