  1. Pig
  2. PIG-5277

Spark mode is writing nulls among tuples to the output

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spark
    • Labels:
      None

      Description

      After PIG-3655 was committed, a couple of Spark mode tests (e.g. org.apache.pig.test.TestEvalPipeline.testCogroupAfterDistinct) started failing with:

      java.lang.Error: java.io.IOException: Corrupt data file, expected tuple type byte, but seen 27
      	at org.apache.pig.backend.hadoop.executionengine.HJob$1.hasNext(HJob.java:122)
      	at org.apache.pig.test.TestEvalPipeline.testCogroupAfterDistinct(TestEvalPipeline.java:1052)
      Caused by: java.io.IOException: Corrupt data file, expected tuple type byte, but seen 27
      	at org.apache.pig.impl.io.InterRecordReader.readDataOrEOF(InterRecordReader.java:158)
      	at org.apache.pig.impl.io.InterRecordReader.nextKeyValue(InterRecordReader.java:194)
      	at org.apache.pig.impl.io.InterStorage.getNext(InterStorage.java:79)
      	at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:238)
      	at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:218)
      	at org.apache.pig.backend.hadoop.executionengine.HJob$1.hasNext(HJob.java:115)
      

      This is because InterRecordReader became much stricter after PIG-3655. Previously it simply skipped such bytes, treating them as garbage at the beginning of the split. Now it expects a proper tuple starting with a tuple type byte, encounters these nulls instead, and throws an exception.
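
      The change in reader behaviour described above can be illustrated with a minimal Java sketch. It is not the InterRecordReader source: the class name, the method names and the `TUPLE_MARKER` value are all hypothetical and chosen only for this example. The stray byte 27 is taken from the error message in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; this is NOT the InterRecordReader source.
class StrictReaderSketch {

    // Assumed marker value, chosen for this sketch only.
    static final int TUPLE_MARKER = 100;

    // Lenient behaviour: discard bytes until a marker is found.
    // Returns the number of bytes skipped, or -1 if EOF came first.
    static int lenientSeek(InputStream in) throws IOException {
        int skipped = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == TUPLE_MARKER) {
                return skipped;
            }
            skipped++;
        }
        return -1;
    }

    // Strict behaviour: the very next byte must be the marker.
    // Returns false on a clean EOF, true when the marker was read.
    static boolean strictExpect(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return false;
        }
        if (b != TUPLE_MARKER) {
            throw new IOException(
                "Corrupt data file, expected tuple type byte, but seen " + b);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // One stray byte (27) followed by a marker.
        byte[] data = {27, (byte) TUPLE_MARKER};
        System.out.println("lenient skipped: "
            + lenientSeek(new ByteArrayInputStream(data)));
        try {
            strictExpect(new ByteArrayInputStream(data));
        } catch (IOException e) {
            System.out.println("strict failed: " + e.getMessage());
        }
    }
}
```

      With the lenient variant a stray null byte in front of a tuple goes unnoticed, which would explain why the nulls written by Spark mode only surfaced once the reader started validating the first byte.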

      As far as I can see, this happens because JoinGroupSparkConverter has to return something even when it shouldn't.
      When the POPackage operator returns POStatus.STATUS_NULL, the converter should emit nothing, but it can do no better than return a null, which Spark then writes to the output.
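
      The shape of the problem can be sketched in plain Java, without Pig or Spark on the classpath. Everything here is hypothetical and for illustration only: `Status`, `Result`, `oneToOne` and `skipNulls` are made-up names, not Pig or Spark APIs, and this is not a fix for JoinGroupSparkConverter. The point is that a one-to-one mapping must produce an element for every input, so a "nothing to emit" status can only become a null, whereas a mapping that may emit zero elements per input can drop it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins; these are NOT Pig or Spark classes.
class NullOutputSketch {

    // Loosely mirrors the STATUS_OK / STATUS_NULL distinction.
    enum Status { OK, NULL }

    // A packaged result: a status plus an optional payload.
    static final class Result {
        final Status status;
        final String tuple; // a String stands in for a real tuple

        Result(Status status, String tuple) {
            this.status = status;
            this.tuple = tuple;
        }
    }

    // One-to-one mapping: every input yields exactly one output,
    // so a NULL status has to be represented by a null element.
    static List<String> oneToOne(List<Result> results) {
        List<String> out = new ArrayList<>();
        for (Result r : results) {
            out.add(r.status == Status.OK ? r.tuple : null);
        }
        return out;
    }

    // Zero-or-one mapping: inputs with a NULL status emit nothing.
    static List<String> skipNulls(List<Result> results) {
        return results.stream()
                .filter(r -> r.status == Status.OK)
                .map(r -> r.tuple)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(
                new Result(Status.OK, "(a)"),
                new Result(Status.NULL, null),
                new Result(Status.OK, "(b)"));
        System.out.println(oneToOne(results));  // [(a), null, (b)]
        System.out.println(skipNulls(results)); // [(a), (b)]
    }
}
```

      Whether the converter can be restructured along these lines, or whether the nulls are better filtered in a later step, is an open question that this sketch does not settle.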

              People

              • Assignee:
                szita Ádám Szita
                Reporter:
                szita Ádám Szita
              • Votes:
                0
                Watchers:
                4
