[PIG-5277] Spark mode is writing nulls among tuples to the output - ASF JIRA

Details

Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: spark
Labels: None

Description

After PIG-3655 was committed, a couple of Spark mode tests (e.g. org.apache.pig.test.TestEvalPipeline.testCogroupAfterDistinct) started failing with:

java.lang.Error: java.io.IOException: Corrupt data file, expected tuple type byte, but seen 27
	at org.apache.pig.backend.hadoop.executionengine.HJob$1.hasNext(HJob.java:122)
	at org.apache.pig.test.TestEvalPipeline.testCogroupAfterDistinct(TestEvalPipeline.java:1052)
Caused by: java.io.IOException: Corrupt data file, expected tuple type byte, but seen 27
	at org.apache.pig.impl.io.InterRecordReader.readDataOrEOF(InterRecordReader.java:158)
	at org.apache.pig.impl.io.InterRecordReader.nextKeyValue(InterRecordReader.java:194)
	at org.apache.pig.impl.io.InterStorage.getNext(InterStorage.java:79)
	at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:238)
	at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:218)
	at org.apache.pig.backend.hadoop.executionengine.HJob$1.hasNext(HJob.java:115)

This happens because InterRecordReader became much stricter after PIG-3655. Previously it simply skipped such bytes, treating them as garbage at the beginning of the split. Now it expects a proper tuple starting with a tuple type byte, sees these nulls instead, and throws an exception.

As far as I can see, this happens because JoinGroupSparkConverter has to return something even when it shouldn't.
When the POPackage operator returns POStatus.STATUS_NULL, the converter shouldn't return anything, but it can't do better than returning a null. Spark then writes this null to the output.
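
One way to picture a remedy for the behaviour described above is a look-ahead iterator that drops null-status results instead of emitting a null for them. The following is only a minimal sketch in plain Java, not the actual Pig or Spark code: `Status`, `Result`, `SkippingIterator` and `drain` are hypothetical stand-ins invented for illustration, and tuples are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class NullSkippingSketch {

    // Hypothetical stand-in for an operator status; this is NOT Pig's POStatus.
    enum Status { OK, NULL }

    // Hypothetical stand-in for an operator result; the tuple is just a String here.
    static final class Result {
        final Status status;
        final String tuple;

        Result(Status status, String tuple) {
            this.status = status;
            this.tuple = tuple;
        }
    }

    // Looks ahead in the source and skips NULL-status results,
    // so the consumer never receives a null element.
    static final class SkippingIterator implements Iterator<String> {
        private final Iterator<Result> source;
        private String buffered;
        private boolean ready;

        SkippingIterator(Iterator<Result> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            while (!ready && source.hasNext()) {
                Result r = source.next();
                if (r.status == Status.OK) {
                    buffered = r.tuple;
                    ready = true;
                }
                // Status.NULL: nothing to emit, keep looking.
            }
            return ready;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            ready = false;
            String out = buffered;
            buffered = null;
            return out;
        }
    }

    // Collects everything the skipping iterator yields for the given results.
    static List<String> drain(List<Result> results) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new SkippingIterator(results.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(
                new Result(Status.OK, "(a,1)"),
                new Result(Status.NULL, null),
                new Result(Status.OK, "(b,2)"));
        System.out.println(drain(results)); // prints [(a,1), (b,2)]
    }
}
```

Whether the real converter can be restructured this way depends on how it is wired into Spark, which this sketch does not model.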

Issue Links

relates to: PIG-3655 BinStorage and InterStorage approach to record markers is broken (Resolved)

People

Assignee: Ádám Szita

Reporter: Ádám Szita

Votes: 0

Watchers: 3

Dates

Created: 31/Jul/17 11:28

Updated: 13/Aug/17 16:20