Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.10.0
Fix Version/s: None
Component/s: None
Labels: None
Environment: CDH4.2, using MR1
Description
My deserializer expects to receive one of two different subclasses of Writable, but in certain circumstances it receives an empty instance of org.apache.hadoop.io.Text. This only happens for task attempts where the file called "emptyFile" appears in the list of input splits.
I'm querying an external table partitioned by year/month/day, and its partitions have been created eagerly. So as of today, for example, a query with year = 2013 and month = 3 includes empty partitions.
While investigating, I downloaded the sequence files and confirmed they were fine. Once I realized that processing empty partitions was to blame, I was able to work around the issue by restricting my queries to populated partitions.
Can the need for emptyFile be eliminated when other splits are already being processed? Failing that, can the mapper detect that the current input comes from emptyFile and skip calling the deserializer?
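Until emptyFile is handled upstream, one possible stopgap is for the deserializer itself to treat an empty Text as "no record" rather than failing on an unexpected type. The following is a minimal sketch of that guard, not Hive's API: `WritableLike`, `TextLike`, `RecordA` and `RecordB` are hypothetical stand-ins for Writable, org.apache.hadoop.io.Text and the two expected record classes, so the example compiles without Hadoop on the classpath. In a real SerDe the equivalent check would sit in `deserialize(Writable)`, and whether Hive 0.10.0 then skips the row or emits a row of NULLs is not confirmed here.

```java
import java.util.Optional;

/**
 * Sketch of a defensive deserialize step. All nested types are hypothetical
 * stand-ins so the example is self-contained (no Hadoop/Hive dependency).
 */
public class EmptyInputGuard {
    /** Stand-in for org.apache.hadoop.io.Writable. */
    public interface WritableLike {}

    /** Stand-in for org.apache.hadoop.io.Text; only the length matters here. */
    public static final class TextLike implements WritableLike {
        final byte[] bytes;
        public TextLike(byte[] bytes) { this.bytes = bytes; }
        int getLength() { return bytes.length; }
    }

    /** Hypothetical first expected record type. */
    public static final class RecordA implements WritableLike {
        final String value;
        public RecordA(String value) { this.value = value; }
    }

    /** Hypothetical second expected record type. */
    public static final class RecordB implements WritableLike {
        final long value;
        public RecordB(long value) { this.value = value; }
    }

    /**
     * Returns the decoded row, or Optional.empty() when the input is the empty
     * Text placeholder (the case observed for the emptyFile split).
     */
    public static Optional<String> deserialize(WritableLike blob) {
        // Guard first: an empty Text carries no record, so produce no row.
        if (blob instanceof TextLike && ((TextLike) blob).getLength() == 0) {
            return Optional.empty();
        }
        if (blob instanceof RecordA) {
            return Optional.of("A:" + ((RecordA) blob).value);
        }
        if (blob instanceof RecordB) {
            return Optional.of("B:" + ((RecordB) blob).value);
        }
        // Anything else is still a genuine error and should fail loudly.
        throw new IllegalArgumentException(
            "Unexpected input type: " + blob.getClass().getName());
    }

    public static void main(String[] args) {
        WritableLike[] inputs = {
            new RecordA("x"), new TextLike(new byte[0]), new RecordB(7L)
        };
        for (WritableLike in : inputs) {
            System.out.println(deserialize(in).orElse("<skipped>"));
        }
    }
}
```

Running `main` prints `A:x`, `<skipped>` and `B:7`. Keeping the guard narrow (only an *empty* Text) means a non-empty Text or any other unexpected type still raises an error instead of being silently dropped.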