Description
testRankWithEmptyReduce added in PIG-4392 failed in tez mode. The reason is POReservoirSample produce more sample than necessary. In particular, if the input of the vertex is empty, it produces a fake tuple which does not have the original data, but a marked field plus 0 rowNum. That cause the WeightedRangePartitioner fail:
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:115) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPigNullableWritable(WeightedRangePartitioner.java:192)
Another issue I found is GetMemNumRows, I erroneously add the size of mark tuple, which make the size estimation inaccurate. I put the fix in the same patch.
Attachments
Attachments
Issue Links
- is related to
-
PIG-4392 RANK BY fails when default_parallel is greater than cardinality of field being ranked by
- Closed