[PIG-4410] Fix testRankWithEmptyReduce in tez mode - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.15.0
Component/s: tez
Labels: None

Hadoop Flags: Reviewed

Description

testRankWithEmptyReduce, added in PIG-4392, fails in Tez mode. The reason is that POReservoirSample produces more samples than necessary. In particular, if the input of the vertex is empty, it produces a fake tuple that does not contain the original data, only a marker field plus a rowNum of 0. That causes WeightedRangePartitioner to fail:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
	at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:115)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPigNullableWritable(WeightedRangePartitioner.java:192)

Another issue I found is in GetMemNumRows: I erroneously added the size of the marker tuple, which makes the size estimation inaccurate. I put the fix in the same patch.
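
The ClassCastException described above can be reproduced in isolation with a small sketch. This is not Pig's code: the class EmptySampleSketch, the MARKER value, and the methods emitSamples, naiveKeys, and guardedKeys are all hypothetical names chosen for illustration, and the guard shown is only one possible way to avoid the failure, not necessarily what the attached patch does. It assumes an integer sort key and a sampler that always emits a trailing [marker, rowNum] record, so an empty input yields a marker-only record.

```java
import java.util.ArrayList;
import java.util.List;

class EmptySampleSketch {
    // Hypothetical marker value; Pig's real constant is not shown in this issue.
    static final String MARKER = "__rownum_marker__";

    // Emits one record per sampled key, then a trailing [marker, rowNum] record.
    // For an empty input, the marker record is the only output.
    static List<Object[]> emitSamples(List<Integer> keys) {
        List<Object[]> out = new ArrayList<>();
        for (Integer key : keys) {
            out.add(new Object[] { key });
        }
        out.add(new Object[] { MARKER, Long.valueOf(keys.size()) });
        return out;
    }

    // Treats field 0 of every record as an Integer key, the way a consumer
    // that expects only real samples would.
    static List<Integer> naiveKeys(List<Object[]> samples) {
        List<Integer> keys = new ArrayList<>();
        for (Object[] rec : samples) {
            keys.add((Integer) rec[0]); // throws ClassCastException on the marker record
        }
        return keys;
    }

    // Skips any record whose first field is the marker before casting.
    static List<Integer> guardedKeys(List<Object[]> samples) {
        List<Integer> keys = new ArrayList<>();
        for (Object[] rec : samples) {
            if (MARKER.equals(rec[0])) {
                continue;
            }
            keys.add((Integer) rec[0]);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Empty vertex input: the sample output holds only the marker record.
        List<Object[]> samples = emitSamples(new ArrayList<>());
        try {
            naiveKeys(samples);
        } catch (ClassCastException e) {
            System.out.println("naive consumer failed with ClassCastException");
        }
        System.out.println("guarded keys: " + guardedKeys(samples));
    }
}
```

The point of the sketch is that the marker record carries a String where the consumer expects the sort key type, so any consumer that casts the first field without checking for the marker fails exactly when the input is empty.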

Attachments

PIG-4410-1.patch, 04/Feb/15 21:42, 1 kB, Daniel Dai

Issue Links

is related to PIG-4392: RANK BY fails when default_parallel is greater than cardinality of field being ranked by (Closed)

People

Assignee: Daniel Dai

Reporter: Daniel Dai

Votes: 0

Watchers: 3

Dates

Created: 04/Feb/15 21:40

Updated: 07/Jun/15 03:48

Resolved: 04/Feb/15 22:15