Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
This bug is similar to PIG-4842.
Scenario:
cat input.txt 1 1 2
Pig script:
REGISTER myudfs.jar; A = LOAD 'input.txt' USING myudfs.DummyCollectableLoader() AS (id); B = GROUP A by $0 USING 'collected'; -- (1, {(1),(1)}), (2,{(2)}) C = STREAM B THROUGH ` awk '{ print $0; }'`; DUMP C;
Expected Result:
(1,{(1),(1)}) (2,{(2)})
Actual Result:
(1,{(1),(1)})
The last record is missing...
Root Cause:
When the flag endOfAllInput was set as true by the predecessor, the predecessor buffers the last record which is the input of Stream. Then POStream find endOfAllInput is true, in fact, the last input is not consumed yet.
Attachments
Attachments
Issue Links
- Is contained by
-
PIG-4876 OutputConsumeIterator can't handle the last buffered tuples for some Operators
- Closed