Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
In MapRecordProcessor::getFinalOp(), for a reason that is not yet known, the TezDummyStoreOperator intermittently has a MergeJoin operator as its child. When this happens, fetchDone for the dummy op stays true, as set by the previous task, instead of being reset. Ideally, fetchDone should be reset for each task. As a result, the join operator skips rows from that dummy op, which produces wrong results.
Good init order
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns DUMMY_STORE[45]
2018-11-01 21:42:33,677 [INFO] [TezChild] |tez.MapRecordProcessor|: InitProcessor : setting fetchDone to false
Bad init order
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = TS[3] (core)
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = FIL[24]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[5]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Iterating children of dummy op DUMMY_STORE[45]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: Child of Dummy Op MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = MERGEJOIN[44]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = SEL[13]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp child Ops = RS[14]
2018-11-01 21:42:33,304 [INFO] [TezChild] |tez.MapRecordProcessor|: getFinalOp returns RS[14]
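To make the difference between the two traces concrete, here is a minimal, self-contained sketch of the failure mode. It is a toy model only: `Op`, `DummyStoreOp`, `chain`, `resetIfFinalOpIsDummy`, and `resetAllDummyOps` are hypothetical names invented for illustration and are not Hive's classes or methods. It assumes that the reset of fetchDone depends on the traversal ending at the dummy op, which is what the logs suggest. The full-tree reset at the end is just one possible defensive approach, not necessarily the fix that was committed.

```java
import java.util.ArrayList;
import java.util.List;

public class FetchDoneSketch {
    // Hypothetical stand-in for a plan operator; not Hive's Operator class.
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    // Hypothetical stand-in for the dummy store operator and its fetchDone flag.
    static class DummyStoreOp extends Op {
        boolean fetchDone = false;
        DummyStoreOp(String name) { super(name); }
    }

    // Links the given operators parent -> child in order and returns the first one.
    static Op chain(Op... ops) {
        for (int i = 0; i + 1 < ops.length; i++) {
            ops[i].children.add(ops[i + 1]);
        }
        return ops[0];
    }

    // Follows the first child until a leaf is reached, as in the traces above.
    static Op getFinalOp(Op root) {
        Op current = root;
        while (!current.children.isEmpty()) {
            current = current.children.get(0);
        }
        return current;
    }

    // Fragile reset: only takes effect when the traversal ends at the dummy op.
    static void resetIfFinalOpIsDummy(Op root) {
        Op finalOp = getFinalOp(root);
        if (finalOp instanceof DummyStoreOp) {
            ((DummyStoreOp) finalOp).fetchDone = false;
        }
    }

    // Defensive reset: visits every operator and resets each dummy op found.
    static void resetAllDummyOps(Op op) {
        if (op instanceof DummyStoreOp) {
            ((DummyStoreOp) op).fetchDone = false;
        }
        for (Op child : op.children) {
            resetAllDummyOps(child);
        }
    }

    public static void main(String[] args) {
        DummyStoreOp dummy = new DummyStoreOp("DUMMY_STORE[45]");
        dummy.fetchDone = true; // stale value left over from the previous task

        // "Bad init order": the dummy op still has MERGEJOIN[44] as a child.
        Op root = chain(new Op("TS[3]"), new Op("FIL[24]"), new Op("SEL[5]"), dummy,
                new Op("MERGEJOIN[44]"), new Op("SEL[13]"), new Op("RS[14]"));

        resetIfFinalOpIsDummy(root);
        System.out.println(getFinalOp(root).name + " fetchDone=" + dummy.fetchDone);

        resetAllDummyOps(root);
        System.out.println("after full reset fetchDone=" + dummy.fetchDone);
    }
}
```

In this model the fragile reset leaves the stale `true` in place because the traversal ends at `RS[14]`, while the full-tree reset clears the flag regardless of how the operators happen to be linked.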