Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Not A Problem
-
0.11
-
None
-
None
-
subsequent stack analyzing using jstack shown that code is stuck in one of udf classes.
Description
Hi, I've met strange problem. Maybe it's related to data. BUt I'm not sure. I'm working with derivative in avro format so all "bad data" should be caught on early stages.
My pig script worked to 2 days each hour (invoked using oozie coordinator).
Now it stucks. It always have one reducer which shows progess = 67.55%
I see in TT log that it does merge, sort, then starts reduce.
I do use custom UDF in my pig script.
I've added counters trying to debug the situation.My UDF works with bags.
Counter says that UDF worked fine because "Reduce input groups" = "invocation times of UDF".
I even see counters of output:
Map-Reduce Framework Combine input records 0 Combine output records 0 Reduce input groups 31 019 Reduce shuffle bytes 65 071 957 Reduce input records 96 727 Reduce output records 0 Spilled Records 0 CPU time spent (ms) 48 870 Physical memory (bytes) snapshot 643 358 720 Virtual memory (bytes) snapshot 3 821 920 256 Total committed heap usage (bytes) 1 057 357 824 MarkEndPointsForCurrentHour callTimes 31 019 totalExecutionTime 13 690 MultiStoreCounters Output records in _0_08 26 862 Output records in _1_09 7 383
Counters say that all data passed through my UDF and even some output has been written.
But reducer (always only 1 of 54 total reducers) stucks for 1 hour an then killed by JT because of timeout. All other 53 reducers finished in 7 minutes.
How can I debug MultiStore?