Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3416

Looks like MultiStore stucks



    • Bug
    • Status: Resolved
    • Major
    • Resolution: Not A Problem
    • 0.11
    • None
    • data
    • None
    • subsequent stack analyzing using jstack shown that code is stuck in one of udf classes.


      Hi, I've met strange problem. Maybe it's related to data. BUt I'm not sure. I'm working with derivative in avro format so all "bad data" should be caught on early stages.
      My pig script worked to 2 days each hour (invoked using oozie coordinator).
      Now it stucks. It always have one reducer which shows progess = 67.55%
      I see in TT log that it does merge, sort, then starts reduce.

      I do use custom UDF in my pig script.
      I've added counters trying to debug the situation.My UDF works with bags.
      Counter says that UDF worked fine because "Reduce input groups" = "invocation times of UDF".

      I even see counters of output:

      Map-Reduce Framework
      Combine input records	0
      Combine output records	0
      Reduce input groups	31 019
      Reduce shuffle bytes	65 071 957
      Reduce input records	96 727
      Reduce output records	0
      Spilled Records	0
      CPU time spent (ms)	48 870
      Physical memory (bytes) snapshot	643 358 720
      Virtual memory (bytes) snapshot	3 821 920 256
      Total committed heap usage (bytes)	1 057 357 824
      callTimes	31 019
      totalExecutionTime	13 690
      Output records in _0_08	26 862
      Output records in _1_09	7 383

      Counters say that all data passed through my UDF and even some output has been written.
      But reducer (always only 1 of 54 total reducers) stucks for 1 hour an then killed by JT because of timeout. All other 53 reducers finished in 7 minutes.

      How can I debug MultiStore?




            Unassigned Unassigned
            serega_sheypak Sergey
            0 Vote for this issue
            1 Start watching this issue