PIG-3266

Pig takes forever to parse scripts with foreach + multi level binconds

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.10.0, 0.11
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Parsing the following Pig script takes:

      • 1 second in pig-0.8
      • 90 seconds in pig-0.9
      • forever in pig-0.10 (it literally takes hours)

      A = load 'input.txt' as (mynum:float, mychar:chararray);
      B = foreach A generate mychar,
      (mynum < 0 ? 0 :
      (mynum < 1 ? 1 :
      (mynum < 2 ? 2 :
      (mynum < 3 ? 3 :
      (mynum < 4 ? 4 :
      (mynum < 5 ? 5 :
      (mynum < 6 ? 6 :
      (mynum < 7 ? 7 :
      (mynum < 8 ? 8 :
      (mynum < 9 ? 9 :
      (mynum < 10 ? 10 :
      (mynum < 11 ? 11 :
      (mynum < 12 ? 12 :
      (mynum < 13 ? 13 :
      (mynum < 14 ? 14 :
      (mynum < 15 ? 15 :
      (mynum < 16 ? 16 :
      (mynum < 17 ? 17 :
      (mynum < 18 ? 18 :
      (mynum < 19 ? 19 :
      (mynum < 20 ? 20 : 21)))))))))))))))))))));
      dump A;
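
To check how parse time scales with nesting depth, the script above can be generated for any depth. The following is a minimal sketch; the helper name `make_script` is hypothetical, and the script text simply mirrors the one in the description (a depth of 21 corresponds to the reported case).

```python
def make_script(depth):
    """Build the test script with `depth` nested binconds (depth >= 1)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    lines = [
        "A = load 'input.txt' as (mynum:float, mychar:chararray);",
        "B = foreach A generate mychar,",
    ]
    # One "(mynum < i ? i :" line per nesting level.
    conds = [f"(mynum < {i} ? {i} :" for i in range(depth)]
    # The innermost else-branch, then one closing parenthesis per level.
    conds[-1] += f" {depth}" + ")" * depth + ";"
    lines += conds
    lines.append("dump A;")
    return "\n".join(lines) + "\n"


# Write a few variants so each can be timed separately.
for d in (5, 10, 21):
    with open(f"bincond_{d}.pig", "w") as f:
        f.write(make_script(d))
```

Timing the generated files against each Pig version should show whether parse time grows roughly linearly or much faster as the depth increases.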
      

        Issue Links

          Activity

          Koji Noguchi added a comment -

          If I revert the change from PIG-1387, parsing time goes back to 90 seconds (the pig-0.9 level).

          src/org/apache/pig/parser/QueryParser.g
          -projectable_expr: func_eval | col_ref | bin_expr | type_conversion
          +projectable_expr: func_eval | col_ref | bin_expr
          

          I don't know anything about ANTLR, but I guess it cannot tell whether the given tokens are a bin_expr or a type_conversion when they start with '(', so it spends extra cycles checking both.
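
The guess above can be illustrated with a toy model. This is not Pig's grammar or ANTLR's actual prediction algorithm, and the thread does not confirm the root cause; the grammar, token names, and the function `count_rule_calls` are assumptions made for the sketch. A naive backtracking parser has two alternatives that both start with '('. The first one only fails after the whole parenthesized group has been parsed, so every nesting level parses its contents twice.

```python
def count_rule_calls(depth, with_cast_alternative):
    """Count expr() invocations on `depth` nested parentheses around one atom.

    Toy grammar (an assumption, not Pig's):
        expr  := cast | group | 'x'
        cast  := '(' expr ')' 'x'   # stand-in for a type_conversion-like rule
        group := '(' expr ')'       # stand-in for a bin_expr-like rule
    """
    tokens = ["("] * depth + ["x"] + [")"] * depth + ["<eof>"]
    calls = 0

    def group(pos):
        # Parse '(' expr ')' starting at pos; return the next position or None.
        inner_end = expr(pos + 1)
        if inner_end is not None and tokens[inner_end] == ")":
            return inner_end + 1
        return None

    def expr(pos):
        nonlocal calls
        calls += 1
        if tokens[pos] == "x":
            return pos + 1
        if tokens[pos] != "(":
            return None
        if with_cast_alternative:
            # Try the cast-like rule first; on this input it fails only
            # after the whole group has been parsed.
            end = group(pos)
            if end is not None and tokens[end] == "x":
                return end + 1
            # Backtrack and fall through to the plain group rule.
        return group(pos)

    assert expr(0) == len(tokens) - 1  # the whole input must be consumed
    return calls


# Calls grow as depth + 1 without the extra alternative,
# and as 2**(depth + 1) - 1 with it.
for depth in (5, 10, 15):
    print(depth, count_rule_calls(depth, False), count_rule_calls(depth, True))
```

In this model each extra nesting level doubles the work, which is at least consistent with a script of about 20 nested binconds going from seconds to hours once a second '('-prefixed alternative is added.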

          Xuefu Zhang added a comment -

          Does it finish in the end, or never? With the change in PIG-1387 and your test case, I speculate that it shouldn't take that long unless there is something wrong with the grammar or ANTLR. I say this without having looked at the definition of type_conversion, so it could be inaccurate.

          Koji Noguchi added a comment -

          > Does it finish in the end, or never?

          I would guess it'll finish but I don't know. It has been running for 4 hours now.

          Koji Noguchi added a comment -

          > > Does it finish in the end, or never?
          > I would guess it'll finish but I don't know. It has been running for 4 hours now.
          >
          I had to kill it after 28 hours of never-ending parsing...

          Xuefu Zhang added a comment -

          Hi Koji,

          I assume there is an infinite loop. Next time, could you take a jstack dump before killing the Pig process and attach it here for the record?

          Thanks,
          Xuefu

          Koji Noguchi added a comment -

          > I assume there is an infinite loop. Next time, could you take a jstack dump before killing the Pig process and attach it here for the record?

          A bit confused. I can certainly do that, but are you saying you cannot reproduce this issue on your side using my test script? If so, I need to look at my test environment more carefully.

          Xuefu Zhang added a comment -

          Sorry, I didn't mean it's not reproducible. I don't have a setup at the moment, and I'd like to make sense of things from what is currently available.

          Prashant Kommireddi added a comment -

          Koji Noguchi, I think this was fixed. I don't see the issue on trunk.

          Koji Noguchi added a comment -

          > Koji Noguchi, I think this was fixed. I don't see the issue on trunk.

          Just realized that. Thanks! Can you show me which JIRA fixed this?
          I should have tested with trunk before creating this jira. I think I even tried with pig-0.11 to confirm the problem.


            People

            • Assignee: Unassigned
            • Reporter: Koji Noguchi
            • Votes: 0
            • Watchers: 3