Pig
  1. Pig
  2. PIG-2769

a simple logic causes very long compiling time on pig 0.10.0

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0, 0.11.2
    • Component/s: build
    • Labels:
      None
    • Environment:

      Apache Pig version 0.10.0-SNAPSHOT (rexported)

      Description

      We found the following simple logic will cause very long compiling time for pig 0.10.0, while using pig 0.8.1, everything is fine.

      A = load 'A.txt' using PigStorage() AS (m: int);

      B = FOREACH A

      { days_str = (chararray) (m == 1 ? 31: (m == 2 ? 28: (m == 3 ? 31: (m == 4 ? 30: (m == 5 ? 31: (m == 6 ? 30: (m == 7 ? 31: (m == 8 ? 31: (m == 9 ? 30: (m == 10 ? 31: (m == 11 ? 30:31))))))))))); GENERATE days_str as days_str; }


      store B into 'B';

      and here's a simple input file example: A.txt
      1
      2
      3

      The pig version we used in the test
      Apache Pig version 0.10.0-SNAPSHOT (rexported)

      1. case1.tar
        10 kB
        Dan Li
      2. PIG-2769.0.patch
        55 kB
        Nick White
      3. PIG-2769.1.patch
        57 kB
        Nick White
      4. PIG-2769.2.patch
        57 kB
        Alan Gates
      5. TEST-org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TestInputSizeReducerEstimator.txt
        8 kB
        Alan Gates

        Issue Links

          Activity

          Hide
          Dmitriy V. Ryaboy added a comment -

          in 0.11 branch now.

          Show
          Dmitriy V. Ryaboy added a comment - in 0.11 branch now.
          Hide
          Koji Noguchi added a comment -

          We should put this into 0.11 branch, maybe there will be an 0.11.2 before 12 comes out.

          If we can fix this in 0.11, that would be really nice.
          On our clusters, there were multiple users hit with this issue on 0.10.

          Show
          Koji Noguchi added a comment - We should put this into 0.11 branch, maybe there will be an 0.11.2 before 12 comes out. If we can fix this in 0.11, that would be really nice. On our clusters, there were multiple users hit with this issue on 0.10.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Didn't see earlier that this only went into trunk (thanks Koji Noguchi for pointing this out!).
          We should put this into 0.11 branch, maybe there will be an 0.11.2 before 12 comes out.

          Show
          Dmitriy V. Ryaboy added a comment - Didn't see earlier that this only went into trunk (thanks Koji Noguchi for pointing this out!). We should put this into 0.11 branch, maybe there will be an 0.11.2 before 12 comes out.
          Hide
          Dmitriy V. Ryaboy added a comment -

          this is great, thank you!

          Show
          Dmitriy V. Ryaboy added a comment - this is great, thank you!
          Hide
          Alan Gates added a comment -

          Patch checked in. Thanks Nick.

          Show
          Alan Gates added a comment - Patch checked in. Thanks Nick.
          Hide
          Alan Gates added a comment -

          Two small changes from the last patch. I fixed one issue in negative.conf where there was a "" instead of ". Also changed TestInputSizeReducerEstimator as suggested by Rohini which fixed the unit test issue (thanks Rohini).

          Show
          Alan Gates added a comment - Two small changes from the last patch. I fixed one issue in negative.conf where there was a "" instead of ". Also changed TestInputSizeReducerEstimator as suggested by Rohini which fixed the unit test issue (thanks Rohini).
          Hide
          Nick White added a comment -

          Thanks! I've attached a version of the patch which fixes the e2e tests you mentioned.

          Show
          Nick White added a comment - Thanks! I've attached a version of the patch which fixes the e2e tests you mentioned.
          Hide
          Rohini Palaniswamy added a comment -

          When I do a clean first as Cheolsoo advises it works, though I don't fully understand that since I started out with a clean checkout.

          Order of tests run must have been the cause. It might have passed after doing a clean because you might have just run TestInputSizeReducerEstimator without running other tests. TestInputSizeReducerEstimator needs to be fixed to do new Configuration(false); instead of new Configuration(); which makes it pick up hadoop-site.xml from a previously run test.

          Show
          Rohini Palaniswamy added a comment - When I do a clean first as Cheolsoo advises it works, though I don't fully understand that since I started out with a clean checkout. Order of tests run must have been the cause. It might have passed after doing a clean because you might have just run TestInputSizeReducerEstimator without running other tests. TestInputSizeReducerEstimator needs to be fixed to do new Configuration(false); instead of new Configuration(); which makes it pick up hadoop-site.xml from a previously run test.
          Hide
          Alan Gates added a comment -

          When I do a clean first as Cheolsoo advises it works, though I don't fully understand that since I started out with a clean checkout.

          In the system tests NegForeach_7, NegForeach_9, SyntaxErrors_4, Macro_Error_4 all fail because the error messages have changed. You can find these in test/e2e/pig/tests/negative.conf and macro.conf. Search on each of the group names (NegForeach, ...) and then find the test number under that. In each case you can run the query and change the expected error message to match the new one.

          Other than that, +1, patch looks good.

          Show
          Alan Gates added a comment - When I do a clean first as Cheolsoo advises it works, though I don't fully understand that since I started out with a clean checkout. In the system tests NegForeach_7, NegForeach_9, SyntaxErrors_4, Macro_Error_4 all fail because the error messages have changed. You can find these in test/e2e/pig/tests/negative.conf and macro.conf. Search on each of the group names (NegForeach, ...) and then find the test number under that. In each case you can run the query and change the expected error message to match the new one. Other than that, +1, patch looks good.
          Hide
          Jonathan Coveney added a comment -

          Nick, this its awesome. Well done, and thanks for tackling it. It's some cleanup pig sorely needed.

          Show
          Jonathan Coveney added a comment - Nick, this its awesome. Well done, and thanks for tackling it. It's some cleanup pig sorely needed.
          Hide
          Cheolsoo Park added a comment -

          Hi Alan,

          The test failure that you're seeing may be because of a stale hadoop-site.xml in build/classes from a previous test. Any test that uses MiniCluster generates hadoop-site.xml in build/classes, and since build/classes is in the classpath of unit test, a stale hadoop-site.xml will be picked up by TestInputSizeReducerEstimator if exists.

          I am able to reproduce the exactly same error by adding a hadoop-site.xml to build/classes as follows:

          hadoop-site.xml
          <configuration>
            <property>
              <name>fs.default.name</name>
              <value>hdfs://localhost:45399</value>
            </property>
          </configuration>
          

          With this hadoop-site.xml in classpath, the test tries to connect to a NN on port 45399, which it shouldn't.

          Can you verify that running ant clean test instead of ant test lets the error go away? When I do ant clean test, the test always passes for me.

          Show
          Cheolsoo Park added a comment - Hi Alan, The test failure that you're seeing may be because of a stale hadoop-site.xml in build/classes from a previous test. Any test that uses MiniCluster generates hadoop-site.xml in build/classes , and since build/classes is in the classpath of unit test, a stale hadoop-site.xml will be picked up by TestInputSizeReducerEstimator if exists. I am able to reproduce the exactly same error by adding a hadoop-site.xml to build/classes as follows: hadoop-site.xml <configuration> <property> <name>fs. default .name</name> <value>hdfs: //localhost:45399</value> </property> </configuration> With this hadoop-site.xml in classpath, the test tries to connect to a NN on port 45399, which it shouldn't. Can you verify that running ant clean test instead of ant test lets the error go away? When I do ant clean test , the test always passes for me.
          Hide
          Alan Gates added a comment -

          Log file of failed test. Not intended as a patch.

          Show
          Alan Gates added a comment - Log file of failed test. Not intended as a patch.
          Hide
          Alan Gates added a comment -

          That's the only failure I see in the unit tests that's new after I apply your patch. It fails consistently for me with your patch and passes consistently without it. I'll attach the logs from the test.

          Show
          Alan Gates added a comment - That's the only failure I see in the unit tests that's new after I apply your patch. It fails consistently for me with your patch and passes consistently without it. I'll attach the logs from the test.
          Hide
          Nick White added a comment -

          Were there any others? What do the test logs say? It passes for me locally, so I'd appreciate some help identifying the failure. Thanks -

          Show
          Nick White added a comment - Were there any others? What do the test logs say? It passes for me locally, so I'd appreciate some help identifying the failure. Thanks -
          Hide
          Alan Gates added a comment -

          I see failures in TestInputSizeReducerEstimator with this patch applied. There may be others, as the test run hasn't finished yet.

          Show
          Alan Gates added a comment - I see failures in TestInputSizeReducerEstimator with this patch applied. There may be others, as the test run hasn't finished yet.
          Hide
          Alan Gates added a comment -

          I'll take a look at this and start running the tests.

          Show
          Alan Gates added a comment - I'll take a look at this and start running the tests.
          Hide
          Nick White added a comment -

          I've attached a patch that refactors the QueryParser's grammar so that it doesn't need global backtracking (in fact, it only has backtracking in one rule as empty map casts look very much like map literals in brackets). I've added a lot of comments, so hope it's clear what's going on, and have checked all the unit tests pass. I've also added a lot more unit tests, including the query on this ticket (which now parses in about 0.03s), so I'm pretty sure it's just a drop-in replacement.

          Show
          Nick White added a comment - I've attached a patch that refactors the QueryParser's grammar so that it doesn't need global backtracking (in fact, it only has backtracking in one rule as empty map casts look very much like map literals in brackets). I've added a lot of comments, so hope it's clear what's going on, and have checked all the unit tests pass. I've also added a lot more unit tests, including the query on this ticket (which now parses in about 0.03s), so I'm pretty sure it's just a drop-in replacement.
          Hide
          Timothy Chen added a comment -

          I decided to leave this issue open to grab for now since this issue is mostly in the ANTLR generated code and I am still learning about ANTLR.
          It does seems that it's recursively calling foreach_complex_statement pretty excessively, but I don't have enough knowledge to deduce the grammer so it still works.
          I did verify that the AST generated in 0.9.2 and 0.10-SNAPSHOT generated is exactly the same if that helps.

          Show
          Timothy Chen added a comment - I decided to leave this issue open to grab for now since this issue is mostly in the ANTLR generated code and I am still learning about ANTLR. It does seems that it's recursively calling foreach_complex_statement pretty excessively, but I don't have enough knowledge to deduce the grammer so it still works. I did verify that the AST generated in 0.9.2 and 0.10-SNAPSHOT generated is exactly the same if that helps.
          Hide
          Thejas M Nair added a comment -

          I have assigned it to you. I have also added you to contributors list, now you should be able to assign jiras to yourself.

          Show
          Thejas M Nair added a comment - I have assigned it to you. I have also added you to contributors list, now you should be able to assign jiras to yourself.
          Hide
          Timothy Chen added a comment -

          I'd like to look at this for my first bug fix for pig, as I've already glanced through the parsing and query plan code. Can't find a way to assign this to myself via Jira though, probably need someone else to do so?

          Show
          Timothy Chen added a comment - I'd like to look at this for my first bug fix for pig, as I've already glanced through the parsing and query plan code. Can't find a way to assign this to myself via Jira though, probably need someone else to do so?
          Hide
          Dan Li added a comment -

          Thanks for the clarification, and looking forward to the fix.

          Dan

          Show
          Dan Li added a comment - Thanks for the clarification, and looking forward to the fix. Dan
          Hide
          Daniel Dai added a comment -

          No, it is not fixed. 5 min to compile a plan is not acceptable. I am not saying a particular version works, just to post what I observe. Sorry for the confusion. We need to fix it.

          Show
          Daniel Dai added a comment - No, it is not fixed. 5 min to compile a plan is not acceptable. I am not saying a particular version works, just to post what I observe. Sorry for the confusion. We need to fix it.
          Hide
          Dan Li added a comment -

          Hi, Daniel,

          I'm little bit confused. is this problem fixed or not?

          Thanks.
          Dan

          Show
          Dan Li added a comment - Hi, Daniel, I'm little bit confused. is this problem fixed or not? Thanks. Dan
          Hide
          Dan Li added a comment -

          From your comment on 27/Jun/12 18:43, is the number [ 12316246 ] the revision number?

          Show
          Dan Li added a comment - From your comment on 27/Jun/12 18:43, is the number [ 12316246 ] the revision number?
          Hide
          Daniel Dai added a comment -

          What is revision 12316246? Where did you get this?

          Show
          Daniel Dai added a comment - What is revision 12316246? Where did you get this?
          Hide
          Dan Li added a comment -

          Hi, Daniel,

          I tried to check out the revision 0.10.0 [ 12316246 ] , but got "No such revision 12316246" error. The command I used is following

          svn co -r 12316246 http://svn.apache.org/repos/asf/pig/branches/branch-0.10/

          Thanks.
          Dan

          Show
          Dan Li added a comment - Hi, Daniel, I tried to check out the revision 0.10.0 [ 12316246 ] , but got "No such revision 12316246" error. The command I used is following svn co -r 12316246 http://svn.apache.org/repos/asf/pig/branches/branch-0.10/ Thanks. Dan
          Hide
          Daniel Dai added a comment -

          It takes me 5 min to compile the logical plan.

          Show
          Daniel Dai added a comment - It takes me 5 min to compile the logical plan.
          Hide
          Jonathan Coveney added a comment -

          My guess is that this has to do with recursive backtracking.

          from QueryParser.g

          foreach_statement : ( ( alias EQUAL )?  FOREACH rel LEFT_CURLY ) => foreach_complex_statement
                            | foreach_simple_statement
          ;
          

          The fact that it works fine up until some n and then dies is typical of when these things die. It could be some interaction with another recursive backtracking rule that is blowing up.

          The ideal solution is to get rid of recursive backtracking in the parser...

          Show
          Jonathan Coveney added a comment - My guess is that this has to do with recursive backtracking. from QueryParser.g foreach_statement : ( ( alias EQUAL )? FOREACH rel LEFT_CURLY ) => foreach_complex_statement | foreach_simple_statement ; The fact that it works fine up until some n and then dies is typical of when these things die. It could be some interaction with another recursive backtracking rule that is blowing up. The ideal solution is to get rid of recursive backtracking in the parser...
          Hide
          Dan Li added a comment -

          It's worth pointing out that Pig 0.9.2 also runs quickly; we only see the degradation with Pig 0.10.0.

          The degradation in performance seems to have a knee as 4 or 5 conditionals works as expected but as presented, the script takes about 6 minutes at the GRUNT> prompt after hitting enter; before any Hadoop execution.

          -Clay

          Show
          Dan Li added a comment - It's worth pointing out that Pig 0.9.2 also runs quickly; we only see the degradation with Pig 0.10.0. The degradation in performance seems to have a knee as 4 or 5 conditionals works as expected but as presented, the script takes about 6 minutes at the GRUNT> prompt after hitting enter; before any Hadoop execution. -Clay
          Hide
          Dan Li added a comment -

          example code and data file

          Show
          Dan Li added a comment - example code and data file

            People

            • Assignee:
              Nick White
              Reporter:
              Dan Li
            • Votes:
              1 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development