Pig
  1. Pig
  2. PIG-3135

HExecutionEngine should look for resources in user passed Properties

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Looking at this snippet:

          private void init(Properties properties) throws ExecException {
                .
                .
                .
      
                  // Check existence of hadoop-site.xml or core-site.xml
                  Configuration testConf = new Configuration();
                  ClassLoader cl = testConf.getClassLoader();
                  URL hadoop_site = cl.getResource( HADOOP_SITE );
                  URL core_site = cl.getResource( CORE_SITE );
                 
                  if( hadoop_site == null && core_site == null ) {
                      throw new ExecException("Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath)." +
                              " If you plan to use local mode, please put -x local option in command line",
                              4010);
                  }
      

      This assumes the resources (*-site.xml) are set on the classpath, but this will not always be the case when run with Pig's Java APIs. One could want to programatically set the resources and the code here should additionally check if they are available in there.

      Example: When a Configuration object is created and resources are added before passing it on to Pig.

      Configuration conf = new Configuration(false);
      conf.addResource("foo/core-site.xml");
      conf.addResource("bar/hadoop-site.xml");
      
      PigServer pServer = new PigServer(ExecType.MAPREDUCE, conf);
      

      The above conf is not used right now to obtain resources.

      1. PIG-3135_1.patch
        6 kB
        Prashant Kommireddi
      2. PIG-3135.patch
        5 kB
        Prashant Kommireddi

        Activity

        Hide
        Prashant Kommireddi added a comment -

        Thanks Cheolsoo.

        Show
        Prashant Kommireddi added a comment - Thanks Cheolsoo.
        Hide
        Cheolsoo Park added a comment -

        My bad. I forgot to add it. It is committed now:
        http://svn.apache.org/viewvc?view=revision&revision=1449553

        Thanks!

        Show
        Cheolsoo Park added a comment - My bad. I forgot to add it. It is committed now: http://svn.apache.org/viewvc?view=revision&revision=1449553 Thanks!
        Hide
        Prashant Kommireddi added a comment -

        Hi Cheolsoo Park, I don't see the new tests "test/org/apache/pig/test/TestHExecutionEngine.java" making it in with this commit. Can you please take a look when you get a chance?

        $ svn diff --summarize -c r1449436
        M       conf/pig.properties
        M       CHANGES.txt
        M       src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
        
        Show
        Prashant Kommireddi added a comment - Hi Cheolsoo Park , I don't see the new tests "test/org/apache/pig/test/TestHExecutionEngine.java" making it in with this commit. Can you please take a look when you get a chance? $ svn diff --summarize -c r1449436 M conf/pig.properties M CHANGES.txt M src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java
        Hide
        Cheolsoo Park added a comment -

        +1 again.

        Verified that PIG-3200 fixes the test failure, and committed the patch to trunk again.

        Thanks Prashant!

        Show
        Cheolsoo Park added a comment - +1 again. Verified that PIG-3200 fixes the test failure, and committed the patch to trunk again. Thanks Prashant!
        Hide
        Cheolsoo Park added a comment -

        Prashant Kommireddi, I totally agree with you. Can we fix PIG-3200 first then?

        Show
        Cheolsoo Park added a comment - Prashant Kommireddi , I totally agree with you. Can we fix PIG-3200 first then?
        Hide
        Prashant Kommireddi added a comment -

        Hey Cheolsoo Park, I think the issue is with the fact that MiniCluster.buildCluster() generates a hadoop-site.xml file under build/classes. However, MiniCluster.shutDown() does not delete this file. Ideally, hadoop-site.xml should be deleted on cluster shutdown.

        With respect to TestHExecutionEngine, the issue is with the fact a previous test run that uses mini-cluster generates hadoop-site.xml and does not delete it. At the time TestHExecutionEngine runs this file is present on the classpath but mini-dfs and mini-mr were shutdown at the completion of previous test. pigContext.connect() in TestHExecutionEngine tries to establish a connection with mini-cluster which is no longer up.

        The fix would be to:

        1. Delete hadoop-site-xml on mini-cluster shutdown (to be done in another JIRA)
        2. Check if this patch stabilizes (and nothing else breaks)

        Does that make sense?

        Show
        Prashant Kommireddi added a comment - Hey Cheolsoo Park , I think the issue is with the fact that MiniCluster.buildCluster() generates a hadoop-site.xml file under build/classes. However, MiniCluster.shutDown() does not delete this file. Ideally, hadoop-site.xml should be deleted on cluster shutdown. With respect to TestHExecutionEngine, the issue is with the fact a previous test run that uses mini-cluster generates hadoop-site.xml and does not delete it. At the time TestHExecutionEngine runs this file is present on the classpath but mini-dfs and mini-mr were shutdown at the completion of previous test. pigContext.connect() in TestHExecutionEngine tries to establish a connection with mini-cluster which is no longer up. The fix would be to: 1. Delete hadoop-site-xml on mini-cluster shutdown (to be done in another JIRA) 2. Check if this patch stabilizes (and nothing else breaks) Does that make sense?
        Hide
        Prashant Kommireddi added a comment -

        Sure, I will take a look.

        Show
        Prashant Kommireddi added a comment - Sure, I will take a look.
        Hide
        Cheolsoo Park added a comment -

        Prashant Kommireddi, very sorry for reopening the jira, but this makes unit test fail.

        Note that ant clean test -Dtestcase=TestHExecutionEngine always passes. But when you run the test as part of unit test (i.e. ant clean test), it fails.

        To reproduce, please do the following:

        1. Run a unit test that uses mini-cluster (e.g. TestAccumulator).
          ant clean test -Dtestcase=TestAccumulator
          
        2. Run TestHExecutionEngine without ant clean.
          ant test -Dtestcase=TestHExecutionEngine
          

          This will fail with the following errors in logs:

          java.net.ConnectException: Call to localhost.localdomain/127.0.0.1:59119 failed on connection exception: java.net.ConnectException: Connection refused
          

          In fact, the same issue was discussed in PIG-2769 before. What I don't fully understand is why this happens even though you're doing what Rohini suggests.

        For the time being, I will back out the patch to stabilize the build. As soon as you update the patch, I will test it again. Does this make sense?

        Show
        Cheolsoo Park added a comment - Prashant Kommireddi , very sorry for reopening the jira, but this makes unit test fail. Note that ant clean test -Dtestcase=TestHExecutionEngine always passes. But when you run the test as part of unit test (i.e. ant clean test), it fails. To reproduce, please do the following: Run a unit test that uses mini-cluster (e.g. TestAccumulator). ant clean test -Dtestcase=TestAccumulator Run TestHExecutionEngine without ant clean . ant test -Dtestcase=TestHExecutionEngine This will fail with the following errors in logs: java.net.ConnectException: Call to localhost.localdomain/127.0.0.1:59119 failed on connection exception: java.net.ConnectException: Connection refused In fact, the same issue was discussed in PIG-2769 before. What I don't fully understand is why this happens even though you're doing what Rohini suggests . For the time being, I will back out the patch to stabilize the build. As soon as you update the patch, I will test it again. Does this make sense?
        Hide
        Prashant Kommireddi added a comment -

        Thanks for the review and commit, Cheolsoo.

        Show
        Prashant Kommireddi added a comment - Thanks for the review and commit, Cheolsoo.
        Hide
        Cheolsoo Park added a comment -

        Committed to trunk. Thanks Prashant!

        Show
        Cheolsoo Park added a comment - Committed to trunk. Thanks Prashant!
        Hide
        Cheolsoo Park added a comment -

        +1. I will commit it after running tests.

        Show
        Cheolsoo Park added a comment - +1. I will commit it after running tests.
        Hide
        Prashant Kommireddi added a comment -

        Hi Cheolsoo, find attached a revised patch as per our previous discussion.

        Show
        Prashant Kommireddi added a comment - Hi Cheolsoo, find attached a revised patch as per our previous discussion.
        Hide
        Prashant Kommireddi added a comment -

        Thanks for the quick review. Your comments make sense, I will make the above changes and upload a new patch.

        Show
        Prashant Kommireddi added a comment - Thanks for the quick review. Your comments make sense, I will make the above changes and upload a new patch.
        Hide
        Cheolsoo Park added a comment -

        Prashant Kommireddi Overall looks good to me. I have 3 questions.

        • Can we call the property "pig.use.overriden.hadoop.configs"? What we're overriding here are basically Hadoop confs, so I think that a more specific name is better. Do you agree?
        • On a related note, can you update the following comment in testJobConfGeneration?
          From:
          // This should fail as pig expects classpath to be set
          

          To something like:

          // This should fail as pig expects Hadoop configs are present in classpath.
          
        • Can you add this new property to conf/pig.properties with some explanation, so people can know about it? It would be nice if we could mention that this is a Mr-mode-specific property too. Please let me know if you have a better suggestion regarding how to document this.

        Thanks!

        Show
        Cheolsoo Park added a comment - Prashant Kommireddi Overall looks good to me. I have 3 questions. Can we call the property "pig.use.overriden.hadoop.configs"? What we're overriding here are basically Hadoop confs, so I think that a more specific name is better. Do you agree? On a related note, can you update the following comment in testJobConfGeneration? From: // This should fail as pig expects classpath to be set To something like: // This should fail as pig expects Hadoop configs are present in classpath. Can you add this new property to conf/pig.properties with some explanation, so people can know about it? It would be nice if we could mention that this is a Mr-mode-specific property too. Please let me know if you have a better suggestion regarding how to document this. Thanks!
        Hide
        Prashant Kommireddi added a comment -

        Would appreciate if someone can take a look. I have added a new Test class for HExecutionEngine as these couple tests don't fit in any other place, and it makes sense to have its own Test for it.

        Show
        Prashant Kommireddi added a comment - Would appreciate if someone can take a look. I have added a new Test class for HExecutionEngine as these couple tests don't fit in any other place, and it makes sense to have its own Test for it.
        Hide
        Prashant Kommireddi added a comment -

        To get around the current limitation of depending on *site.xml files being set on classpath, we could add a property "pig.use.override.configs" to be able to provide your own configs and have JobConf be built from that.

        Currently:

        jc = new JobConf();
        

        Proposal:

        1. If "pig.use.override.configs" is present, generate JobConf using properties

        jc = new JobConf(ConfigurationUtil.toConfiguration(properties));
        

        This change would be backward compatible, and those who wish to bypass classpath limitation can do so by setting the override property.

        Thoughts?

        Show
        Prashant Kommireddi added a comment - To get around the current limitation of depending on *site.xml files being set on classpath, we could add a property "pig.use.override.configs" to be able to provide your own configs and have JobConf be built from that. Currently: jc = new JobConf(); Proposal: 1. If "pig.use.override.configs" is present, generate JobConf using properties jc = new JobConf(ConfigurationUtil.toConfiguration(properties)); This change would be backward compatible, and those who wish to bypass classpath limitation can do so by setting the override property. Thoughts?

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Prashant Kommireddi
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development