Pig
  1. Pig
  2. PIG-3046

An empty file name in -Dpig.additional.jars throws an error

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.11
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
    • Patch Info:
      Patch Available

      Description

      This issue was raised on the user mailing list. To reproduce it, please run the following command in MR mode:

      pig -Dpig.additional.jars=<jar1>::<jar2> <pig script>
      

      As can be seen, I put :: in the middle of -Dpig.additional.jars, and this causes the following error:

      Caused by: java.lang.IllegalArgumentException: Can not create a Path from
      an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
      at org.apache.hadoop.fs.Path.<init>(Path.java:90)
      at org.apache.hadoop.fs.Path.<init>(Path.java:45)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.shipToHDFS(JobControlCompiler.java:1455)
      at
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.putJarOnClassPathThroughDistributedCache(JobControlCompiler.java:1432)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:508)
      

      Although it's not too hard to see what's wrong, it's not always easy to track down where an empty file name is from. In particular if various environment variables are set in another start-up script, it's time-consuming to identify the root cause.

      In fact, Pig should just skip an empty file name instead attempts to convert it to a path and throws an exception like this.

      1. PIG-3046.patch
        1.0 kB
        Prashant Kommireddi
      2. PIG-3046_1.patch
        1.0 kB
        Prashant Kommireddi
      3. PIG-3046_2.patch
        0.6 kB
        Johnny Zhang
      4. PIG-3046_3.patch
        1.0 kB
        Prashant Kommireddi

        Activity

        Hide
        Michael Czerwiński added a comment -

        Also this issue occurs whenever you specify in the -Dpig.additional.jars a directory path instead of the file path. This is quite often happening because its advised on forums to include HIVE_HOME and HADOOP_HOME in the PIG_CLASSPATH which is then passed to -Dpig.additional.jars.

        Show
        Michael Czerwiński added a comment - Also this issue occurs whenever you specify in the -Dpig.additional.jars a directory path instead of the file path. This is quite often happening because its advised on forums to include HIVE_HOME and HADOOP_HOME in the PIG_CLASSPATH which is then passed to -Dpig.additional.jars.
        Hide
        Prashant Kommireddi added a comment -

        Should pig.additional.jars support globbing, something like

        -Dpig.additonal.jars=/home/pig/jars/*.jar
        

        Or we rather check for jars within if the path is a directory?
        I think Pig already supports globbing via REGISTER.

        I am not sure why we need HIVE_HOME or HADOOP_HOME to be shipped to HDFS, those confs are required only on the client?

        Show
        Prashant Kommireddi added a comment - Should pig.additional.jars support globbing, something like -Dpig.additonal.jars=/home/pig/jars/*.jar Or we rather check for jars within if the path is a directory? I think Pig already supports globbing via REGISTER. I am not sure why we need HIVE_HOME or HADOOP_HOME to be shipped to HDFS, those confs are required only on the client?
        Hide
        Cheolsoo Park added a comment -

        Hi Prashant,

        You're probably right that you can workaround this problem in many ways. I opened this jira because it's trivial to skip an empty string when handling pig.additional.jars in Pig, and that seems to improve usability. Let me know if you think otherwise.

        Thanks!

        Show
        Cheolsoo Park added a comment - Hi Prashant, You're probably right that you can workaround this problem in many ways. I opened this jira because it's trivial to skip an empty string when handling pig.additional.jars in Pig, and that seems to improve usability. Let me know if you think otherwise. Thanks!
        Hide
        Prashant Kommireddi added a comment -

        Hi Cheolsoo,

        How do you feel about making the quick easy fix on this ticket (I can add a patch soon). And open a new one for adding globbing support?

        Show
        Prashant Kommireddi added a comment - Hi Cheolsoo, How do you feel about making the quick easy fix on this ticket (I can add a patch soon). And open a new one for adding globbing support?
        Hide
        Cheolsoo Park added a comment -

        Sure!

        Show
        Cheolsoo Park added a comment - Sure!
        Hide
        Prashant Kommireddi added a comment -

        Patch contains a fix for empty jar path. This does not handle the case when a user specifies a directory instead of a jar file. We will handle that in a separate JIRA where all jars within a dir can be loaded.

        Show
        Prashant Kommireddi added a comment - Patch contains a fix for empty jar path. This does not handle the case when a user specifies a directory instead of a jar file. We will handle that in a separate JIRA where all jars within a dir can be loaded.
        Hide
        Johnny Zhang added a comment -

        Prashant Kommireddi, not insist on it, but I think it might be better it is WARN instead of INFO

        Show
        Johnny Zhang added a comment - Prashant Kommireddi , not insist on it, but I think it might be better it is WARN instead of INFO
        Hide
        Prashant Kommireddi added a comment -

        You are right!

        Show
        Prashant Kommireddi added a comment - You are right!
        Hide
        Johnny Zhang added a comment -

        also, the patch seems also impact the common REGISTER jar function. I think in registerJar(String name), which already handle this to some extend

        ......
        if (!f.canRead()) {
                            int errCode = 4002;
                            String msg = "Can't read jar file: " + name;
                            throw new FrontendException(msg, errCode, PigException.USER_ENVIRONMENT);
                          }
        ......
        

        our fix should be specifically for -Dpig.additional.jars, should we handle this before it goes to function registerJar(String name) ?

        Show
        Johnny Zhang added a comment - also, the patch seems also impact the common REGISTER jar function. I think in registerJar(String name), which already handle this to some extend ...... if (!f.canRead()) { int errCode = 4002; String msg = "Can't read jar file: " + name; throw new FrontendException(msg, errCode, PigException.USER_ENVIRONMENT); } ...... our fix should be specifically for -Dpig.additional.jars, should we handle this before it goes to function registerJar(String name) ?
        Hide
        Johnny Zhang added a comment -

        this patch handles it in function addJarsFromProperties()

        Show
        Johnny Zhang added a comment - this patch handles it in function addJarsFromProperties()
        Hide
        Johnny Zhang added a comment -

        actually I think even PIG-3046_2.patch is not proper enough. I am working on another patch, thanks.

        Show
        Johnny Zhang added a comment - actually I think even PIG-3046 _2.patch is not proper enough. I am working on another patch, thanks.
        Hide
        Prashant Kommireddi added a comment -

        registerJar is the common place where jars are added. It does not make a lot of sense to do it specifically for pig.additional.jars only. A call PigServer.registerJar("") is valid and making a change to addJarsFromProperties only would not handle that.

        Also, registerJar(String name) does not handle the case as you have mentioned in the snippet above. If you take a look at the following

        URL resource = locateJarFromResources(name);
        

        You will notice that "resource" in case of an empty string is not null and if(!f.canRead) loop never gets invoked.

        Show
        Prashant Kommireddi added a comment - registerJar is the common place where jars are added. It does not make a lot of sense to do it specifically for pig.additional.jars only. A call PigServer.registerJar("") is valid and making a change to addJarsFromProperties only would not handle that. Also, registerJar(String name) does not handle the case as you have mentioned in the snippet above. If you take a look at the following URL resource = locateJarFromResources(name); You will notice that "resource" in case of an empty string is not null and if(!f.canRead) loop never gets invoked.
        Hide
        Cheolsoo Park added a comment -

        +1 for Prashant's patch (PIG-3046_1.patch).

        I will commit it after running tests.

        @Johnny,
        I think that you misunderstood something. Let me know if you have any questions/concerns.

        Show
        Cheolsoo Park added a comment - +1 for Prashant's patch ( PIG-3046 _1.patch). I will commit it after running tests. @Johnny, I think that you misunderstood something. Let me know if you have any questions/concerns.
        Hide
        Cheolsoo Park added a comment -

        @Prashant,
        Can you make a very small change to your patch? Can you please move your code to after if(name != null) in registerJar(String name). Since registerJar(String name) is a public method, I think that it's possible for somebody to call it with a null. Then, a NullPointerException will be thrown.

        Also, is there any reason why you didn't use name.isEmpty()?

        Thanks!

        Show
        Cheolsoo Park added a comment - @Prashant, Can you make a very small change to your patch? Can you please move your code to after if(name != null) in registerJar(String name) . Since registerJar(String name) is a public method, I think that it's possible for somebody to call it with a null. Then, a NullPointerException will be thrown. Also, is there any reason why you didn't use name.isEmpty() ? Thanks!
        Hide
        Prashant Kommireddi added a comment -

        Thanks for the review, Cheolsoo. Added a new patch.

        Show
        Prashant Kommireddi added a comment - Thanks for the review, Cheolsoo. Added a new patch.
        Hide
        Cheolsoo Park added a comment -

        Committed to trunk. Thanks Prashant!

        Show
        Cheolsoo Park added a comment - Committed to trunk. Thanks Prashant!

          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development