Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3046

An empty file name in -Dpig.additional.jars throws an error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.11
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
    • Patch Info:
      Patch Available

      Description

      This issue was raised on the user mailing list. To reproduce it, please run the following command in MR mode:

      pig -Dpig.additional.jars=<jar1>::<jar2> <pig script>
      

      As can be seen, I put :: in the middle of -Dpig.additional.jars, and this causes the following error:

      Caused by: java.lang.IllegalArgumentException: Can not create a Path from
      an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
      at org.apache.hadoop.fs.Path.<init>(Path.java:90)
      at org.apache.hadoop.fs.Path.<init>(Path.java:45)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.shipToHDFS(JobControlCompiler.java:1455)
      at
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.putJarOnClassPathThroughDistributedCache(JobControlCompiler.java:1432)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:508)
      

      Although it's not too hard to see what's wrong, it's not always easy to track down where an empty file name is from. In particular if various environment variables are set in another start-up script, it's time-consuming to identify the root cause.

      In fact, Pig should just skip an empty file name instead attempts to convert it to a path and throws an exception like this.

      1. PIG-3046_1.patch
        1.0 kB
        Prashant Kommireddi
      2. PIG-3046_2.patch
        0.6 kB
        Johnny Zhang
      3. PIG-3046_3.patch
        1.0 kB
        Prashant Kommireddi
      4. PIG-3046.patch
        1.0 kB
        Prashant Kommireddi

        Activity

        Hide
        mcz Michał Czerwiński added a comment -

        Also this issue occurs whenever you specify in the -Dpig.additional.jars a directory path instead of the file path. This is quite often happening because its advised on forums to include HIVE_HOME and HADOOP_HOME in the PIG_CLASSPATH which is then passed to -Dpig.additional.jars.

        Show
        mcz Michał Czerwiński added a comment - Also this issue occurs whenever you specify in the -Dpig.additional.jars a directory path instead of the file path. This is quite often happening because its advised on forums to include HIVE_HOME and HADOOP_HOME in the PIG_CLASSPATH which is then passed to -Dpig.additional.jars.
        Hide
        prkommireddi Prashant Kommireddi added a comment -

        Should pig.additional.jars support globbing, something like

        -Dpig.additonal.jars=/home/pig/jars/*.jar
        

        Or we rather check for jars within if the path is a directory?
        I think Pig already supports globbing via REGISTER.

        I am not sure why we need HIVE_HOME or HADOOP_HOME to be shipped to HDFS, those confs are required only on the client?

        Show
        prkommireddi Prashant Kommireddi added a comment - Should pig.additional.jars support globbing, something like -Dpig.additonal.jars=/home/pig/jars/*.jar Or we rather check for jars within if the path is a directory? I think Pig already supports globbing via REGISTER. I am not sure why we need HIVE_HOME or HADOOP_HOME to be shipped to HDFS, those confs are required only on the client?
        Hide
        cheolsoo Cheolsoo Park added a comment -

        Hi Prashant,

        You're probably right that you can workaround this problem in many ways. I opened this jira because it's trivial to skip an empty string when handling pig.additional.jars in Pig, and that seems to improve usability. Let me know if you think otherwise.

        Thanks!

        Show
        cheolsoo Cheolsoo Park added a comment - Hi Prashant, You're probably right that you can workaround this problem in many ways. I opened this jira because it's trivial to skip an empty string when handling pig.additional.jars in Pig, and that seems to improve usability. Let me know if you think otherwise. Thanks!
        Hide
        prkommireddi Prashant Kommireddi added a comment -

        Hi Cheolsoo,

        How do you feel about making the quick easy fix on this ticket (I can add a patch soon). And open a new one for adding globbing support?

        Show
        prkommireddi Prashant Kommireddi added a comment - Hi Cheolsoo, How do you feel about making the quick easy fix on this ticket (I can add a patch soon). And open a new one for adding globbing support?
        Hide
        cheolsoo Cheolsoo Park added a comment -

        Sure!

        Show
        cheolsoo Cheolsoo Park added a comment - Sure!
        Hide
        prkommireddi Prashant Kommireddi added a comment -

        Patch contains a fix for empty jar path. This does not handle the case when a user specifies a directory instead of a jar file. We will handle that in a separate JIRA where all jars within a dir can be loaded.

        Show
        prkommireddi Prashant Kommireddi added a comment - Patch contains a fix for empty jar path. This does not handle the case when a user specifies a directory instead of a jar file. We will handle that in a separate JIRA where all jars within a dir can be loaded.
        Hide
        dreambird Johnny Zhang added a comment -

        Prashant Kommireddi, not insist on it, but I think it might be better it is WARN instead of INFO

        Show
        dreambird Johnny Zhang added a comment - Prashant Kommireddi , not insist on it, but I think it might be better it is WARN instead of INFO
        Hide
        prkommireddi Prashant Kommireddi added a comment -

        You are right!

        Show
        prkommireddi Prashant Kommireddi added a comment - You are right!
        Hide
        dreambird Johnny Zhang added a comment -

        also, the patch seems also impact the common REGISTER jar function. I think in registerJar(String name), which already handle this to some extend

        ......
        if (!f.canRead()) {
                            int errCode = 4002;
                            String msg = "Can't read jar file: " + name;
                            throw new FrontendException(msg, errCode, PigException.USER_ENVIRONMENT);
                          }
        ......
        

        our fix should be specifically for -Dpig.additional.jars, should we handle this before it goes to function registerJar(String name) ?

        Show
        dreambird Johnny Zhang added a comment - also, the patch seems also impact the common REGISTER jar function. I think in registerJar(String name), which already handle this to some extend ...... if (!f.canRead()) { int errCode = 4002; String msg = "Can't read jar file: " + name; throw new FrontendException(msg, errCode, PigException.USER_ENVIRONMENT); } ...... our fix should be specifically for -Dpig.additional.jars, should we handle this before it goes to function registerJar(String name) ?
        Hide
        dreambird Johnny Zhang added a comment -

        this patch handles it in function addJarsFromProperties()

        Show
        dreambird Johnny Zhang added a comment - this patch handles it in function addJarsFromProperties()
        Hide
        dreambird Johnny Zhang added a comment -

        actually I think even PIG-3046_2.patch is not proper enough. I am working on another patch, thanks.

        Show
        dreambird Johnny Zhang added a comment - actually I think even PIG-3046 _2.patch is not proper enough. I am working on another patch, thanks.
        Hide
        prkommireddi Prashant Kommireddi added a comment -

        registerJar is the common place where jars are added. It does not make a lot of sense to do it specifically for pig.additional.jars only. A call PigServer.registerJar("") is valid and making a change to addJarsFromProperties only would not handle that.

        Also, registerJar(String name) does not handle the case as you have mentioned in the snippet above. If you take a look at the following

        URL resource = locateJarFromResources(name);
        

        You will notice that "resource" in case of an empty string is not null and if(!f.canRead) loop never gets invoked.

        Show
        prkommireddi Prashant Kommireddi added a comment - registerJar is the common place where jars are added. It does not make a lot of sense to do it specifically for pig.additional.jars only. A call PigServer.registerJar("") is valid and making a change to addJarsFromProperties only would not handle that. Also, registerJar(String name) does not handle the case as you have mentioned in the snippet above. If you take a look at the following URL resource = locateJarFromResources(name); You will notice that "resource" in case of an empty string is not null and if(!f.canRead) loop never gets invoked.
        Hide
        cheolsoo Cheolsoo Park added a comment -

        +1 for Prashant's patch (PIG-3046_1.patch).

        I will commit it after running tests.

        @Johnny,
        I think that you misunderstood something. Let me know if you have any questions/concerns.

        Show
        cheolsoo Cheolsoo Park added a comment - +1 for Prashant's patch ( PIG-3046 _1.patch). I will commit it after running tests. @Johnny, I think that you misunderstood something. Let me know if you have any questions/concerns.
        Hide
        cheolsoo Cheolsoo Park added a comment -

        @Prashant,
        Can you make a very small change to your patch? Can you please move your code to after if(name != null) in registerJar(String name). Since registerJar(String name) is a public method, I think that it's possible for somebody to call it with a null. Then, a NullPointerException will be thrown.

        Also, is there any reason why you didn't use name.isEmpty()?

        Thanks!

        Show
        cheolsoo Cheolsoo Park added a comment - @Prashant, Can you make a very small change to your patch? Can you please move your code to after if(name != null) in registerJar(String name) . Since registerJar(String name) is a public method, I think that it's possible for somebody to call it with a null. Then, a NullPointerException will be thrown. Also, is there any reason why you didn't use name.isEmpty() ? Thanks!
        Hide
        prkommireddi Prashant Kommireddi added a comment -

        Thanks for the review, Cheolsoo. Added a new patch.

        Show
        prkommireddi Prashant Kommireddi added a comment - Thanks for the review, Cheolsoo. Added a new patch.
        Hide
        cheolsoo Cheolsoo Park added a comment -

        Committed to trunk. Thanks Prashant!

        Show
        cheolsoo Cheolsoo Park added a comment - Committed to trunk. Thanks Prashant!
        Hide
        AsmaData Asma Dhaouadi added a comment -

        Hi,
        I have the same problem. And i hope find your help. I'm blocked here.
        I use the Talend Open Studion for Big Data 6.3.0 with the Cloudera CDH 5.8.
        What i must do in my case?
        Thank you

        Show
        AsmaData Asma Dhaouadi added a comment - Hi, I have the same problem. And i hope find your help. I'm blocked here. I use the Talend Open Studion for Big Data 6.3.0 with the Cloudera CDH 5.8. What i must do in my case? Thank you
        Hide
        daijy Daniel Dai added a comment -

        Most probably you don't have this patch in CDH. The best option is to patch by yourself and replace relevant jars in CDH.

        Show
        daijy Daniel Dai added a comment - Most probably you don't have this patch in CDH. The best option is to patch by yourself and replace relevant jars in CDH.
        Hide
        szita Adam Szita added a comment -

        Asma Dhaouadi,
        CDH 5.8 has Pig 0.12.0 so this patch should be included.
        I also ran a small test to see this in action, and the WARN message does come up:

        -bash-4.1$ pig -Dpig.additional.jars=:test1.jar:test2.jar
        
        log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
        log4j:WARN Please initialize the log4j system properly.
        log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
        2017-01-04 00:50:32,904 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.4-SNAPSHOT (rexported) compiled Jan 03 2017, 01:36:52
        2017-01-04 00:50:32,905 [main] INFO  org.apache.pig.Main - Logging error messages to: /var/lib/hadoop-hdfs/pig_1483519832863.log
        2017-01-04 00:50:32,950 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-hdfs/.pigbootup not found
        2017-01-04 00:50:34,104 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
        2017-01-04 00:50:34,104 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
        2017-01-04 00:50:34,104 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ns1
        2017-01-04 00:50:36,160 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
        2017-01-04 00:50:36,170 [main] WARN  org.apache.pig.PigServer - Empty string specified for jar path
        
        Show
        szita Adam Szita added a comment - Asma Dhaouadi , CDH 5.8 has Pig 0.12.0 so this patch should be included. I also ran a small test to see this in action, and the WARN message does come up: -bash-4.1$ pig -Dpig.additional.jars=:test1.jar:test2.jar log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http: //logging.apache.org/log4j/1.2/faq.html#noconfig for more info. 2017-01-04 00:50:32,904 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.8.4-SNAPSHOT (rexported) compiled Jan 03 2017, 01:36:52 2017-01-04 00:50:32,905 [main] INFO org.apache.pig.Main - Logging error messages to: / var /lib/hadoop-hdfs/pig_1483519832863.log 2017-01-04 00:50:32,950 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file / var /lib/hadoop-hdfs/.pigbootup not found 2017-01-04 00:50:34,104 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 2017-01-04 00:50:34,104 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs. default .name is deprecated. Instead, use fs.defaultFS 2017-01-04 00:50:34,104 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs: //ns1 2017-01-04 00:50:36,160 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs. default .name is deprecated. Instead, use fs.defaultFS 2017-01-04 00:50:36,170 [main] WARN org.apache.pig.PigServer - Empty string specified for jar path
        Hide
        AsmaData Asma Dhaouadi added a comment -

        Thank's for Daniel Dai and Adam Szita,
        i'll try with another example of project and i'll send my result.

        Thank for all

        Show
        AsmaData Asma Dhaouadi added a comment - Thank's for Daniel Dai and Adam Szita , i'll try with another example of project and i'll send my result. Thank for all

          People

          • Assignee:
            prkommireddi Prashant Kommireddi
            Reporter:
            cheolsoo Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development