HIVE-2372

java.io.IOException: error=7, Argument list too long

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels: None
    • Release Note:
      Committed thanks

      Description

      I executed a huge query on a table with a lot of two-level partitions. There is a Perl reducer in my query. The maps worked fine, but every reducer fails with the following exception:

      2011-08-11 04:58:29,865 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/perl, <reducer.pl>, <my_argument>]
      2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null
      2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null
      2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null
      2011-08-11 04:58:29,935 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":129390185139228,"reducesinkkey1":"00008AF10000000063CA6F"},"value":{"_col0":"00008AF10000000063CA6F","_col1":"2011-07-27 22:48:52","_col2":129390185139228,"_col3":2006,"_col4":4100,"_col5":"10017388=6","_col6":1063,"_col7":"NULL","_col8":"address.com","_col9":"NULL","_col10":"NULL"},"alias":0}
      at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
      at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
      at org.apache.hadoop.mapred.Child.main(Child.java:262)
      Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
      at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
      at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
      at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
      at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
      ... 7 more
      Caused by: java.io.IOException: Cannot run program "/usr/bin/perl": java.io.IOException: error=7, Argument list too long
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
      at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279)
      ... 15 more
      Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long
      at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
      at java.lang.ProcessImpl.start(ProcessImpl.java:65)
      at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
      ... 16 more

      It seems to me I have found the cause. ScriptOperator.java passes a lot of configuration properties as environment variables to the child reduce process. One of those variables is mapred.input.dir, which in my case is more than 150KB, because a huge number of input directories are listed in it. In short, the problem is that Linux (up to kernel version 2.6.23) limits the total size of environment variables for child processes to 132KB. That part can be solved by upgrading the kernel, but the 132KB limit per single environment string remains, so such a huge variable doesn't work even on my home computer (kernel 2.6.32). You can read more at http://www.kernel.org/doc/man-pages/online/pages/man2/execve.2.html.
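
      For illustration, here is a minimal, self-contained sketch (not Hive code; the class name and messages are made up) of how one can measure the environment that ProcessBuilder will hand to execve(2):

          import java.util.Map;

          public class EnvSizeCheck {
            public static void main(String[] args) {
              // ProcessBuilder starts from the parent environment; everything
              // ScriptOperator adds via addJobConfToEnvironment lands here too.
              ProcessBuilder pb = new ProcessBuilder("/usr/bin/perl", "reducer.pl");
              long total = 0;
              for (Map.Entry<String, String> e : pb.environment().entrySet()) {
                // execve(2) receives each variable as a "KEY=VALUE\0" string
                total += e.getKey().length() + e.getValue().length() + 2;
              }
              // If the total (on old kernels) or a single string (on newer ones)
              // exceeds the ~132KB limit, the spawn fails with errno E2BIG,
              // which Java surfaces as "error=7, Argument list too long".
              System.out.println("Environment size: " + total + " bytes");
            }
          }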

      For now, all our work has stopped because of this problem and I can't find a solution. The only approach that seems reasonable to me is to get rid of this variable in reducers.

      Attachments

      1. HIVE-2372.2.patch.txt (7 kB, Sergey Tryuber)
      2. HIVE-2372.1.patch.txt (4 kB, Sergey Tryuber)

        Activity

        Sergey Tryuber added a comment -

        I've applied a quick fix on our cluster. The fix simply erases mapred_input_dir from the environment of the child process.
        ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java:

          ProcessBuilder pb = new ProcessBuilder(wrappedCmdArgs);
          Map<String, String> env = pb.environment();
          addJobConfToEnvironment(hconf, env);
        +
        + LOG.info("HIVE-2372. HOTFIX. Removing mapred_input_dir from environment variables");
        + env.remove("mapred_input_dir");
        +
          env.put(safeEnvVarName(HiveConf.ConfVars.HIVEALIAS.varname),
              String.valueOf(alias));

        All queries work fine now. If anyone proposes a more correct solution (e.g. a property that disables passing this variable), I'll make the code changes and prepare a patch.

        He Yongqiang added a comment -

        Sergey Tryuber, can you post a patch?

        Siying Dong added a comment -

        Instead of simply removing the key, we should truncate the value to a shorter one if it is too long.

        Sergey Tryuber added a comment -

        Siying, what limit should we set on this property? Limiting it to 132KB won't work on kernels prior to 2.6.23, because other properties exist as well, and their total size would still exceed the 132KB threshold on those kernels. Maybe we could introduce one more property that (if set to true) removes the key?

        Siying Dong added a comment -

        I'm thinking of a very short limit, like 1024 chars or so. More characters won't give the user any additional information.

        Edward Capriolo added a comment -

        I am surprised no one has run into this sooner. I had assumed the limit was much shorter. Streaming is 'a neat hack', but with the UDF/UDAF frameworks I have never been convinced it is needed. Most often I see it used improperly, e.g. to make hacky map-side joins. Maybe your real problem is having too many input files: you should merge your input first, or use bucketing instead of two-level partitioning.

        Sergey Tryuber added a comment -

        Edward, firstly, I have seen the same problems/questions on other forums, unsolved. The issue is that we need hourly access to our data in HDFS. Sometimes we just need to process a couple of hours quickly, and sometimes we need to process several months of data. We don't use "hacky map-side joins". Our custom reducer performs only aggregations (quite complicated ones), nothing more.
        Hive manages all our workflow quite well. Actually, this is still our only problem, and we hope to keep using Hive.
        OK, what's the final decision: cut this option's length to 1, 5, or 10KB? Or introduce another option which enables removing this variable from the environment?

        Luis Ramos added a comment -

        We are running into the same problem here; my list of input files is even larger. In my case we are using a TRANSFORM() Python script to convert the output of the reducers to CSV format. I confirmed that lowering the number of input files works, either by specifying a limited partition or by merging the inputs. Either way, I don't think Hive should pass that entire list through environment variables. I can go with an option to remove it.

        Luis Ramos added a comment -

        I just wanted to note that in my case, where I use a streaming script for TRANSFORM (which only happens at the very end), a workaround was to simply force the query to run in two stages, so that by the last stage the input list has been merged. I understand that REDUCE scripts will work differently.

        Also, +1 on Sergey Tryuber's quick fix, though it might not be a solid long-term solution. Thanks.

        Siying Dong added a comment -

        Is anyone still working on this?

        Luis Ramos added a comment -

        That's a good question, in case anyone official wants to vote on a patch or on Sergey Tryuber's hotfix.

        After applying the hotfix and running "ant test" I didn't see any issues, and it definitely fixed our current problems with the limitation. +1. Thanks.

        Sergey Tryuber added a comment -

        Well, guys, give me a week to prepare a patch (I just have no time right now) that truncates this variable to 20KB if the string is too long (I'm going to hardcode this value) and prints a WARN to the log when that happens.
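
        A minimal sketch of the kind of truncation being proposed (the helper and constant names are hypothetical; the actual implementation is in the attached patch):

            import org.apache.commons.logging.Log;
            import org.apache.commons.logging.LogFactory;

            class EnvTruncation {
              private static final Log LOG = LogFactory.getLog(EnvTruncation.class);
              // Hardcoded 20KB cap, as proposed above
              private static final int MAX_ENV_VALUE_LENGTH = 20 * 1024;

              // Returns the value unchanged if it fits, otherwise a truncated
              // copy, logging a WARN so the user knows data was dropped.
              static String truncateEnvValue(String name, String value) {
                if (value != null && value.length() > MAX_ENV_VALUE_LENGTH) {
                  LOG.warn("Value of environment variable " + name
                      + " truncated from " + value.length() + " to "
                      + MAX_ENV_VALUE_LENGTH + " characters");
                  return value.substring(0, MAX_ENV_VALUE_LENGTH);
                }
                return value;
              }
            }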

        Sergey Tryuber added a comment -

        Patch, 1st version

        Sergey Tryuber added a comment -

        I've attached a patch (as an attachment, not via "Submit Patch", as described on the HowToContribute wiki page). When I cloned trunk and ran the tests without any changes (for about 5 hours), there were several test errors. Building and testing with my changes showed the same error count. So please review this patch and make remarks.

        Siying Dong added a comment -

        Is 2048 per value too long?

        Sergey Tryuber added a comment -

        The limit in my patch is 20KB. I don't think a smaller limit is a good idea, because in some cases a user might really want to set long values. Actually, in some cases 2KB (as you proposed) is not even enough for "hive.query.string". But, of course, if you insist, I'll set the limit to 1KB; just let me know.

        Siying Dong added a comment -

        How about making it configurable? (Though I hate to add more and more parameters.)

        Sergey Tryuber added a comment -

        That was my initial idea (you can even see it in my earlier comments). But I changed my mind, because I also "hate to add more and more parameters". This bug affects maybe 0.001% of Hive users, and I can't imagine a case among those users where 20KB is not enough. Making the limit smaller than 20KB also makes no sense, because 20KB is not a bottleneck in a modern environment. So introducing one more useless option means preparing documentation for it and compelling new Hive users to read and understand it...

        Luis Ramos added a comment -

        As a Hive user affected by this bug, I would have no problem with truncating the variable to 20KB. That should be enough to give you an idea of the input list. 2KB sounds low to me.

        Edward Capriolo added a comment -

        Let's get this committed. I think we should introduce the variable. In the normal case we want users to get all the input they would expect; if they run into this exception, they can turn the variable on to get out of the problem.

        Let's do this: rebase your patch and add hive.scriptoperator.truncate.env=true|false to control this feature. Default it to false, which is how Hive works now. I will review and commit.
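
        A sketch of how that gating might be wired into ScriptOperator (hypothetical code; the ConfVars constant name is assumed, and truncateEnvValue refers to the helper sketched earlier):

            // Inside ScriptOperator, after addJobConfToEnvironment(hconf, env):
            boolean truncate = HiveConf.getBoolVar(hconf,
                HiveConf.ConfVars.HIVESCRIPTTRUNCATEENV); // constant name assumed
            if (truncate) {
              for (Map.Entry<String, String> e : env.entrySet()) {
                e.setValue(EnvTruncation.truncateEnvValue(e.getKey(), e.getValue()));
              }
            }

        With the default of false, the loop is skipped and the existing behavior is unchanged, as requested above.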

        Sergey Tryuber added a comment -

        OK, got your point. Edward, could you answer some questions:
        1. As I understood from the sources, there is no hive-default.xml (it is deprecated)?
        2. I'm going to add one more entry to HiveConf#ConfVars. Is that the right way?
        3. Where is the place (in the code base) for comments on properties, or is it just the Hive wiki?

        Edward Capriolo added a comment -

        1. There is a hive-default.xml.template; you should add the property there along with a short description.
        2. Yes, adding a ConfVar is the right way.
        3. You can add some inline comments if you wish, although generally, if the property name is chosen correctly, it usually explains itself.
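
        For reference, a sketch of what those two additions might look like (the enum constant name is an assumption; the property name and default come from this thread):

            // In HiveConf.java, inside the ConfVars enum:
            HIVESCRIPTTRUNCATEENV("hive.script.operator.truncate.env", false),

        and in conf/hive-default.xml.template:

            <property>
              <name>hive.script.operator.truncate.env</name>
              <value>false</value>
              <description>Truncate each environment variable for the script
                operator if it exceeds a length limit.</description>
            </property>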

        Sergey Tryuber added a comment -

        Attached a patch that adds a hive.script.operator.truncate.env option that enables truncation.
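
        For users hitting this bug, enabling the new behavior would then be a one-line session setting (usage sketch); the default of false keeps Hive's current behavior:

            SET hive.script.operator.truncate.env=true;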

        Sergey Tryuber added a comment -

        Edward, I've named my option hive.script.operator.truncate.env (instead of hive.scriptoperator.truncate.env as you proposed), to be analogous to hive.script.operator.id.env.var. Actually, I have a question: why do we have hive-default.xml.template at all if all options (their names and default values) are hardcoded in ConfVars?

        Also, as I wrote in previous posts, there were some Hive test failures in my earlier "ant test" runs. I just want to share this with other developers: the problem was solved by changing my system locale from Russian to English. Now all tests complete successfully.

        And one last note: Edward, could you add *.iml files to .gitignore? I use IntelliJ IDEA as my IDE and it creates all those files.

        Edward Capriolo added a comment -

        I will take a look at this. You should mark the issue as Patch Available when you are ready for review.

        Edward Capriolo added a comment -

        +1. Tests pass; will commit.

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1450 (See https://builds.apache.org/job/Hive-trunk-h0.21/1450/)
        HIVE-2372 Argument list too long when streaming (Sergey Tryuber via egc) (Revision 1342841)

        Result = FAILURE
        ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342841
        Files :

        • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        • /hive/trunk/conf/hive-default.xml.template
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
        • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestOperators.java
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-2372 Argument list too long when streaming (Sergey Tryuber via egc) (Revision 1342841)

        Result = ABORTED
        ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1342841
        Files :

        • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        • /hive/trunk/conf/hive-default.xml.template
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
        • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestOperators.java
        Ashutosh Chauhan added a comment -

        This issue has been fixed and released as part of the 0.10.0 release. If you find an issue which seems related to this one, please create a new JIRA and link this one with the new JIRA.


          People

          • Assignee: Unassigned
          • Reporter: Sergey Tryuber
          • Votes: 4
          • Watchers: 7
