Hadoop Common / HADOOP-10245

Hadoop command line always appends "-Xmx" option twice

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: bin, scripts
    • Labels: None

      Description

      The Hadoop command-line scripts (bin/hadoop or hadoop.cmd) call java with the "-Xmx" option twice. The impact is that a user-defined HADOOP_HEAPSIZE env variable has no effect, because its value is overridden by the second "-Xmx" option.

      For example, here is the java command generated for "hadoop fs -ls /". Notice the two "-Xmx" options, "-Xmx1000m" and "-Xmx512m":

      java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log
        -Dhadoop.root.logger=INFO,console,DRFA -Xmx512m
        -Dhadoop.security.logger=INFO,RFAS -classpath XXX
        org.apache.hadoop.fs.FsShell -ls /

      Here is the root cause:
      The call flow is: bin/hadoop calls hadoop-config.sh, which in turn calls hadoop-env.sh.
      In bin/hadoop, the command line is generated by the following pseudo code:
      java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...

      In hadoop-config.sh, $JAVA_HEAP_MAX is initialized to "-Xmx1000m" if the user didn't set the $HADOOP_HEAPSIZE env variable.
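
      For reference, the relevant logic in hadoop-config.sh looks roughly like this (a simplified sketch, not the verbatim script):

      # hadoop-config.sh (simplified sketch)
      JAVA_HEAP_MAX=-Xmx1000m                    # hard-coded default
      if [ "$HADOOP_HEAPSIZE" != "" ]; then
        JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m"  # user override, value in MB
      fi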

      In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
      export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

      To fix this problem, we should remove "-Xmx512m" from HADOOP_CLIENT_OPTS. If we really want to change the memory settings, we should use the $HADOOP_HEAPSIZE env variable.
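
      In other words, the proposed change amounts to one line in hadoop-env.sh (a sketch of the idea, not the exact patch):

      # hadoop-env.sh, before:
      export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
      # after: drop the hard-coded -Xmx so $JAVA_HEAP_MAX (and thus
      # $HADOOP_HEAPSIZE) is the only place the heap size is set
      export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS"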

          Activity

          Allen Wittenauer added a comment -

          This has been fixed as part of HADOOP-9902. Closing as 'Not a problem'.

          Wei Yan added a comment -

          shanyu zhao, sorry for the late reply. So you mean we should only let users specify -Xmx through $JAVA_HEAP_MAX?

          shanyu zhao added a comment -

          Wei Yan, thank you for your comment! If we remove -Xmx512m from HADOOP_CLIENT_OPTS in hadoop-env.cmd, there will be one and only one -Xmx, which is the $JAVA_HEAP_MAX in bin/hadoop.

          HADOOP-9870 may have solved the problem for you, but I think the fix in HADOOP-9870 might be too complicated and hard to maintain. For example, what if a user puts "-Xmx" in HADOOP_OPTS instead of HADOOP_CLIENT_OPTS? I think we should avoid using HADOOP_CLIENT_OPTS or HADOOP_OPTS to specify memory, because having defined HADOOP_HEAPSIZE but not using it for memory specification is confusing. If you want to change the heap size, just change HADOOP_HEAPSIZE; I think this is simple and clear. Thoughts?

          Wei Yan added a comment -

          shanyu zhao, as discussed, there are multiple places configuring -Xmx. In the latest patch in HADOOP-9870 provided by Jayesh, $HADOOP_HEAPSIZE is checked first; if it is not set, -Xmx512m is assigned. Additionally, bin/hadoop also checks the -Xmx configuration, to avoid duplicate settings.
          Simply removing -Xmx512m from HADOOP_CLIENT_OPTS may still generate multiple -Xmx options, as bin/hadoop also has a default $JAVA_HEAP_MAX, which is 1000m.
          IMO, HADOOP-9870 has fixed this issue.
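
          A rough sketch of that two-part approach, as I read it from the discussion (names and exact checks may differ from the actual patch):

          # hadoop-env.sh: only apply the 512m default if the user set nothing
          if [ -z "$HADOOP_HEAPSIZE" ]; then
            export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
          fi

          # bin/hadoop: drop the default JAVA_HEAP_MAX if an -Xmx already exists
          case "$HADOOP_CLIENT_OPTS" in
            *-Xmx*) JAVA_HEAP_MAX="" ;;
          esac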

          shanyu zhao added a comment -

          Kai Zheng, Harsh J, would you please help review this patch?

          shanyu zhao added a comment -

          Wei Yan yes, it is the same issue. Sorry, I didn't see HADOOP-9870 before I submitted this one. I also found the similar JIRAs HADOOP-9211 and HDFS-5087.

          I went through these JIRAs and here are my thoughts:
          We should rely only on $HADOOP_HEAPSIZE to control the Java heap size, not on $HADOOP_CLIENT_OPTS. Otherwise it is very confusing and hard to debug, and I've seen many real-world issues caused by this confusion.

          There are arguments that $HADOOP_HEAPSIZE is only for services, and that the client should have its own setting. Well, we could create a HADOOP_CLIENT_HEAPSIZE which is initialized to 512m and used in bin/hadoop, but personally I don't think it is worth adding a new env variable. The client can simply use $HADOOP_HEAPSIZE, which defaults to 1000m. Also, there are scenarios where a Java class executed by the "hadoop jar" command has large memory requirements. A real-world example: Hive's MapredLocalTask calls "hadoop jar" to build a local hash table.

          Also, if there's a need to change the heap size, one can always set the env variable $HADOOP_HEAPSIZE.
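
          For example (the jar and class names below are hypothetical, for illustration; $HADOOP_HEAPSIZE is in MB):

          # give a memory-hungry "hadoop jar" invocation a 2 GB heap
          HADOOP_HEAPSIZE=2048 hadoop jar myapp.jar com.example.BuildHashTable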

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12624030/HADOOP-10245.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3449//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3449//console

          This message is automatically generated.

          Wei Yan added a comment -

          Hey, shanyu. Is this one related to HADOOP-9870?


            People

            • Assignee: shanyu zhao
            • Reporter: shanyu zhao
            • Votes: 0
            • Watchers: 7
