Hadoop Map/Reduce: MAPREDUCE-3057

Job History Server goes OutOfMemory with 1200 Jobs and Heap Size set to 10 GB

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: jobhistoryserver, mrv2
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The History Server was started with -Xmx10000m.
      Ran GridMix V3 with a 1200-job trace in STRESS mode on 350 nodes, with 4 NMs per node.
      All jobs finished, as reported by the RM Web UI and by HADOOP_MAPRED_HOME/bin/mapred job -list all.
      However, the GridMix job client was stuck while trying to connect to the History Server.
      Then tried HADOOP_MAPRED_HOME/bin/mapred job -status jobid; the JobClient also got stuck while looking for a token to connect to the History Server.
      The History Server logs showed it throwing "java.lang.OutOfMemoryError: GC overhead limit exceeded".

      With 10 GB of heap space and 1200 jobs, the History Server should not run out of memory, regardless of the type of jobs.

        Activity

        Eric Payne added a comment -

        Great! Thanks Mahadev.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #831 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/831/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183545
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Build #52 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/52/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev) - Merging r1183545 from trunk

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183547
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #861 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/861/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183545
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #40 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/40/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev) - Merging r1183545 from trunk

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183547
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-0.23-Commit #4 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/4/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev) - Merging r1183545 from trunk

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183547
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #1105 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1105/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183545
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Commit #3 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/3/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev) - Merging r1183545 from trunk

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183547
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #1086 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1086/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183545
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Common-0.23-Commit #2 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/2/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev) - Merging r1183545 from trunk

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183547
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #1164 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1164/)
        MAPREDUCE-3057. Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB. (Eric Payne via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1183545
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistory.java
        Mahadev konar added a comment -

        Just pushed this. Thanks Eric.

        Mahadev konar added a comment -

        OK. I'll go ahead and commit the patch as it is then.

        Siddharth Seth added a comment -

        Ran a couple of sleep jobs (usage from a heap dump):
        1000 maps, 1 reducer -> 15.9 MB (~1.6 KB per map)
        1 map, 1000 reducers -> 20.1 MB (~2 KB per reducer)

        This cache is only populated when the job is actually accessed, for example Oozie checking for job completion status (a single call, so the job is read from HDFS and cached but may not be accessed again), or users accessing the UI/CLI (the first call is a read from HDFS).
        The more important cache is the jobListCache, which is effectively the list of jobs available on the UI. Its default size is 20K entries (a couple of KB per job).

        We can't really make assumptions about access patterns, or about whether an average is good enough to decide on the default size. A scripted fetch of all instances of a single job (1000 reducers * 50) would cause an OOM on a 1 GB heap, which then affects all other users.
        I'd prefer keeping the default really low and letting individual deployments adjust the value via mapred-site.xml.
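
        A minimal sketch, assuming the cache-size property name discussed in this issue (verify it against mapred-default.xml for your release), of how such a per-deployment override would be read from configuration; this is illustrative, not the actual JobHistory.java code:

        import org.apache.hadoop.conf.Configuration;

        public class LoadedJobCacheConfig {
          // Assumed property name; deployments would set it in mapred-site.xml.
          static final String LOADED_JOB_CACHE_SIZE =
              "mapreduce.jobhistory.loadedjobs.cache.size";

          public static int loadedJobCacheSize(Configuration conf) {
            // A small default keeps worst-case heap usage bounded; clusters with
            // more headroom can raise it per deployment.
            return conf.getInt(LOADED_JOB_CACHE_SIZE, 5);
          }
        }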

        Eric Payne added a comment -

        It doesn't look like loadedjobs.cache.size is in mapred-default.xml, unless I'm missing something. Should I add it?

        Robert Joseph Evans added a comment -

        If we get the value too small, the server runs slow. If we get it too big, the server does not run at all. I would rather err on the side of caution, but with some numbers to back up my decision.

        From my previous back-of-the-envelope calculation:

        Number of Jobs | Heap Needed | Heap Rounded Up to a Reasonable Value
        5              | 42.5 MB     | 256 MB
        100            | 850 MB      | 1 GB
        500            | 4.15 GB     | 4.5 GB

        100 jobs would require around 850 MB of heap, maybe 1 GB to be safe. Is that the default value that we are going to use to launch the history server with? It easily falls in the range of 32 bits, so if we make sure that the default max heap and the default job cache are in line with one another, it seems good to me. But if the default is 2 GB, then let's go with a default of 200. It would also be good to update the documentation and the value in mapred-default.xml to indicate that this ties directly to the heap needed.

        +1 for 100 (non-binding)
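
        The figures above follow from the roughly 8.5 MB-per-job estimate made earlier in this thread; a quick hypothetical check (not part of the patch):

        // Back-of-the-envelope check of the table above, assuming ~8.5 MB of heap
        // per fully loaded job (10 GB / 1200 jobs from the original report).
        public class HeapEstimate {
          public static void main(String[] args) {
            double mbPerJob = 8.5;
            for (int jobs : new int[] {5, 100, 500}) {
              System.out.printf("%d cached jobs -> ~%.1f MB of heap%n", jobs, jobs * mbPerJob);
            }
          }
        }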

        Eric Payne added a comment -

        Thanks for your prompt feedback, guys.

        I see 5, 100, and 500 as the suggested defaults for loadedjobs.cache.size. Can we settle on 100?

        Mahadev konar added a comment (edited) -

        Assuming 25K jobs per day (on a really large cluster), we can keep the last 30 minutes of jobs in memory (== 500)? Just a suggestion, I'm fine either way.

        Arun C Murthy added a comment -

        Or even a few tens?

        Arun C Murthy added a comment -

        5 seems too small? Maybe 100 or so?

        Siddharth Seth added a comment -

        Looks good to me.
        To reproduce it, you could run multiple sleep jobs with thousands of map/reduce tasks. Start the History Server with a low Xmx value and make sure to pull job details via the history UI or the CLI.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12498928/MAPREDUCE-3057.v1.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1009//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1009//console

        This message is automatically generated.

        Eric Payne added a comment -

        I'm having trouble reproducing this on my 1-node cluster. I'm continuing to try to test this, but since it's just a change to the default config value, I think it's safe to just post the patch and make sure this is what was agreed to.

        Siddharth Seth added a comment -

        The JHS has a couple of caches. One is for the job listing page, which is reasonably lightweight.
        The second cache (which includes all job and task details) is populated when there's a request to get job/task details via the CLI or UI.
        I somehow missed reducing it earlier. As a short-term fix, I'm also +1 for reducing the default value to something like 5 or even lower, and setting it conservatively per deployment. A longer-term fix would bound the cache based on allocated space instead of the number of jobs.

        Robert Joseph Evans added a comment -

        I agree. It looks like it would probably be about 1 KB to 2 KB per task (690 bytes if all of the jobs had 11700 mappers and 1200 reducers). Thanks for humoring me.

        +1 on reducing the default number of jobs in the cache.

        Vinod Kumar Vavilapalli added a comment -

        The number of maps per job ranged from 1 to 11700, and reducers from 0 to 1200.

        I don't suspect any memory leak. We limit the cache size by the number of jobs, not by their memory usage in the JHS; that is the problem here.
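
        A minimal sketch, assumed rather than taken from the JHS source, of a cache bounded only by job count; eviction looks at the number of entries, so a handful of very large jobs can still exhaust the heap, which is the problem described above:

        import java.util.LinkedHashMap;
        import java.util.Map;

        // Hypothetical count-bounded LRU cache; the real JHS cache may differ.
        class CountBoundedJobCache<K, V> extends LinkedHashMap<K, V> {
          private final int maxJobs;

          CountBoundedJobCache(int maxJobs) {
            super(16, 0.75f, true); // access-order, so the least recently used job is evicted first
            this.maxJobs = maxJobs;
          }

          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Eviction is purely by entry count; the per-job memory footprint is never considered.
            return size() > maxJobs;
          }
        }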

        Robert Joseph Evans added a comment -

        Let's do a little math here. A 10 GB heap / 1200 jobs is about 8.5 MB per job (a bit less, because we are running a web server after all). That seems like a lot to store a job. How many tasks and task attempts are we looking at per job? I just want to be sure that there is not a memory leak somewhere in here.

        Mahadev konar added a comment -

        +1 for just reducing the size of the loaded-jobs cache.

        Vinod Kumar Vavilapalli added a comment -

        The default size of the loaded-jobs cache is 2000, which turned out to be too much in this case. The short-term fix is to reduce it, sure.

        For the long term, we have a couple of options:

        • If it isn't too difficult, the JHS should adjust its cache size depending on the available heap (see the sketch after this list).
        • OTOH, I think we can live with not loading the task information and retrieving task-specific information on demand. I'd expect the demand for job-level information to be far higher.
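
        A hypothetical sketch of the first option, under the assumption of roughly 8.5 MB of heap per fully loaded job taken from this thread; the constants and class below are illustrative, not part of the patch:

        // Derive the loaded-job cache size from the JVM's max heap instead of a fixed job count.
        public class HeapBasedCacheSizing {
          private static final long ASSUMED_BYTES_PER_JOB = 9L * 1024 * 1024; // ~8.5 MB, rounded up
          private static final double CACHE_HEAP_FRACTION = 0.5;              // leave room for the web server, RPC, etc.

          public static int suggestedCacheSize() {
            long maxHeap = Runtime.getRuntime().maxMemory();
            long budget = (long) (maxHeap * CACHE_HEAP_FRACTION);
            return (int) Math.max(1, budget / ASSUMED_BYTES_PER_JOB);
          }

          public static void main(String[] args) {
            // With -Xmx10000m this suggests on the order of 550 cached jobs.
            System.out.println("Suggested loaded-job cache size: " + suggestedCacheSize());
          }
        }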

          People

          • Assignee: Eric Payne
          • Reporter: Karam Singh
          • Votes: 0
          • Watchers: 3
