Hadoop Common / HADOOP-6812

fs.inmemory.size.mb not listed in conf. Cluster setup page gives wrong advice.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.2, 0.21.0, 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: documentation
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      http://hadoop.apache.org/common/docs/current/cluster_setup.html

      fs.inmemory.size.mb does not appear in any of the default XML files:

      [edward@ec src]$ grep "fs.inmemory.size.mb" ./mapred/mapred-default.xml
      [edward@ec src]$ grep "fs.inmemory.size.mb" ./hdfs/hdfs-default.xml
      [edward@ec src]$ grep "fs.inmemory.size.mb" ./core/core-default.xml

      Documentation error on the same page, under "Real-World Cluster Configurations":

      conf/core-site.xml    io.sort.factor    100    More streams merged at once while sorting files.
      conf/core-site.xml    io.sort.mb        200    Higher memory-limit while sorting data.

      core: io.sort.factor should be in mapred
      core: io.sort.mb should be in mapred

      1. M1726-0.patch
        3 kB
        Chris Douglas
      2. M1726-0v20.patch
        4 kB
        Chris Douglas

        Activity

        Chris Douglas added a comment -

        "fs.inmemory.size.mb does not appear in any xml file"

        It isn't used in the source; see HADOOP-3446.

        "core - io.sort.factor - should be mapred"
        "core - io.sort.mb - should be mapred"

        I don't follow. These are the correct names in 0.20.2, no?

        Chris Douglas added a comment -

        Oh, I see what you mean. io.sort.factor and io.sort.mb are also used in SequenceFile, so the config in 0.20 is in core.

        Edward Capriolo added a comment -

        It is confusing to me. Usually I determine which configuration file a variable should go into by looking at the *-default.xml files in <hadoop>/src. io.sort.factor and io.sort.mb are specified in mapred. They should either be in both or just in core, correct?

        Edward Capriolo added a comment -

        In the future, can you please not close a ticket before I even have a chance to reply?

        1) The generated documentation on the site is wrong.
        2) The generated xml files in the src directory are putting variables in the wrong files.

        People who are not "in the know" will put configuration variables in the wrong file and not get the effect they desire.

        Chris Douglas added a comment -

        Sorry for closing the issue prematurely, but I'm still unclear on what this issue is about. It sounded like you were saying that io.sort.factor and io.sort.mb belong in mapred-default.xml rather than core-default.xml, which I thought I'd answered by noting that these parameters are also used in o.a.h.io.SequenceFile (which is in core, not mapred). Given that fs.inmemory.size.mb is unused, that it doesn't appear in the default configs is also correct.

        "the generated documentation on the site is wrong."

        "the generated xml files in the src directory are putting variables in the wrong files."

        How? Can you either explain what is "wrong" or post a patch correcting the error?

        Edward Capriolo added a comment -

        If I understand correctly, the docs for "current" are based on the current stable release, 0.20.2. Current stable does not use fs.inmemory.size.mb.

        http://hadoop.apache.org/common/docs/current/cluster_setup.html, under "Real-World Cluster Configurations", still lists:

        conf/core-site.xml  	fs.inmemory.size.mb  	200  	 Larger amount of memory allocated for the in-memory file-system used to merge map-outputs at the reduces. 
        

        As for io.sort.factor and io.sort.mb: they both appear in mapred-default.xml.

        [edward@ec src]$ grep -R "io.sort.factor" */*.xml
        mapred/mapred-default.xml:  <name>io.sort.factor</name>
        

        They should be in core-default.xml (only), or in both core-default.xml and mapred-default.xml.

        Think about the end user. An end user might read a blog that states, "io.sort.factor is a magic tune; set this to XXXX for awesome performance". Which file should the end user put this variable in?

        [edward@ec src]$ grep -R "io.sort.factor" */*.xml
        mapred/mapred-default.xml:  <name>io.sort.factor</name>
        

        The end user thinks, "Since I found this variable in mapred-default.xml, it makes sense that I should override it in mapred-site.xml."

        The user puts the variable in the wrong place, because the end user has no (easy) way of knowing that SequenceFile uses io.sort.factor or io.sort.mb. Does that make sense?
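
The lookup described in this comment (searching the default XML files to see where a property is declared) can be sketched in Python. This is only an illustration under stated assumptions: `files_defining` and `write_stub` are hypothetical helpers, and the stand-in files mirror the layout seen in the grep output rather than the contents of any real release.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def files_defining(src_dir, prop_name):
    """List the */*-default.xml files under src_dir that declare prop_name."""
    hits = []
    for path in sorted(Path(src_dir).glob("*/*-default.xml")):
        root = ET.parse(str(path)).getroot()
        for prop in root.iter("property"):
            if (prop.findtext("name") or "").strip() == prop_name:
                hits.append(path.relative_to(src_dir).as_posix())
                break
    return hits


def write_stub(path, names):
    """Write a stand-in default file declaring the given property names."""
    props = "".join(
        f"<property><name>{n}</name><value>0</value></property>" for n in names
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"<configuration>{props}</configuration>")


# Stand-in source tree; which file declares which property is illustrative
# and follows the grep output quoted in this thread.
demo = Path(tempfile.mkdtemp())
write_stub(demo / "core" / "core-default.xml", ["fs.default.name"])
write_stub(demo / "mapred" / "mapred-default.xml", ["io.sort.factor", "io.sort.mb"])

print(files_defining(demo, "io.sort.factor"))       # ['mapred/mapred-default.xml']
print(files_defining(demo, "fs.inmemory.size.mb"))  # []
```

A property that shows up in only one default file, as io.sort.factor does here, gives a reader no hint that code in another module may also read it, which is the confusion this comment describes.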

        Chris Douglas added a comment -

        Moving issue to MAPREDUCE, as that is the current home of the cluster setup docs.

        "If I understand correctly the docs for current are based on current stable 0.20.2. Current stable does not use fs.inmemory.size.mb."

        OK, I understand. HADOOP-3446 updated the mapred tutorial, but failed to update the cluster setup docs.

        "Think about the end user. An end user might read a blog that states, 'io.sort.factor is a magic tune set this to XXXX for awesome performance'. Which file should end user put this variable in?"

        It is inconsistent, but as long as the user adds the preferred value to one of the -site.xml files it should make no functional difference in MapReduce. You're right, though: the properties you cite are usually added to mapred-site.xml anyway, because the SequenceFile sort is rarely used. The situation is worse in trunk, where the properties are MapReduce-specific, but users are still directed to core-site.xml.

        Thanks for clarifying.
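
As a rough illustration of the placement discussed in this comment, the sketch below renders a mapred-site.xml override file. The helper `site_xml` is hypothetical; the values 100 and 200 come from the cluster setup table quoted in the description, and the property names are the 0.20 ones (the commit message later in this issue refers to them as mapreduce.task.io.sort.factor and mapreduce.task.io.sort.mb in trunk).

```python
import xml.etree.ElementTree as ET


def site_xml(overrides):
    """Render a Hadoop-style <configuration> document from a dict of overrides."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Values taken from the cluster setup table quoted in the description;
# the property names are the 0.20 names and are assumed, not verified,
# for any other release.
mapred_site = site_xml({"io.sort.factor": 100, "io.sort.mb": 200})
print(mapred_site)
```

The printed document would be saved as conf/mapred-site.xml; whether that is the right file for a given release is the question this issue is about, so treat the target file as an assumption as well.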

        Chris Douglas added a comment -

        Patch for 0.20 branch

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12442758/M1726-0.patch
        against trunk revision 937201.

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/135/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/135/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/135/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/135/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Moved back to Common after MAPREDUCE-1404

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12442759/M1726-0v20.patch
        against trunk revision 1031422.

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/32//console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        Chris, should we also open an issue to remove fs.ramfs.impl, since there is no InMemoryFileSystem any more?

        Chris Douglas added a comment -

        Sorry, I had missed this.

        "Chris, should we also open an issue to remove fs.ramfs.impl, since there is no InMemoryFileSystem any more?"

        That makes sense to me. I assume it's still in the configs; is it also mentioned in the documentation?

        Konstantin Shvachko added a comment -

        It is only in the configs. Not in the documentation.

        Todd Lipcon added a comment -

        Also, fs.inmemory.size.mb is mentioned in the terasort javadoc.

        Konstantin Shvachko added a comment -

        The terasort mention is relevant. It's just that it should be configured in mapred-site rather than core-site.

        Konstantin Shvachko added a comment -

        I just committed this. Thank you Chris.

        Hudson added a comment -

        Integrated in Hadoop-Common-22-branch #21 (See https://hudson.apache.org/hudson/job/Hadoop-Common-22-branch/21/)
        HADOOP-6812. Merge -r 1064916:1064917 from trunk to branch 0.22.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #590 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/590/)
        HADOOP-6812. Change documentation for correct placement of configuration variables: mapreduce.reduce.input.buffer.percent, mapreduce.task.io.sort.factor, mapreduce.task.io.sort.mb. Contributed by Chris Douglas.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #492 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/492/)


          People

          • Assignee:
            Chris Douglas
            Reporter:
            Edward Capriolo
          • Votes: 0
            Watchers: 4
