Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

      Description

      This is a placeholder to share our code and experiences about implementing a Hot Standby for the HDFS NameNode for Hadoop 0.20.

      1. 0001-0.20.3_rc2-AvatarNode.patch
        187 kB
        Tony Valderrama
      2. AvatarNode.20.patch
        286 kB
        Dmytro Molkov
      3. AvatarNode.patch
        269 kB
        dhruba borthakur
      4. AvatarNodeDescription.txt
        8 kB
        dhruba borthakur
      5. AvatarPatch.2.patch
        282 kB
        Dmytro Molkov

        Issue Links

          Activity

          dhruba borthakur added a comment -

          A short writeup and code to implement a hot failover for the NameNode in Hadoop 0.20. This code is by no means complete, and extensive testing is required before it can be deployed. It is developed purely as a contrib module so that the HA functionality is completely separated from the existing NameNode.

          aleojiang added a comment -

          Could you please provide the steps and configuration needed to test your patch?

          dhruba borthakur added a comment -

          I had an offline discussion with Dmytro. He will upload a new version of this patch (with details on how to configure it).

          Dmytro Molkov added a comment -

          I am attaching a newer version of the patch.
          We fixed a few issues that were leading to data corruption and are actively testing this version to see if there are any more problems.

          Mingjie Lai added a comment -

          @Dmytro

          Which branch did you make the patch against?

          I checked out hadoop-common branch-0.20 and have been trying to apply your patch, but I always get compilation errors – http://pastebin.ca/1881910.

          I'm pretty new to the HDFS code base. Did I miss anything here?

          compile:
          [echo] contrib: highavailability
          [javac] Compiling 10 source files to /home/mlai/git/hadoop/hadoop-common/build/contrib/highavailability/classes
          [javac] /home/mlai/git/hadoop/hadoop-common/src/contrib/highavailability/src/java/org/apache/hadoop/hdfs/AvatarClient.java:258: init() has protected access in org.apache.hadoop.fs.FsShell
          [javac] fh.init();
          [javac] ^
          [javac] /home/mlai/git/hadoop/hadoop-common/src/contrib/highavailability/src/java/org/apache/hadoop/hdfs/AvatarClient.java:260: delete(org.apache.hadoop.fs.Path,org.apache.hadoop.fs.FileSystem,boolean,boolean) has private access in org.apache.hadoop.fs.FsShell
          [javac] fh.delete(p, p.getFileSystem(conf), recursive, false);
          [javac] ^
          [javac] /home/mlai/git/hadoop/hadoop-common/src/contrib/highavailability/src/java/org/apache/hadoop/hdfs/AvatarClient.java:1849: cannot find symbol
          [javac] symbol : class BlockMissingException
          [javac] location: class org.apache.hadoop.hdfs.AvatarClient.DFSInputStream
          [javac] throw new BlockMissingException(src, "Could not obtain block: " + blockInfo, block.getStartOffset());
          [javac] ^
          [javac] /home/mlai/git/hadoop/hadoop-common/src/contrib/highavailability/src/java/org/apache/hadoop/hdfs/server/datanode/AvatarDataNode.java:659: cannot find symbol
          [javac] symbol : method getNameNodeAddress(org.apache.hadoop.conf.Configuration)
          [javac] location: class org.apache.hadoop.hdfs.server.datanode.DataNode
          [javac] return DataNode.getNameNodeAddress(newconf);
          [javac] ^
          [javac] /home/mlai/git/hadoop/hadoop-common/src/contrib/highavailability/src/java/org/apache/hadoop/hdfs/server/namenode/AvatarNode.java:291: cannot find symbol
          [javac] symbol : method writeLock()
          [javac] location: class org.apache.hadoop.hdfs.server.namenode.FSNamesystem
          [javac] namesystem.writeLock();
          ...

          Dmytro Molkov added a comment -

          Sorry, the last patch was made against our internal tree.
          This one is modified to apply against the hadoop-0.20 trunk.

          I will be working on creating a patch against trunk, but it will take some time. In the meantime, this one seems to compile OK against hadoop-0.20.
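
          A minimal sketch of trying it out against a 0.20 checkout (the repository URL, patch level, and paths are assumptions — adjust for your tree):

              # Sketch only: assumes a Subversion checkout of branch-0.20 and GNU patch.
              svn checkout https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 hadoop-0.20
              cd hadoop-0.20
              patch -p0 < /path/to/AvatarNode.20.patch
              ant compile    # builds core plus the contrib modules, including highavailability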

          seahigh added a comment -

          I am using hadoop-0.20.2 and have been trying to apply your patch, but I always get compilation errors.
          How can I use this patch?
          compile:
          [echo] contrib: highavailability
          [javac] Compiling 10 source files to /home/hadoop/hadoop-0.20.2/build/contrib/highavailability/classes
          [javac] /home/hadoop/hadoop-0.20.2/src/contrib/highavailability/src/java/org/apache/hadoop/hdfs/server/namenode/Standby.java:341: cannot find symbol
          [javac] symbol : method saveNamespace(boolean)
          [javac] location: class org.apache.hadoop.hdfs.server.namenode.FSImage
          [javac] fsImage.saveNamespace(true);
          [javac] ^
          [javac] Note: Some input files use or override a deprecated API.
          [javac] Note: Recompile with -Xlint:deprecation for details.
          [javac] 1 error

          BUILD FAILED
          /home/hadoop/hadoop-0.20.2/build.xml:497: The following error occurred while executing this line:
          /home/hadoop/hadoop-0.20.2/src/contrib/build.xml:30: The following error occurred while executing this line:
          /home/hadoop/hadoop-0.20.2/src/contrib/build-contrib.xml:133: Compile failed; see the compiler error output for details.

          Murugaprabu Marimuthu added a comment -

          I am also getting the same error when applying the patches to the hadoop-0.20.2 code. Can somebody help?

          Tony Valderrama added a comment -

          This is a patch against hadoop-0.20.3-rc2, which was extracted from Facebook's hadoop-20-warehouse release in December 2010 (plus some bug fixes). It also includes example startup scripts. One major bug is that the HDFS cluster must run with permissions disabled (dfs.permissions=false), because the standby avatar node doesn't use the login credentials when attempting to enter safe mode. Otherwise, it seems to function correctly.
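
          For reference, that workaround corresponds to the standard 0.20 permissions switch in hdfs-site.xml; a minimal sketch:

              <!-- hdfs-site.xml: disable permission checking so the standby
                   avatar node can enter safe mode without login credentials -->
              <property>
                <name>dfs.permissions</name>
                <value>false</value>
              </property>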

          wangfeng added a comment -

          I have downloaded and applied the patch successfully, but I don't know how to configure it. Please help me; listing the configuration details would be very helpful.

          W S Chung added a comment -

          I tried applying the patch AvatarNode.20.patch against the current stable release, 0.20.203.0. I also get the compilation error that FSImage does not have the saveNamespace() method. Which version of 0.20 should I apply the patch against?

          W S Chung added a comment -

          Never mind. I did not realize that the merged code can be downloaded from https://github.com/facebook/hadoop-20-warehouse.

          Shanmuganathan Ramalingam added a comment -

          I have downloaded the code from https://github.com/facebook/hadoop-20-warehouse. When I compiled it, I got the following error. How can I rectify it?

          compile-mapred-classes:
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/build.xml:392: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
          [javac] Compiling 304 source files to /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/build/classes
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/mapred/org/apache/hadoop/mapred/Task.java:1200: warning: [unchecked] unchecked cast
          [javac] found : java.lang.Class<capture#419 of ? extends org.apache.hadoop.mapred.Reducer>
          [javac] required: java.lang.Class<? extends org.apache.hadoop.mapred.Reducer<K,V,K,V>>
          [javac] (Class<? extends Reducer<K,V,K,V>>) job.getCombinerClass();
          [javac] ^
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/mapred/org/apache/hadoop/mapred/Task.java:1202: warning: [unchecked] unchecked call to OldCombinerRunner(java.lang.Class<? extends org.apache.hadoop.mapred.Reducer<K,V,K,V>>,org.apache.hadoop.mapred.JobConf,org.apache.hadoop.mapred.Counters.Counter,org.apache.hadoop.mapred.Task.TaskReporter) as a member of the raw type org.apache.hadoop.mapred.Task.OldCombinerRunner
          [javac] return new OldCombinerRunner(cls, job, inputCounter, reporter);
          [javac] ^
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/mapred/org/apache/hadoop/mapred/Task.java:1202: warning: [unchecked] unchecked conversion
          [javac] found : org.apache.hadoop.mapred.Task.OldCombinerRunner
          [javac] required: org.apache.hadoop.mapred.Task.CombinerRunner<K,V>
          [javac] return new OldCombinerRunner(cls, job, inputCounter, reporter);
          [javac] ^
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/mapred/org/apache/hadoop/mapred/Task.java:1209: warning: [unchecked] unchecked cast
          [javac] found : java.lang.Class<capture#361 of ? extends org.apache.hadoop.mapreduce.Reducer<?,?,?,?>>
          [javac] required: java.lang.Class<? extends org.apache.hadoop.mapreduce.Reducer<K,V,K,V>>
          [javac] taskContext.getCombinerClass();
          [javac] ^
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/mapred/org/apache/hadoop/mapred/Task.java:1231: warning: [unchecked] unchecked cast
          [javac] found : java.lang.Class<capture#478 of ?>
          [javac] required: java.lang.Class<K>
          [javac] keyClass = (Class<K>) job.getMapOutputKeyClass();
          [javac] ^
          .
          .
          .
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java:1262: reference to CorruptFileBlocks is ambiguous, both class org.apache.hadoop.hdfs.protocol.CorruptFileBlocks in org.apache.hadoop.hdfs.protocol and class org.apache.hadoop.fs.CorruptFileBlocks in org.apache.hadoop.fs match
          [javac] return new CorruptFileBlocks(str.toArray(new String[str.size()]), "");
          [javac] ^
          [javac] /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/src/hdfs/org/apache/hadoop/hdfs/tools/JMXGet.java:148: warning: sun.management.ConnectorAddressLink is Sun proprietary API and may be removed in a future release
          [javac] url_string = ConnectorAddressLink.importFrom(Integer.parseInt(localVMPid));
          [javac] ^
          [javac] Note: Some input files use or override a deprecated API.
          [javac] Note: Recompile with -Xlint:deprecation for details.
          [javac] 7 errors
          [javac] 2 warnings

          BUILD FAILED
          /home/test/ips/facebook-hadoop-20-warehouse-bbfed86/build.xml:428: Compile failed; see the compiler error output for details.

          Total time: 4 minutes 17 seconds
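
          (As an aside, the CorruptFileBlocks error above is an ordinary Java name collision between two classes with the same simple name; fully qualifying the reference is the usual fix. A sketch only — the surrounding DFSClient context is assumed:)

              // Both org.apache.hadoop.hdfs.protocol.CorruptFileBlocks and
              // org.apache.hadoop.fs.CorruptFileBlocks are visible here, so the
              // simple name is ambiguous; qualify the one that is meant.
              return new org.apache.hadoop.hdfs.protocol.CorruptFileBlocks(
                  str.toArray(new String[str.size()]), "");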

          Harsh J added a comment -

          Can we close this out in light of HDFS-1623, which has been implemented and is also in use at various places?

          Suresh Srinivas added a comment -

          +1. Please close the JIRA.

          Harsh J added a comment -

          A working HDFS HA mode has been implemented via HDFS-1623. Closing this one out as a 'dupe'.


            People

            • Assignee:
              Dmytro Molkov
              Reporter:
              dhruba borthakur
            • Votes:
              2
              Watchers:
              61

              Dates

              • Created:
                Updated:
                Resolved:
