Hadoop Common / HADOOP-3337

Name-node fails to start because DatanodeInfo format changed.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None

      Description

      HADOOP-3283 introduced a new field, ipcPort, in DatanodeInfo, but the code that reads and writes file system image files was not updated to account for it.
      In particular, reading edits generated by the previous version of Hadoop throws the following exception:

      08/05/02 00:02:50 ERROR dfs.NameNode: java.lang.IllegalArgumentException: No enum const class org.apache.hadoop.dfs.DatanodeInfo$AdminStates.0?
      /56.313
      	at java.lang.Enum.valueOf(Enum.java:192)
      	at org.apache.hadoop.io.WritableUtils.readEnum(WritableUtils.java:399)
      	at org.apache.hadoop.dfs.DatanodeInfo.readFields(DatanodeInfo.java:318)
      	at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
      	at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:499)
      	at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:794)
      	at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:664)
      	at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:280)
      	at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)
      	at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:276)
      	at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:257)
      	at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)
      	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
      	at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
      	at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:777)
      	at org.apache.hadoop.dfs.NameNode.main(NameNode.java:786)
      

      and startup fails.
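
The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the actual DatanodeInfo code: the class name LayoutMismatchDemo, the AdminState constants, and both record layouts are hypothetical stand-ins. A reader that expects an extra 2-byte port consumes bytes belonging to the next field, so the enum name is read from the wrong offset. In this sketch the misread length runs past the end of the data; in the report above the misaligned bytes were instead decoded as a string that Enum.valueOf rejected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical record layouts; not the real DatanodeInfo code. */
final class LayoutMismatchDemo {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    /** Old layout: name, then the enum constant's name. */
    static byte[] writeOld(String name, AdminState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeUTF(state.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** New layout: name, a 2-byte port, then the enum constant's name. */
    static byte[] writeNew(String name, int port, AdminState state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(name);
            out.writeShort(port);
            out.writeUTF(state.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the new layout; returns the state name, or the failure's class name. */
    static String readNew(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readUTF();            // name
            in.readUnsignedShort();  // port; on old data this eats the next field's length prefix
            return AdminState.valueOf(in.readUTF()).name();
        } catch (IOException | IllegalArgumentException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(readNew(writeNew("dn1", 50020, AdminState.NORMAL)));
        System.out.println(readNew(writeOld("dn1", AdminState.NORMAL)));
    }
}
```

Which exception surfaces depends on the bytes that happen to follow the misaligned read, which is why such failures tend to show garbage in the error message.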

      1. 3337_20080501.patch
        3 kB
        Tsz Wo Nicholas Sze
      2. 3337_20080501b.patch
        4 kB
        Tsz Wo Nicholas Sze
      3. 3337_20080502.patch
        7 kB
        Tsz Wo Nicholas Sze
      4. 3337_20080502b.patch
        8 kB
        Tsz Wo Nicholas Sze

          Activity

          Hudson added a comment -

          Integrated in Hadoop-trunk #483 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/483/ )

          Raghu Angadi added a comment -

          I just committed this. Thanks Nicholas!

          Tsz Wo Nicholas Sze added a comment -

          Tested manually.

          The 4 new javac warnings are due to the use of UTF8 for backward compatibility.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12381344/3337_20080502b.patch
          against trunk revision 645773.

          @author +1. The patch does not contain any @author tags.

          tests included -1. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac -1. The applied patch generated 462 javac compiler warnings (more than the trunk's current 458 warnings).

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2371/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2371/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2371/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2371/console

          This message is automatically generated.

          Konstantin Shvachko added a comment -

          DatanodeDescriptor is not sent over RPC and is not supposed to be. You can never get a DatanodeDescriptor on the other end.
          DatanodeDescriptor is effectively a name-node private class.
          Although the actual class is DatanodeDescriptor, RPC serializes the base class DatanodeInfo
          using its Writable implementation and sends the latter over the network.
          The problem here is that the serialization intended for DatanodeDescriptor (which is only serialized to disk)
          is mixed with the serialization of DatanodeInfo (which should be used only for RPC).
          We have been through this before.
          I think we should introduce 2 new static methods in DatanodeDescriptor that would provide serialization to disk.
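
One way to read the proposal above is sketched below. This is an illustration under stated assumptions, not the committed patch: NodeInfo and NodeDescriptor are hypothetical stand-ins for DatanodeInfo and DatanodeDescriptor, and the method names writeToDisk and readFromDisk are invented here. The instance methods keep defining the RPC format, while two static methods define the disk format, so changing one layout does not silently change the other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SplitSerializationDemo {
    /** Stand-in for DatanodeInfo: instance methods define the RPC format. */
    static class NodeInfo {
        String name = "";
        int ipcPort;

        void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeShort(ipcPort);
        }

        void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            ipcPort = in.readUnsignedShort();
        }
    }

    /** Stand-in for DatanodeDescriptor: static helpers define the disk format. */
    static final class NodeDescriptor extends NodeInfo {
        static void writeToDisk(NodeDescriptor node, DataOutput out) throws IOException {
            out.writeUTF(node.name);  // older layout: no ipcPort on disk
        }

        static NodeDescriptor readFromDisk(DataInput in) throws IOException {
            NodeDescriptor node = new NodeDescriptor();
            node.name = in.readUTF();
            return node;
        }
    }

    /** Bytes produced by the RPC (instance) serialization. */
    static byte[] rpcBytes(NodeInfo node) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            node.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Bytes produced by the disk (static) serialization. */
    static byte[] diskBytes(NodeDescriptor node) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            NodeDescriptor.writeToDisk(node, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads a descriptor back from its disk representation. */
    static NodeDescriptor fromDisk(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return NodeDescriptor.readFromDisk(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the disk methods are static rather than overrides, code that serializes a descriptor through a base-class reference still gets the RPC format.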

          Tsz Wo Nicholas Sze added a comment -

          3337_20080502b.patch: use static methods instead of subclass.

          Tsz Wo Nicholas Sze added a comment -

          Raghu, thank you for pointing out that DatanodeDescriptor is sent in RPC in some hidden way.

          3337_20080502.patch: created a subclass to fix this bug.

          Raghu Angadi added a comment -

          > Example use: [...]
          Actually, this is not a potential problem but a real one. Most unit tests fail with this patch.

          Raghu Angadi added a comment -

          > DatanodeDescriptor is not used in RPC, only DatanodeInfo.

          Sure it is. Even if it were not, I don't think it's good practice to silently break a contract because we think the contract is not used (yet), especially for widely used interfaces like Writable.

          Example use: ClientProtocol.getBlockLocations() returns LocatedBlocks. If you trace its implementation, you will see that LocatedBlock is created using DatanodeDescriptor (around FSNamesystem.java:747), so DatanodeDescriptor.read() etc. do get called outside of FSEditLog.
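
The concern above comes down to virtual dispatch. The sketch below uses hypothetical classes (Info, Descriptor) rather than the real Hadoop types: a serializer that only knows the base type still invokes the subclass's write() when the array elements are subclass instances, so a layout change made in the subclass for the edit log also changes what goes over the wire.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DispatchDemo {
    /** Hypothetical base class with the intended wire format. */
    static class Info {
        String name = "dn1";

        void write(DataOutput out) throws IOException {
            out.writeUTF(name);
        }
    }

    /** Hypothetical subclass that changes the layout for its own purposes. */
    static final class Descriptor extends Info {
        @Override
        void write(DataOutput out) throws IOException {
            super.write(out);
            out.writeInt(42);  // extra field meant only for the edit log
        }
    }

    /** Serializes an array typed as the base class, as a generic RPC layer might. */
    static byte[] serialize(Info[] nodes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Info node : nodes) {
                node.write(out);  // resolved at run time to the element's actual class
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A receiver that reads with the base class's layout would then be misaligned by the extra bytes.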

          Raghu Angadi added a comment (edited) -

          Does it mean the current patch is ok?

          But we should not have wrong implementations of Writable interface for DatanodeDescriptor, right?

          Could you describe the fix (and maybe the problem)?

          > We will remove storing DatanodeDescriptor to FSEditLog in HADOOP-3329 soon.

          Was this stored before HADOOP-3283?

          Tsz Wo Nicholas Sze added a comment -

          > I think this needs to be fixed better.

          I agree. We will stop storing DatanodeDescriptor in FSEditLog in HADOOP-3329 soon. Therefore, I don't want to introduce a layout change or protocol change in this patch.

          Konstantin Shvachko added a comment -

          DatanodeDescriptor is not used in RPC, only DatanodeInfo.

          Raghu Angadi added a comment -

          Wouldn't this affect readFields() and write() of DatanodeDescriptor (used everywhere: RPCs etc.)? This patch looks like a problematic hack. I think this needs to be fixed better. If EditLog needs to read and write differently, the different serialization should be used there instead of everywhere.

          Tsz Wo Nicholas Sze added a comment -

          3337_20080501b.patch: incorporated Konstantin's comments.

          Konstantin Shvachko added a comment -

          This patch works on my old file system image. Minor comments, please:

          • remove the import of UTF8
          • provide comments on the 2 new methods *FSEditLog() explaining what they are for.

          Tsz Wo Nicholas Sze added a comment -

          3334_20080501.patch => 3337_20080501.patch

          Tsz Wo Nicholas Sze added a comment (edited) -

          3334_20080501.patch: reverted the accidental changes of FSEditLog format in HADOOP-3283.


            People

            • Assignee: Tsz Wo Nicholas Sze
            • Reporter: Konstantin Shvachko