Hadoop Common · HADOOP-1762

Namenode does not need to store storageID and datanodeID persistently

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels: None

      Description

      Currently the Namenode stores all the storage IDs it has generated since the beginning (i.e., since the last format). It allocates a new storageID every time a new datanode comes online. It also stores all the known datanode IDs since the beginning.

      It would be better if Namenode did not have to keep track of these. I will describe a proposal in the next comment.

      This has implications regarding how the Namenode helps administrators identify 'dead datanodes', etc. These issues are addressed in HADOOP-1138.

      Attachments

      1. HADOOP-1762.patch
        9 kB
        Raghu Angadi
      2. HADOOP-1762-Opt2.patch
        8 kB
        Raghu Angadi
      3. HADOOP-1762-Opt2.patch
        8 kB
        Raghu Angadi
      4. HADOOP-1762-Opt2.patch
        8 kB
        Raghu Angadi
      5. HADOOP-1762-Opt2.patch
        8 kB
        Raghu Angadi
      6. HADOOP-1762-Opt2.patch
        9 kB
        Raghu Angadi
      7. HADOOP-1762.patch
        9 kB
        Raghu Angadi
      8. HADOOP-1762-Opt2.patch
        10 kB
        Raghu Angadi


          Activity

          Hudson added a comment -

          Integrated in Hadoop-Nightly #241 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/241/ )
          dhruba borthakur added a comment -

          I just committed this. Thanks Raghu.
          Raghu Angadi added a comment -

          +1 I think. Contrib failure is expected (HADOOP-1888).
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12365780/HADOOP-1762-Opt2.patch
          against trunk revision r575578.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/765/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/765/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/765/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/765/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          Oops, wrong patch attached earlier.
          Raghu Angadi added a comment -

          This patch includes (2) as well: it deprecates OP_DATANODE_ADD and OP_DATANODE_REMOVE.
          Raghu Angadi added a comment -

          Thanks for the detailed review Konstantin, as always.

          All changes except (2) above are included in the patch:
          1. Done.
          2. Did not deprecate, since we still need the constants for backward compatibility. Added a comment instead.
          3. Done. Left a detailed comment regarding repeated messages.
          4. Added a comment on why the storageID is pretty safe.

          Could you do a quick scan of the patch?
          Konstantin Shvachko added a comment - edited

          Mostly cosmetic comments:

          1. Datanode.setNewStorageID():
                } catch (UnknownHostException ignored) {} 

            Should not be ignored but rather logged.

          2. We should deprecate FSEdits.OP_DATANODE_ADD and FSEdits.OP_DATANODE_REMOVE
            so that we can eventually remove them.
          3. FSNamesystem.registerDatanode():
                    NameNode.stateChangeLog.info( "BLOCK* NameSystem.registerDatanode: "
                                                  + "node " + nodeS.getName()
                                                  + " is replaced by " + nodeReg.getName() +
                                                  " since they have the same storageID " +
                                                  nodeReg.getStorageID() +
                                                  ". If this message repeats, both nodes " +
                                                  "might have same storageID by random " +
                                                  "chance. You need to restart one of the " +
                                                  "nodes with its data cleared. ");
            

            This message may cause confusion in the regular case.
            The possibility of a storage ID collision is so negligible that we should not hint at it every time one node legally replaces another with the same storage ID.
            Let's retain the original wording.

          I also reviewed the first variant of the patch. The two are very similar; the main difference is who generates the new storage ID
          for a new data-node. I prefer the variant in which data-nodes are able to independently generate unique storage IDs for themselves.

          The IDs here are unique since:

          • data-nodes started on different machines have different IPs;
          • data-nodes started on the same machine (but in different storage directories) at the same time have different ports;
          • data-nodes started on the same machine on the same port, but at different times, have different timestamps;
          • and even if, in the latter case, somebody turns the clock back before starting the second node, the random number is likely to prevent a collision.

          By the way, something like that should be in the comments for Datanode.setNewStorageID().
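
          For illustration, here is a minimal standalone sketch of an ID generator along these lines, using only standard JDK calls. The class and method names are hypothetical, not the committed Hadoop code:

              import java.net.InetAddress;
              import java.net.UnknownHostException;
              import java.security.SecureRandom;

              class StorageIdSketch {
                // Builds an ID of the form "DS-randInt-ip-port-timestamp", following
                // the uniqueness argument above. A failed hostname lookup is logged
                // rather than silently ignored, per review comment 1.
                static String newStorageID(int port) {
                  String ip = "unknown-ip";
                  try {
                    ip = InetAddress.getLocalHost().getHostAddress();
                  } catch (UnknownHostException e) {
                    System.err.println("Could not resolve local IP: " + e);
                  }
                  int rand = new SecureRandom().nextInt(Integer.MAX_VALUE);
                  return "DS-" + rand + "-" + ip + "-" + port + "-"
                      + System.currentTimeMillis();
                }
              }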

          Raghu Angadi added a comment -

          Better use of SecureRandom().
          Raghu Angadi added a comment -

          Attached updated patch for option 2: it uses SecureRandom() (which hopefully gets its seed from something like /dev/[u]random) and includes the port in the ID. The ID string is now around 45-50 bytes.
          Raghu Angadi added a comment -

          > Isn't the pid required in case 2 datanodes are started simultaneously on the same machine?

          I'm thinking of using the port instead of the pid. Would that work?
          Sameer Paranjpye added a comment -

          Isn't the pid required in case 2 datanodes are started simultaneously on the same machine?
          Raghu Angadi added a comment -

          Patch for the second option. Here, the datanode generates a storage ID of the form "DS-randInt-ip-currentTimeMillis". This does not include the pid of the JVM, since it is not simple to get the pid from within Java.

          Either of these patches removes datanodes from persistent storage.
          Raghu Angadi added a comment -

          One version of the patch:

          • StorageIDs and datanodes are not stored persistently.
          • Whenever a new StorageID is required, we generate a new random ID just like before. This patch generates a 48-bit integer instead of a 32-bit one.
          • In case of a collision, the new datanode replaces the old datanode and gets a new storageID (sketched below).
          • The old datanode will re-register, so it will appear to be absent for one heartbeat period.
          • The probability of a collision is nothing compared to the probability of losing a datanode for other reasons, and even then the datanode would be missing only for a short while.

          I will attach another version of the patch which does the following:

          • Instead of the Namenode generating the ID, the datanode itself generates an ID of the form "randInt-ip-pid-timestamp".
          • This makes a collision even less probable. Since the Namenode does not assign StorageIDs, it cannot work around a collision like the current patch does; it will just log the suspected collision. In this case, the two datanodes involved in the collision will keep replacing each other.
          • We could make the Namenode change a storageID, but this goes against the requirement that only the datanode should assign a storageID.
          • Note that this increases the size of the StorageID string, which is copied every time a DatanodeID is passed.
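
          As a rough illustration of the first version's collision handling (replace the old entry, hand the newcomer a fresh ID), here is a hypothetical standalone sketch; all names are placeholders, not Hadoop's actual API:

              import java.security.SecureRandom;
              import java.util.HashMap;
              import java.util.Map;

              class RegistrationSketch {
                private final Map<String, String> idToHost = new HashMap<String, String>();
                private final SecureRandom rand = new SecureRandom();

                // Returns the storage ID the datanode should use from now on.
                synchronized String register(String storageID, String hostPort) {
                  if (storageID == null || storageID.length() == 0) {
                    storageID = newStorageID();   // brand-new node
                  } else {
                    String known = idToHost.get(storageID);
                    if (known != null && !known.equals(hostPort)) {
                      idToHost.remove(storageID); // newcomer replaces the old entry...
                      storageID = newStorageID(); // ...and gets a fresh ID, so the two
                    }                             // nodes cannot keep evicting each other;
                  }                               // the old node re-registers on its
                  idToHost.put(storageID, hostPort); // next heartbeat
                  return storageID;
                }

                private String newStorageID() {
                  // 63-bit random ID, in the spirit of the 48-bit ID in the patch
                  return "DS-" + (rand.nextLong() & Long.MAX_VALUE);
                }
              }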
          dhruba borthakur added a comment -

          I like the option with lastStorageId persisted in the namenode. It removes the possibility (however minuscule) of storageId collision.
          Raghu Angadi added a comment -

          Proposed implementation:

          • The Namenode stores one integer, lastStorageID, persistently.
          • When the Namenode starts, it does not know about any storageIDs except lastStorageID.
          • When a datanode D1 registers:
                if (D1.storageID == 0 || D1.storageID > lastStorageID) {
                   D1.storageID = ++lastStorageID; // take care of overflow etc.
                   editLog.write(LAST_STORAGE_ID, lastStorageID);
                }
                // same as current behaviour:
                // check whether D1.storageID is already registered, etc.

          • Another, simpler alternative: don't keep track of lastStorageID, but always assign a random storage ID whenever a new one is required. Especially with a 64-bit integer, the probability of collision is pretty much as low.
          • What about when lastStorageID is INT_MAX? We can use a 64-bit integer... probably we should. And even if a 32-bit integer rolls over, that's ok.
          • In either case, the collision probability would still be minuscule compared to the probability of similar damage (losing a datanode).
          • If there is an actual collision, then apart from the namenode losing one datanode there is another consequence: if two nodes Dx and Dy get the same storage ID, each will keep replacing the other at the namenode. To avoid this, whenever a new datanode registers with an existing storage ID, just assign it a new storage ID instead of reusing the old one.
          • If we use the lastStorageID method, then when a datanode starts up on hadoop 0.15 for the first time, it should zero out its storage ID. Apart from this, no other changes are required at the datanode.

          I personally prefer the random storage id.
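
          To put "minuscule" in perspective, a quick birthday-bound estimate, assuming for illustration that IDs are drawn uniformly at random from a 64-bit space:

              public class CollisionOdds {
                public static void main(String[] args) {
                  // P(any two of n random 64-bit IDs collide) ~ n(n-1)/2 / 2^64
                  double space = Math.pow(2, 64);
                  int[] clusterSizes = {1000, 10000, 100000};
                  for (int n : clusterSizes) {
                    double p = ((double) n * (n - 1) / 2) / space;
                    System.out.printf("%d datanodes: collision probability ~ %.1e%n", n, p);
                  }
                }
              }

          Even at 100,000 datanodes this comes out around 3e-10, far below the probability of losing a datanode to ordinary failures.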


            People

            • Assignee: Raghu Angadi
            • Reporter: Raghu Angadi
            • Votes: 0
            • Watchers: 1
