Hadoop HDFS / HDFS-7575

Upgrade should generate a unique storage ID for each volume


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.5.0, 2.6.0
    • Fix Version/s: 2.7.0, 2.6.1, 3.0.0-alpha1
    • None
    • Hadoop Flags: Reviewed

    Description

      Before HDFS-2832, each DataNode had a single unique storageId that included its IP address. Since HDFS-2832, each storage directory on a DataNode has its own storageId, which is just a random UUID.
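      A minimal Java sketch of the difference, for illustration only. It assumes that a per-volume ID is the prefix "DS-" followed by a random UUID, and treats any ID without a parseable UUID after that prefix as a legacy, pre-HDFS-2832 ID. The class and method names (StorageIdSketch, newVolumeId, isUuidBased, idsAfterUpgrade) are hypothetical and are not the HDFS API; idsAfterUpgrade only illustrates the outcome the issue title asks for, namely one distinct ID per volume.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper illustrating per-volume storage IDs; not the HDFS API. */
class StorageIdSketch {
    // Assumed ID prefix, chosen for illustration.
    static final String PREFIX = "DS-";

    /** Returns a fresh random per-volume ID of the assumed form "DS-<uuid>". */
    static String newVolumeId() {
        return PREFIX + UUID.randomUUID();
    }

    /** True if the ID has the assumed "DS-<uuid>" form. */
    static boolean isUuidBased(String storageId) {
        if (storageId == null || !storageId.startsWith(PREFIX)) {
            return false;
        }
        try {
            UUID.fromString(storageId.substring(PREFIX.length()));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /**
     * Keeps a UUID-based ID the first time it is seen; replaces legacy or
     * duplicate IDs with fresh ones so that every volume ends up unique.
     */
    static List<String> idsAfterUpgrade(List<String> existingIds) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String id : existingIds) {
            String chosen = (isUuidBased(id) && seen.add(id)) ? id : newVolumeId();
            seen.add(chosen);
            result.add(chosen);
        }
        return result;
    }
}
```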

      DataNodes send one storage report per storage directory in each heartbeat. The NameNode processes the heartbeat in the DatanodeDescriptor#updateHeartbeatState method. Before HDFS-2832 this method simply stored the information per DataNode. Since that patch, each DataNode can have multiple storages, so the information is stored in a map keyed by storage ID.

      This works fine on all clusters installed after HDFS-2832, because every storage directory gets a UUID as its storage ID: a DataNode with 8 drives has a map with 8 different keys. On each heartbeat the map entry is looked up (DatanodeStorageInfo storage = storageMap.get(s.getStorageID());) and updated:

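      DatanodeStorageInfo
        void updateState(StorageReport r) {
          // Every field is overwritten with the values from this single report;
          // nothing is accumulated across reports that share a storage ID.
          capacity = r.getCapacity();
          dfsUsed = r.getDfsUsed();
          remaining = r.getRemaining();
          blockPoolUsed = r.getBlockPoolUsed();
        }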

      On clusters upgraded from a pre-HDFS-2832 version, however, the storage ID has not been rewritten (at least not on the four clusters I checked), so every directory has the exact same storageId. As a result there is only a single entry in the storageMap, and it is overwritten by an arbitrary StorageReport from the DataNode (whichever is processed last). The updateState method above shows why: it simply assigns the capacity from the received report, when it should probably sum the values per received heartbeat.
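      The collapse can be reproduced with a simplified Java sketch. VolumeReport and StorageMapSketch are hypothetical stand-ins, not Hadoop's StorageReport or DatanodeDescriptor; they only mimic the pattern described above, where the entry for a storage ID is looked up and its fields are overwritten by each report.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a per-volume storage report (hypothetical). */
class VolumeReport {
    final String storageId;
    final long capacity;
    final long dfsUsed;

    VolumeReport(String storageId, long capacity, long dfsUsed) {
        this.storageId = storageId;
        this.capacity = capacity;
        this.dfsUsed = dfsUsed;
    }
}

/** Mimics a per-storage-ID map with last-report-wins updates (hypothetical). */
class StorageMapSketch {
    // Value layout: {capacity, dfsUsed}.
    final Map<String, long[]> storageMap = new HashMap<>();

    void onHeartbeat(List<VolumeReport> reports) {
        for (VolumeReport r : reports) {
            long[] state = storageMap.computeIfAbsent(r.storageId, id -> new long[2]);
            // Like updateState(): assign, do not accumulate.
            state[0] = r.capacity;
            state[1] = r.dfsUsed;
        }
    }

    long totalCapacity() {
        long total = 0;
        for (long[] state : storageMap.values()) {
            total += state[0];
        }
        return total;
    }
}
```

      With three volumes reporting capacities of 100, 200 and 300 under one legacy ID, the map holds a single entry and totalCapacity() returns 300, the last report's value, instead of the 600 the DataNode actually has.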

      The Balancer seems to be one of the few components that actually use this information, so it now bases its balancing decisions on the utilization of one arbitrary drive per DataNode.

      Things get even worse when a drive is added or replaced: the new drive gets a new storage ID, so there are now two entries in the storageMap. Because new drives are usually empty, this skews the Balancer's decision so that the node is never considered over-utilized.
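      To see the direction of the skew, assume (for illustration only; this is not the Balancer's actual code) that a node's utilization is total dfsUsed divided by total capacity across its storageMap entries. UtilizationSketch is a hypothetical helper. The example below uses three nearly full old drives (1000 units of capacity, 900 used each) that share one legacy ID, plus one new, empty drive with its own ID.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of how collapsed storage IDs skew utilization. */
class UtilizationSketch {
    /** Utilization in percent from {capacity, dfsUsed} pairs keyed by storage ID. */
    static double utilizationPercent(Map<String, long[]> storages) {
        long capacity = 0;
        long used = 0;
        for (long[] s : storages.values()) {
            capacity += s[0];
            used += s[1];
        }
        return capacity == 0 ? 0.0 : 100.0 * used / capacity;
    }

    /** Builds a last-report-wins map, mimicking the overwrite behavior described above. */
    static Map<String, long[]> collapse(String[] ids, long[][] reports) {
        Map<String, long[]> map = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            map.put(ids[i], reports[i]);
        }
        return map;
    }
}
```

      With the three old drives collapsed under one ID, the node appears 45% utilized (900 / 2000); with one ID per volume it is actually 67.5% utilized (2700 / 4000).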

      Another problem is that old StorageReports are never removed from the storageMap. If I replace a drive and the new one gets a new storage ID, the old entry stays in place and is used in all of the Balancer's calculations until the NameNode is restarted.
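      A hypothetical sketch of why the stale entry survives: if each heartbeat only adds or updates entries, an ID that stops being reported is never dropped. StaleEntrySketch is an illustrative stand-in, not NameNode code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: entries are added or updated, but never removed. */
class StaleEntrySketch {
    final Map<String, Long> capacityById = new HashMap<>();

    /** Records each reported capacity; IDs missing from this heartbeat are left untouched. */
    void onHeartbeat(Map<String, Long> reported) {
        capacityById.putAll(reported);
    }

    long totalCapacity() {
        long total = 0;
        for (long c : capacityById.values()) {
            total += c;
        }
        return total;
    }
}
```

      On a node with two 1000-unit drives, replacing "DS-old" with "DS-replacement" leaves three entries behind, so totalCapacity() reports 3000 instead of 2000.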

      I can try to provide a patch that does the following:

      • Instead of using a map, store the array of reports we receive; alternatively, keep the map but sum up the values of reports that share the same ID
      • Clear the map on each heartbeat, so that it only holds up-to-date information

      Does that sound sensible?
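      A minimal sketch of both proposals combined, under the same simplifying assumptions as above (hypothetical class and field names, not Hadoop code): the map is rebuilt from each heartbeat, which drops storages that are no longer reported, and reports that share a storage ID are summed instead of overwritten.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed fix: rebuild the map and sum duplicates. */
class ProposedFixSketch {
    /** Simplified per-volume report (hypothetical). */
    static final class Report {
        final String storageId;
        final long capacity;
        final long dfsUsed;

        Report(String storageId, long capacity, long dfsUsed) {
            this.storageId = storageId;
            this.capacity = capacity;
            this.dfsUsed = dfsUsed;
        }
    }

    // Value layout: {capacity, dfsUsed}; replaced wholesale on every heartbeat.
    private Map<String, long[]> storageMap = new HashMap<>();

    void onHeartbeat(List<Report> reports) {
        // Start from an empty map so storages missing from this heartbeat disappear.
        Map<String, long[]> fresh = new HashMap<>();
        for (Report r : reports) {
            long[] totals = fresh.computeIfAbsent(r.storageId, id -> new long[2]);
            // Sum instead of overwrite, so every volume sharing a legacy ID counts.
            totals[0] += r.capacity;
            totals[1] += r.dfsUsed;
        }
        storageMap = fresh;
    }

    int storageCount() {
        return storageMap.size();
    }

    boolean hasStorage(String storageId) {
        return storageMap.containsKey(storageId);
    }

    long totalCapacity() {
        long total = 0;
        for (long[] t : storageMap.values()) {
            total += t[0];
        }
        return total;
    }

    long totalDfsUsed() {
        long total = 0;
        for (long[] t : storageMap.values()) {
            total += t[1];
        }
        return total;
    }
}
```

      Summing restores correct per-node totals but still loses per-volume detail for volumes that share a legacy ID; giving each volume a unique ID on upgrade, as the issue title proposes, removes the collision itself.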

      Attachments

        1. HDFS-7575.01.patch (1.69 MB, Arpit Agarwal)
        2. HDFS-7575.02.patch (28 kB, Arpit Agarwal)
        3. HDFS-7575.03.binary.patch (61 kB, Arpit Agarwal)
        4. HDFS-7575.03.patch (27 kB, Arpit Agarwal)
        5. testUpgrade22via24GeneratesStorageIDs.tgz (7 kB, Arpit Agarwal)
        6. testUpgradeFrom22GeneratesStorageIDs.tgz (13 kB, Arpit Agarwal)
        7. testUpgradeFrom24PreservesStorageId.tgz (14 kB, Arpit Agarwal)
        8. HDFS-7575.04.binary.patch (61 kB, Arpit Agarwal)
        9. HDFS-7575.04.patch (26 kB, Arpit Agarwal)
        10. HDFS-7575.05.binary.patch (32 kB, Arpit Agarwal)
        11. HDFS-7575.05.patch (21 kB, Arpit Agarwal)


          People

            Assignee: Arpit Agarwal (arp)
            Reporter: Lars Francke (larsfrancke)
            Votes: 0
            Watchers: 23
