Hadoop HDFS / HDFS-6772

Get DN storages out of blockContentsStale state faster after NN restarts

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Here is the non-HA scenario.

      1. Get HDFS into a state where blocks are over-replicated.
      2. Restart the NN.
      3. From the NN's point of view, DN storages will remain in the blockContentsStale==true state for a long time. That in turn makes postponedMisreplicatedBlocks grow large, and a larger postponedMisreplicatedBlocks increases blockreport latency. Because blockreport processing takes the NN global lock, this severely impacts NN performance and can make the cluster unstable.
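
The link between stale storages and postponedMisreplicatedBlocks in step 3 can be illustrated with a minimal sketch. It assumes the NN defers any over-replication decision for a block while at least one of its replicas sits on a storage whose contents are still considered stale. The class, field, and method names below (PostponeDemo, Replica, processOverReplicated) are hypothetical and are not the actual BlockManager code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PostponeDemo {
    /** Hypothetical replica record: the storage holding it and whether the NN still treats that storage as stale. */
    static class Replica {
        final String storageId;
        final boolean storageStale;

        Replica(String storageId, boolean storageStale) {
            this.storageId = storageId;
            this.storageStale = storageStale;
        }
    }

    /** Simplified stand-in for postponedMisreplicatedBlocks. */
    final Set<String> postponed = new HashSet<>();

    /**
     * Returns true if excess replicas can be handled now, false if the
     * block had to be postponed because a replica is on a stale storage.
     */
    boolean processOverReplicated(String blockId, List<Replica> replicas) {
        for (Replica r : replicas) {
            if (r.storageStale) {
                // Assumption: the NN cannot safely pick a replica to delete yet.
                postponed.add(blockId);
                return false;
            }
        }
        postponed.remove(blockId);
        return true;
    }

    public static void main(String[] args) {
        PostponeDemo nn = new PostponeDemo();
        // One storage is still stale, so the block is postponed.
        nn.processOverReplicated("blk_1", Arrays.asList(
                new Replica("s1", true), new Replica("s2", false)));
        System.out.println(nn.postponed.size()); // 1
        // Once no storage is stale, the block leaves the postponed set.
        nn.processOverReplicated("blk_1", Arrays.asList(
                new Replica("s1", false), new Replica("s2", false)));
        System.out.println(nn.postponed.size()); // 0
    }
}
```

Under this assumption, every over-replicated block with a replica on a stale storage stays in the postponed set until that storage is no longer stale.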

      Why will DNs remain in blockContentsStale==true state for a long time?

      1. When a DN reconnects to the NN after an NN restart, the blockreport RPC can arrive before the heartbeat RPC. That is due to how BPServiceActor#offerService decides when to send the blockreport and the heartbeat. After an NN restart, the NN asks the DN to re-register when it gets the first heartbeat request; the DN then registers with the NN and sends the blockreport RPC; the next heartbeat RPC comes only after that.
      2. So right after the first blockreport, heartbeatedSinceFailover is still false, and blockContentsStale therefore stays true.

      DatanodeStorageInfo.java
        void receivedBlockReport() {
          // The stale flag is cleared only if a heartbeat has already been seen.
          if (heartbeatedSinceFailover) {
            blockContentsStale = false;
          }
          blockReportCount++;
        }
      

      3. So the DN storage will remain in the blockContentsStale==true state until the next blockreport. On a big cluster, dfs.blockreport.intervalMsec may be set to a large value, so the stale state can persist for a long time.
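
The ordering problem described in the three steps above can be reproduced with a small standalone sketch. StorageState is a simplified stand-in for DatanodeStorageInfo that keeps only the fields mentioned in this issue. The receivedHeartbeat() method and the staleAfter() helper are assumptions made for illustration, not the actual Hadoop code.

```java
public class StaleStorageDemo {
    /** Simplified stand-in for DatanodeStorageInfo (illustration only). */
    static class StorageState {
        boolean heartbeatedSinceFailover = false; // reset after NN restart
        boolean blockContentsStale = true;        // stale until proven otherwise
        int blockReportCount = 0;

        // Assumed behavior: a heartbeat marks the storage as heard from.
        void receivedHeartbeat() {
            heartbeatedSinceFailover = true;
        }

        // Same logic as the snippet quoted in the description.
        void receivedBlockReport() {
            if (heartbeatedSinceFailover) {
                blockContentsStale = false;
            }
            blockReportCount++;
        }
    }

    /** Replays the given RPC sequence and reports whether the storage is still stale. */
    static boolean staleAfter(String... rpcs) {
        StorageState s = new StorageState();
        for (String rpc : rpcs) {
            switch (rpc) {
                case "heartbeat":
                    s.receivedHeartbeat();
                    break;
                case "blockreport":
                    s.receivedBlockReport();
                    break;
                default:
                    throw new IllegalArgumentException("unknown RPC: " + rpc);
            }
        }
        return s.blockContentsStale;
    }

    public static void main(String[] args) {
        // Order seen after NN restart: blockreport first, then heartbeat.
        System.out.println(staleAfter("blockreport", "heartbeat")); // true
        // Order that would clear the flag right away.
        System.out.println(staleAfter("heartbeat", "blockreport")); // false
    }
}
```

In the first sequence the storage stays stale until another blockreport arrives, which is the delay this issue is about.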

        Attachments

        1. HDFS-6772.patch
          8 kB
          Ming Ma
        2. HDFS-6772-2.patch
          12 kB
          Ming Ma
        3. HDFS-6772-3.patch
          11 kB
          Ming Ma

            People

            • Assignee: Ming Ma (mingma)
            • Reporter: Ming Ma (mingma)
            • Votes: 0
            • Watchers: 8

              Dates

              • Created:
                Updated:
                Resolved: