  1. Hadoop HDFS
  2. HDFS-12703

Exceptions are fatal to decommissioning monitor

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The DecommissionManager.Monitor runs as a scheduled executor task. If the task throws an exception, all decommissioning ceases until the NN is restarted. Per the javadoc for ScheduledExecutorService#scheduleAtFixedRate: "If any execution of the task encounters an exception, subsequent executions are suppressed." The monitor thread stays alive but is blocked waiting for an executor task that will never come. The code currently discards the returned future, so the exception that aborted the task is lost.
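
The behaviour described above can be reproduced outside HDFS. The following is a minimal sketch, not the code from any attached patch: the class and method names (MonitorSuppressionSketch, failingOnSecondRun, guarded, and so on) are hypothetical, and the task is a stand-in that throws on its second run. It shows that an unguarded periodic task stops silently, that the returned future still holds the exception if it is kept, and that catching exceptions inside the task keeps the schedule alive.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class MonitorSuppressionSketch {

  /** Hypothetical stand-in for the monitor task: throws on its second run. */
  static Runnable failingOnSecondRun(AtomicInteger runs) {
    return () -> {
      if (runs.incrementAndGet() == 2) {
        throw new IllegalStateException("simulated monitor failure");
      }
    };
  }

  /** Wraps a task so an exception is counted instead of reaching the executor. */
  static Runnable guarded(Runnable task, AtomicInteger failures) {
    return () -> {
      try {
        task.run();
      } catch (RuntimeException e) {
        failures.incrementAndGet(); // a real monitor would also log e here
      }
    };
  }

  /** Schedules the unguarded task and returns the exception that ended the schedule. */
  static Throwable unguardedFailure(AtomicInteger runs) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    try {
      ScheduledFuture<?> future = executor.scheduleAtFixedRate(
          failingOnSecondRun(runs), 0, 10, TimeUnit.MILLISECONDS);
      Throwable cause = null;
      try {
        // For a periodic task, get() returns only via cancellation or an exception.
        future.get();
      } catch (ExecutionException e) {
        cause = e.getCause(); // the exception that is lost if the future is discarded
      }
      Thread.sleep(100); // the executor is still alive, yet the task never runs again
      return cause;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return e;
    } finally {
      executor.shutdownNow();
    }
  }

  /** Schedules the guarded task for the given time; failures do not stop it. */
  static void runGuarded(AtomicInteger runs, AtomicInteger failures, long millis) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    try {
      executor.scheduleAtFixedRate(
          guarded(failingOnSecondRun(runs), failures), 0, 10, TimeUnit.MILLISECONDS);
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    AtomicInteger runs = new AtomicInteger();
    Throwable cause = unguardedFailure(runs);
    System.out.println("unguarded: runs=" + runs.get() + ", cause=" + cause);

    AtomicInteger guardedRuns = new AtomicInteger();
    AtomicInteger failures = new AtomicInteger();
    runGuarded(guardedRuns, failures, 300);
    System.out.println("guarded: runs=" + guardedRuns.get() + ", failures=" + failures.get());
  }
}
```

In the unguarded case the run counter stops at 2 even though the executor thread is still alive, which matches the "alive but blocked" symptom. Whether the actual fix wraps the task body, inspects the future, or does something else is not stated in this description; the wrapper here is only one common way to keep a periodic task running.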

      Failover is insufficient because the task is likely dead on the standby as well. Replication queue initialization after the transition to active will fix the under-replication of blocks on nodes that are currently decommissioning, but nodes added later will never decommission. The standby must be bounced before failover, and hopefully the error condition does not recur.

        Attachments

        1. HDFS-12703.001.patch
          9 kB
          Xue Liu
        2. HDFS-12703.002.patch
          8 kB
          Xiaoqiao He
        3. HDFS-12703.003.patch
          8 kB
          Xiaoqiao He
        4. HDFS-12703.004.patch
          8 kB
          Xiaoqiao He
        5. HDFS-12703.005.patch
          10 kB
          Xiaoqiao He
        6. HDFS-12703.006.patch
          12 kB
          Xiaoqiao He
        7. HDFS-12703.007.patch
          12 kB
          Xiaoqiao He
        8. HDFS-12703.008.patch
          12 kB
          Xiaoqiao He
        9. HDFS-12703.009.patch
          12 kB
          Xiaoqiao He
        10. HDFS-12703.010.patch
          12 kB
          Xiaoqiao He
        11. HDFS-12703.011.patch
          12 kB
          Xiaoqiao He
        12. HDFS-12703.012.patch
          12 kB
          Xiaoqiao He
        13. HDFS-12703.013.patch
          12 kB
          Xiaoqiao He

              People

              • Assignee:
                hexiaoqiao Xiaoqiao He
              • Reporter:
                daryn Daryn Sharp
              • Votes:
                0
              • Watchers:
                16