Hadoop HDFS
HDFS-6145

Stopping unexpected exceptions from propagating to avoid serious consequences


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      There are a few cases where an exception should never occur, but the code simply logs it and lets execution continue. Since such an exception indicates a bug, a safer way may be to terminate execution outright and stop the error from propagating into unexpected consequences.

      ==========================
      Case 1:
      Line: 336, File: "org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java"

      325:       try {
      326:         Quota.Counts counts = cleanSubtree(snapshot, prior, collectedBlocks,
      327:             removedINodes, true);
      328:         INodeDirectory parent = getParent();
       .. ..
      335:       } catch(QuotaExceededException e) {
      336:         LOG.error("BUG: removeSnapshot increases namespace usage.", e);
      337:       }
      

      Since this shouldn't occur unless there is an unexpected bug,
      should the NN simply stop execution to prevent bad things from propagating?

      Similar handling of QuotaExceededException can be found at:
      Line: 544, File: "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
      Line: 657, File: "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
      Line: 669, File: "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
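A minimal sketch of the fail-fast alternative for this case: convert the "impossible" QuotaExceededException into a fatal unchecked exception instead of logging and continuing, so the accounting bug cannot silently corrupt namespace state. The class and method names below are illustrative assumptions, not the real NameNode API.

```java
public class FailFastSketch {
    /** Simplified stand-in for HDFS's QuotaExceededException. */
    static class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    /**
     * Models the cleanup path in removeSnapshot: if quota accounting ever
     * increases during a snapshot removal (a bug), fail fast instead of
     * LOG.error + continue.
     */
    public static void removeSnapshot(boolean quotaBugTriggered) {
        try {
            if (quotaBugTriggered) {
                throw new QuotaExceededException("removeSnapshot increased namespace usage");
            }
            // ... normal subtree cleanup would happen here ...
        } catch (QuotaExceededException e) {
            // Fail fast: surface the bug instead of logging and continuing.
            throw new IllegalStateException("BUG: " + e.getMessage(), e);
        }
    }
}
```

The same conversion would apply at the three INodeReference.java sites listed above, since they swallow the identical "should never happen" exception.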
      ==========================================
      ==========================
      Case 2:
      Line: 601, File: "org/apache/hadoop/hdfs/server/namenode/JournalSet.java"

      591:  public synchronized RemoteEditLogManifest getEditLogManifest(long fromTxId,
      ..
      595:    for (JournalAndStream j : journals) {
      ..
      598:         try {
      599:           allLogs.addAll(fjm.getRemoteEditLogs(fromTxId, forReading, false));
      600:         } catch (Throwable t) {
      601:           LOG.warn("Cannot list edit logs in " + fjm, t);
      602:         }
      

      An exception from addAll will result in some edit log files not being considered and not included in the checkpoint, which may result in data loss.
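A minimal sketch of the safer aggregation: if any journal fails while listing its edit logs, abort manifest construction rather than warning and moving on, so the manifest can never be silently incomplete. The types below are simplified assumptions standing in for JournalAndStream and RemoteEditLogManifest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ManifestSketch {
    /** Each supplier models one journal's getRemoteEditLogs call. */
    public static List<String> getEditLogManifest(List<Supplier<List<String>>> journals) {
        List<String> allLogs = new ArrayList<>();
        for (Supplier<List<String>> journal : journals) {
            try {
                allLogs.addAll(journal.get());
            } catch (RuntimeException t) {
                // Propagate rather than LOG.warn + continue: swallowing the
                // failure here means edit logs could be silently dropped from
                // the next checkpoint.
                throw new IllegalStateException("Cannot list edit logs", t);
            }
        }
        return allLogs;
    }
}
```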
      ==========================================
      ==========================
      Case 3:
      Line: 4029, File: "org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java"

      4010:       try {
      4011:         while (fsRunning && shouldNNRmRun) {
      4012:           checkAvailableResources();
      4013:           if(!nameNodeHasResourcesAvailable()) {
      4014:             String lowResourcesMsg = "NameNode low on available disk space. ";
      4015:             if (!isInSafeMode()) {
      4016:               FSNamesystem.LOG.warn(lowResourcesMsg + "Entering safe mode.");
      4017:             } else {
      4018:               FSNamesystem.LOG.warn(lowResourcesMsg + "Already in safe mode.");
      4019:             }
      4020:             enterSafeMode(true);
      4021:           }
      .. ..
      4027:         }
      4028:       } catch (Exception e) {
      4029:         FSNamesystem.LOG.error("Exception in NameNodeResourceMonitor: ", e);
      4030:       }
      

      enterSafeMode might throw an exception. If the NameNode cannot enter safe mode, should the execution simply terminate?
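A minimal sketch of terminating in that situation: instead of catching Exception, logging, and letting the monitor thread die quietly, escalate the failure so the NameNode shuts down visibly. The interface and method names are illustrative assumptions, not FSNamesystem's API.

```java
public class MonitorSketch {
    interface SafeMode { void enterSafeMode() throws Exception; }

    /** One iteration of a NameNodeResourceMonitor-style check. */
    public static void checkResourcesOnce(boolean lowOnDiskSpace, SafeMode safeMode) {
        try {
            if (lowOnDiskSpace) {
                safeMode.enterSafeMode();
            }
        } catch (Exception e) {
            // Escalate instead of LOG.error + swallow: a NameNode that is low
            // on disk space and failed to enter safe mode should not keep
            // running as if nothing happened.
            throw new Error("NameNodeResourceMonitor: could not enter safe mode", e);
        }
    }
}
```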
      ==========================================


          People

            Assignee: Unassigned
            Reporter: Ding Yuan
            Votes: 0
            Watchers: 3