Hadoop HDFS / HDFS-2702

A single failed name dir can cause the NN to exit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code:

      close();  // So editStreams.size() is 0
      foreach edits dir {
        try {
          ..
          eStream = new ...  // Might get an IOE here
          editStreams.add(eStream);
        } catch (IOException ioe) {
          removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
        }
      }
      

      If we get an IOException before we've added two edit streams to the list, we'll exit. For example, if there's an error processing the 1st name dir we'll exit even if there are 4 valid name dirs. The fix is to move the check out of removeEditsForStorageDir (née processIOError), or to modify it so the check can be disabled in some cases, e.g. here, where we don't yet know how many streams are valid.
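A minimal sketch of the first option (deferring the check until every dir has been tried) might look like the following. It is not the attached patch: `RollEditLogSketch`, `StreamOpener`, and `openAll` are hypothetical names, plain strings stand in for storage dirs and edit streams, and treating "zero valid streams" as the only fatal case is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these names exist in FSEditLog.
class RollEditLogSketch {
    /** Hypothetical stand-in for "eStream = new ..." on one edits dir. */
    interface StreamOpener {
        String open(String dir) throws IOException;
    }

    /**
     * Tries every dir and records failures instead of acting on them
     * immediately. The "too few streams" decision is made once, after
     * the loop, when the number of valid streams is actually known.
     */
    static List<String> openAll(List<String> dirs, StreamOpener opener,
                                List<String> failedDirs) throws IOException {
        List<String> streams = new ArrayList<>();
        for (String dir : dirs) {
            try {
                streams.add(opener.open(dir));
            } catch (IOException ioe) {
                failedDirs.add(dir); // remember the bad dir, keep going
            }
        }
        // Assumed policy: only fatal if no dir at all could be opened.
        if (streams.isEmpty()) {
            throw new IOException("no valid edits dirs out of " + dirs.size());
        }
        return streams;
    }

    public static void main(String[] args) throws IOException {
        List<String> failed = new ArrayList<>();
        List<String> dirs = Arrays.asList("/name1", "/name2", "/name3", "/name4");
        // Simulate an error on the 1st name dir, as in the description.
        List<String> streams = openAll(dirs, dir -> {
            if (dir.equals("/name1")) {
                throw new IOException("simulated failure on " + dir);
            }
            return "edits@" + dir;
        }, failed);
        System.out.println(streams.size() + " streams open, failed dirs: " + failed);
    }
}
```

With the first of four dirs failing, the sketch keeps the three remaining streams and reports the failed dir rather than exiting, which is the behavior the description asks for.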

        Attachments

        1. hdfs-2702.txt
          13 kB
          Eli Collins
        2. hdfs-2702.txt
          12 kB
          Eli Collins
        3. hdfs-2702.txt
          12 kB
          Eli Collins
        4. hdfs-2702.txt
          11 kB
          Eli Collins
        5. hdfs-2702.txt
          4 kB
          Eli Collins

              People

              • Assignee: Eli Collins
              • Reporter: Eli Collins
              • Votes: 1
              • Watchers: 6
