Hadoop HDFS / HDFS-2702

A single failed name dir can cause the NN to exit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      There's a bug in FSEditLog#rollEditLog which results in the NN process exiting if a single name dir has failed. Here's the relevant code:

      close();  // so editStreams.size() is 0
      foreach edits dir sd {
        try {
          ..
          eStream = new ...  // might throw an IOException here
          editStreams.add(eStream);
        } catch (IOException ioe) {
          removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
        }
      }
      

      If we get an IOException before we've added two edit streams to the list, we exit. E.g., if there's an error processing the first name dir, we exit even if there are four valid name dirs. The fix is to move the check out of removeEditsForStorageDir (née processIOError), or to modify it so the check can be disabled in some cases, e.g. here, where we don't yet know how many streams are valid.
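
      To make the failure mode concrete, here is a minimal standalone sketch of the loop shape described above. It is a toy model, not the FSEditLog code or the attached patch: the class RollEditLogSketch, the FatalExit exception (standing in for the NN process exiting), and the boolean array that simulates which dirs throw are all hypothetical. The fixed variant assumes the first option from the description, i.e. the check is moved out of the error handler and run once after the loop; the threshold it uses there (exit only if no stream opened) is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the roll loop; hypothetical, not the real FSEditLog. */
class RollEditLogSketch {

    /** Stands in for the NN exiting; the real code terminates the process. */
    static class FatalExit extends RuntimeException {
        FatalExit(String msg) { super(msg); }
    }

    /**
     * Buggy shape: the "too few streams" check runs inside the error
     * handler while the stream list is still being rebuilt.
     * dirOk[i] == false simulates an IOException while opening dir i.
     */
    static int rollBuggy(boolean[] dirOk) {
        List<String> editStreams = new ArrayList<>(); // empty after close()
        for (int i = 0; i < dirOk.length; i++) {
            if (dirOk[i]) {
                editStreams.add("edits-" + i);
            } else if (editStreams.size() <= 1) {
                // Mirrors the check described for removeEditsForStorageDir.
                throw new FatalExit("dir " + i + " failed with only "
                        + editStreams.size() + " stream(s) open");
            }
        }
        return editStreams.size();
    }

    /**
     * Fixed shape (assumed): failed dirs are skipped during the loop and
     * the stream count is checked once, after every dir has been tried.
     */
    static int rollFixed(boolean[] dirOk) {
        List<String> editStreams = new ArrayList<>();
        for (int i = 0; i < dirOk.length; i++) {
            if (dirOk[i]) {
                editStreams.add("edits-" + i);
            }
            // A failed dir is dropped here without any size check.
        }
        if (editStreams.isEmpty()) {
            throw new FatalExit("no usable edits dirs");
        }
        return editStreams.size();
    }

    public static void main(String[] args) {
        // First of four name dirs fails; the other three are healthy.
        boolean[] firstDirBad = {false, true, true, true};
        try {
            rollBuggy(firstDirBad);
        } catch (FatalExit e) {
            System.out.println("buggy: exit (" + e.getMessage() + ")");
        }
        System.out.println("fixed: " + rollFixed(firstDirBad) + " streams");
    }
}
```

      Note that the buggy variant only survives a failure once two streams are already open, so the outcome depends on the order in which the dirs are visited rather than on how many are healthy.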

      1. hdfs-2702.txt
        13 kB
        Eli Collins
      2. hdfs-2702.txt
        12 kB
        Eli Collins
      3. hdfs-2702.txt
        12 kB
        Eli Collins
      4. hdfs-2702.txt
        11 kB
        Eli Collins
      5. hdfs-2702.txt
        4 kB
        Eli Collins

        Issue Links

          Activity

          Eli Collins created issue -
          Eli Collins made changes -
          Field Original Value New Value
          Fix Version/s 1.1.0 [ 12317959 ]
          Affects Version/s 1.1.0 [ 12317959 ]
          Affects Version/s 0.20.205.0 [ 12316392 ]
          Target Version/s 1.1.0 [ 12317959 ]
          Eli Collins made changes -
          Affects Version/s 1.0.0 [ 12318243 ]
          Affects Version/s 1.1.0 [ 12317959 ]
          Eli Collins made changes -
          Attachment hdfs-2702.txt [ 12507805 ]
          Eli Collins made changes -
          Attachment hdfs-2702.txt [ 12507847 ]
          Eli Collins made changes -
          Attachment hdfs-2702.txt [ 12507848 ]
          Todd Lipcon made changes -
          Link This issue is blocked by HDFS-2701 [ HDFS-2701 ]
          Todd Lipcon made changes -
          Link This issue is blocked by HDFS-2703 [ HDFS-2703 ]
          Eli Collins made changes -
          Attachment hdfs-2702.txt [ 12507943 ]
          Eli Collins made changes -
          Attachment hdfs-2702.txt [ 12507968 ]
          Eli Collins made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Eli Collins made changes -
          Fix Version/s 1.1.0 [ 12317959 ]
          Target Version/s 1.1.0 [ 12317959 ]
          Eli Collins made changes -
          Component/s name-node [ 12312926 ]
          Suresh Srinivas made changes -
          Fix Version/s 1.0.2 [ 12320051 ]
          Target Version/s 1.0.2 [ 12320051 ]
          Matt Foley made changes -
          Fix Version/s 1.1.0 [ 12317959 ]
          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Eli Collins made changes -
          Link This issue is related to HDFS-3310 [ HDFS-3310 ]

            People

            • Assignee:
              Eli Collins
              Reporter:
              Eli Collins
            • Votes:
              1
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved:
