  1. Hadoop HDFS
  2. HDFS-3399 BookKeeper option support for NN HA
  3. HDFS-3423

BKJM: NN startup fails when recoverUnfinalizedSegments() tries to recover a bad inProgress_ znode

Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 3.0.0-alpha1
    • 2.0.2-alpha
    • None
    • None

    Description

      Say the inProgress_000X znode is corrupted because its data (version, ledgerId, firstTxId) was never written to it. Namenode startup has logic to recover all unfinalized segments; while doing so it tries to read this segment's metadata, the read fails, and the Namenode shuts down.

      EditLogLedgerMetadata.java:
      
      static EditLogLedgerMetadata read(ZooKeeper zkc, String path)
          throws IOException, KeeperException.NoNodeException {
        byte[] data = zkc.getData(path, false, null);
        String[] parts = new String(data).split(";");
        if (parts.length == 3) {
          // ... reading inprogress metadata
        } else if (parts.length == 4) {
          // ... reading inprogress metadata
        } else {
          // Any other field count, including an empty znode, ends up here.
          throw new IOException("Invalid ledger entry, "
                                + new String(data));
        }
      }
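
To make the failure mode concrete, here is a minimal sketch of a parser that tells a never-written znode apart from genuinely malformed data. It is not the code from the attached patch. The class name `LedgerMetadataParseSketch`, the `Metadata` holder, the `null` return for an empty payload, and the assumption that the fourth field is a last transaction id are all hypothetical. It also illustrates why the original `read()` throws: in Java, splitting an empty string on ";" yields a single empty element, so `parts.length` is 1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LedgerMetadataParseSketch {
    /** Hypothetical holder; lastTxId is -1 while the segment is still in progress. */
    static final class Metadata {
        final int version;
        final long ledgerId;
        final long firstTxId;
        final long lastTxId;

        Metadata(int version, long ledgerId, long firstTxId, long lastTxId) {
            this.version = version;
            this.ledgerId = ledgerId;
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /**
     * Parses "version;ledgerId;firstTxId" (3 fields) or the same with an
     * assumed trailing lastTxId (4 fields). Returns null when the znode
     * carries no data at all, so the caller can decide how to clean it up.
     */
    static Metadata parse(byte[] data) throws IOException {
        if (data == null || data.length == 0) {
            return null; // znode was created but its metadata was never written
        }
        String text = new String(data, StandardCharsets.UTF_8);
        String[] parts = text.split(";");
        try {
            if (parts.length == 3) {
                return new Metadata(Integer.parseInt(parts[0]),
                        Long.parseLong(parts[1]), Long.parseLong(parts[2]), -1L);
            } else if (parts.length == 4) {
                return new Metadata(Integer.parseInt(parts[0]),
                        Long.parseLong(parts[1]), Long.parseLong(parts[2]),
                        Long.parseLong(parts[3]));
            }
        } catch (NumberFormatException e) {
            throw new IOException("Invalid ledger entry, " + text, e);
        }
        throw new IOException("Invalid ledger entry, " + text);
    }
}
```

With this shape, a caller can treat `null` as "nothing to recover here" while still failing loudly on data that is present but unparseable.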
      

      Scenario: how can a bad inProgress_000X node be left behind?
      Assume BKJM has created the inProgress_000X znode and ZK becomes unavailable before the metadata can be added to it. The inProgress_000X znode then ends up with only partial information.
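
One way recovery could tolerate such a node is to skip and remove in-progress entries that carry no payload instead of aborting startup. The sketch below simulates this with an in-memory map standing in for the ZooKeeper ledger directory. `RecoverySketch`, `selectSegmentsToRecover`, and the map-based store are hypothetical, the "partial information" is assumed to be an empty payload, and this is not necessarily the approach taken in the attached patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class RecoverySketch {
    static final String IN_PROGRESS_PREFIX = "inProgress_";

    /**
     * Walks the simulated znodes (name -> payload). In-progress entries with
     * no payload are removed, standing in for deleting the bad znode; the
     * remaining in-progress entries are returned as candidates for recovery.
     */
    static List<String> selectSegmentsToRecover(Map<String, byte[]> znodes) {
        List<String> toRecover = new ArrayList<>();
        Iterator<Map.Entry<String, byte[]>> it = znodes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, byte[]> entry = it.next();
            if (!entry.getKey().startsWith(IN_PROGRESS_PREFIX)) {
                continue; // not an in-progress segment, nothing to recover
            }
            byte[] data = entry.getValue();
            if (data == null || data.length == 0) {
                it.remove(); // metadata never written: drop it rather than fail startup
            } else {
                toRecover.add(entry.getKey());
            }
        }
        return toRecover;
    }
}
```

A complementary design choice, also only an assumption here, is to close the window entirely by supplying the metadata bytes in the same ZooKeeper `create` call that creates the znode, so the node never exists without its data.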

      Attachments

        1. HDFS-3423.diff
          20 kB
          Ivan Kelly
        2. HDFS-3423.diff
          20 kB
          Ivan Kelly
        3. HDFS-3423.diff
          20 kB
          Ivan Kelly
        4. HDFS-3423.patch
          21 kB
          Uma Maheswara Rao G
        5. HDFS-3423.patch
          21 kB
          Uma Maheswara Rao G

            People

              ikelly Ivan Kelly
              rakeshr Rakesh Radhakrishnan
              Votes: 0
              Watchers: 7
