Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-10942

Incorrect handling of flushing edit logs to JN

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • hdfs, s
    • None

    Description

      We use EditsDoubleBuffer to handle edit logs:

      /**
       * A double-buffer for edits. New edits are written into the first buffer
       * while the second is available to be flushed. Each time the double-buffer
       * is flushed, the two internal buffers are swapped. This allows edits
       * to progress concurrently to flushes without allocating new buffers each
       * time.
       */
      

      With the following code, that flush the ready buffer, it copy the ready buffer to a local copy, then flush.

      QuarumOutputStream (buf in the code below is an instance of EditsDoubleBuffer):
      
        @Override
        protected void flushAndSync(boolean durable) throws IOException {
          int numReadyBytes = buf.countReadyBytes();
          if (numReadyBytes > 0) {
            int numReadyTxns = buf.countReadyTxns();
            long firstTxToFlush = buf.getFirstReadyTxId();
      
            assert numReadyTxns > 0;
      
            // Copy from our double-buffer into a new byte array. This is for
            // two reasons:
            // 1) The IPC code has no way of specifying to send only a slice of
            //    a larger array.
            // 2) because the calls to the underlying nodes are asynchronous, we
            //    need a defensive copy to avoid accidentally mutating the buffer
            //    before it is sent.
            DataOutputBuffer bufToSend = new DataOutputBuffer(numReadyBytes);
            buf.flushTo(bufToSend);
            assert bufToSend.getLength() == numReadyBytes;
            byte[] data = bufToSend.getData();
            assert data.length == bufToSend.getLength();
      

      The above call doesn't seem to prevent the orginal copy of the buffer inside buf from being swapped by the following method.

      EditsDoubleBuffer:
      
       public void setReadyToFlush() {
          assert isFlushed() : "previous data not flushed yet";
          TxnBuffer tmp = bufReady;
          bufReady = bufCurrent;
          bufCurrent = tmp;
        }
      
      

      Though we have some runtime assertion in the code, the assertion is not enabled in production, so the expected condition in the assert may be false at runtime. This would possibly cause a mess. When any condition is not as expected by the assertion, it seems a real exception should be thrown instead.

      So two issues in summary:

      • How we synchronize between the flush and the swap of the two buffers
      • Whether we should throw real exception instead of the assert that's disabled at runtime normally.

      A possible third issue:

      • If NN crashes for some reason, before the edits in bufCurrent gets to bufReady and flushed, these edits in will be lost.

      Attachments

        Activity

          People

            Unassigned Unassigned
            yzhangal Yongjun Zhang
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: