HBase / HBASE-28665

WALs not marked closed when there are errors in closing WALs

Details

    • Reviewed

    Description

      In our production clusters we have observed that when a WAL close fails, the old WAL files are not marked as closed and are therefore never cleaned up. When a WAL close fails in closeWriter, the error count is incremented:

      Span span = Span.current();
      try {
        span.addEvent("closing writer");
        writer.close();
        span.addEvent("writer closed");
      } catch (IOException ioe) {
        int errors = closeErrorCount.incrementAndGet();
        boolean hasUnflushedEntries = isUnflushedEntries();
        if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
          LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage() + "\", errors="
            + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
          throw ioe;
        }
        LOG.warn("Riding over failed WAL close of " + path
          + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
      }
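
The tolerance check in the catch block above can be exercised in isolation. The following is a minimal sketch, not HBase code: the class CloseToleranceSketch, its constructor, and the helper isRethrown are hypothetical names introduced for illustration, and the tolerance value of 2 is an assumed setting rather than something stated in this issue.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the WAL class; only the tolerance check is modeled.
class CloseToleranceSketch {
  private final AtomicInteger closeErrorCount;
  private final int closeErrorsTolerated;

  CloseToleranceSketch(int closeErrorsTolerated, int priorErrors) {
    this.closeErrorsTolerated = closeErrorsTolerated;
    this.closeErrorCount = new AtomicInteger(priorErrors);
  }

  // Same condition as the catch block in the snippet: count the failure, then
  // rethrow only for a synchronous close that either has unflushed entries or
  // has exceeded the tolerated number of errors. Otherwise ride over it.
  void onCloseFailure(boolean syncCloseCall, boolean hasUnflushedEntries, IOException ioe)
      throws IOException {
    int errors = closeErrorCount.incrementAndGet();
    if (syncCloseCall && (hasUnflushedEntries || errors > closeErrorsTolerated)) {
      throw ioe;
    }
  }

  // Convenience wrapper: true if a close failure would be rethrown.
  static boolean isRethrown(int tolerated, int priorErrors, boolean syncCloseCall,
      boolean hasUnflushedEntries) {
    CloseToleranceSketch wal = new CloseToleranceSketch(tolerated, priorErrors);
    try {
      wal.onCloseFailure(syncCloseCall, hasUnflushedEntries,
        new IOException("simulated close failure"));
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    int tolerated = 2; // assumed value, for illustration only
    for (int prior = 0; prior <= 2; prior++) {
      System.out.println("sync close, prior errors=" + prior + ", rethrown="
        + isRethrown(tolerated, prior, true, false));
    }
    System.out.println("async close, prior errors=5, rethrown="
      + isRethrown(tolerated, 5, false, true));
  }
}
```

Under these assumptions, a synchronous close with all edits synced is ridden over until the error count exceeds the tolerance, and an asynchronous close is never rethrown from this check.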
      

      When closing the WAL has failed only twice, doReplaceWALWriter enters this code block:

      if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
        try {
          closeWriter(this.writer, oldPath, true);
        } finally {
          inflightWALClosures.remove(oldPath.getName());
        }
      }
      

      The WAL is never marked closed on this path, unlike on the asynchronous close path, which calls markClosedAndClean in its finally block:

      Writer localWriter = this.writer;
      closeExecutor.execute(() -> {
        try {
          closeWriter(localWriter, oldPath, false);
        } catch (IOException e) {
          LOG.warn("close old writer failed", e);
        } finally {
          // call this even if the above close fails, as there is no other chance we can set
          // closed to true, it will not cause big problems.
          markClosedAndClean(oldPath);
          inflightWALClosures.remove(oldPath.getName());
        }
      });
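
One possible direction for a fix is to call markClosedAndClean in the finally block of the synchronous path as well, mirroring the asynchronous path. The sketch below only models that idea and is not the committed patch: SyncCloseSketch, its nested Writer interface, the closedWals set, and the markClosed flag are all hypothetical, and inflightWALClosures is simplified to a set of WAL names.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the synchronous close branch; not the HBase classes.
class SyncCloseSketch {
  interface Writer {
    void close() throws IOException;
  }

  final Set<String> inflightWALClosures = ConcurrentHashMap.newKeySet();
  final Set<String> closedWals = ConcurrentHashMap.newKeySet();

  // Stand-in for markClosedAndClean: here it only records the WAL as closed.
  void markClosedAndClean(String walName) {
    closedWals.add(walName);
  }

  // markClosed=false models the behavior described in this issue;
  // markClosed=true models the proposed call in the finally block.
  void closeSynchronously(Writer writer, String walName, boolean markClosed) {
    inflightWALClosures.add(walName);
    try {
      writer.close();
    } catch (IOException e) {
      // Ridden over, as closeWriter does while errors stay within tolerance.
    } finally {
      if (markClosed) {
        markClosedAndClean(walName);
      }
      inflightWALClosures.remove(walName);
    }
  }

  public static void main(String[] args) {
    Writer failing = () -> {
      throw new IOException("simulated close failure");
    };

    SyncCloseSketch current = new SyncCloseSketch();
    current.closeSynchronously(failing, "wal.1", false);
    System.out.println("without fix, marked closed: " + current.closedWals.contains("wal.1"));

    SyncCloseSketch proposed = new SyncCloseSketch();
    proposed.closeSynchronously(failing, "wal.1", true);
    System.out.println("with fix, marked closed: " + proposed.closedWals.contains("wal.1"));
  }
}
```

In this model the in-flight entry is removed in both cases, but only the variant that marks the WAL in the finally block leaves it eligible for cleanup after a failed close.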
      

          People

            kiran.maturi Kiran Kumar Maturi