  1. Hadoop HDFS
  2. HDFS-8302 Consolidate log purging logic in QJM and FJM
  3. HDFS-8303

QJM should purge old logs in the current directory through FJM


Details

    • Sub-task
    • Status: Patch Available
    • Major
    • Resolution: Unresolved

    Description

      As the first step of the consolidation effort, QJM (QuorumJournalManager) should call its FJM (FileJournalManager) to purge the current directory.

      QJM's current logic for purging the current directory is very similar to FJM's purging logic.

      QJM:

       private static final List<Pattern> CURRENT_DIR_PURGE_REGEXES =
            ImmutableList.of(
              Pattern.compile("edits_\\d+-(\\d+)"),
              Pattern.compile("edits_inprogress_(\\d+)(?:\\..*)?"));
      ...
                long txid = Long.parseLong(matcher.group(1));
                if (txid < minTxIdToKeep) {
                  LOG.info("Purging no-longer needed file " + txid);
                  if (!f.delete()) {
      ...
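
To make the QJM behaviour easier to try out, here is a minimal stand-alone sketch built around the two patterns quoted above. The class name `QjmPurgeSketch` and the method `shouldPurge` are hypothetical, and the use of `matches()` (a whole-name match) is an assumption, since the quoted snippet elides that part. The sketch only decides whether a file name would be selected; it does not delete anything.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone class for illustration; not the actual QJM code.
class QjmPurgeSketch {
  // The two patterns from the QJM snippet. Group 1 is the END txid for
  // finalized segments and the START txid for in-progress segments.
  static final List<Pattern> CURRENT_DIR_PURGE_REGEXES = Arrays.asList(
      Pattern.compile("edits_\\d+-(\\d+)"),
      Pattern.compile("edits_inprogress_(\\d+)(?:\\..*)?"));

  // Returns true if the name matches a purge pattern and the captured txid
  // is below minTxIdToKeep. matches() is an assumption (see lead-in).
  static boolean shouldPurge(String fileName, long minTxIdToKeep) {
    for (Pattern p : CURRENT_DIR_PURGE_REGEXES) {
      Matcher matcher = p.matcher(fileName);
      if (matcher.matches()) {
        long txid = Long.parseLong(matcher.group(1));
        return txid < minTxIdToKeep;
      }
    }
    return false; // not an edit log file name
  }

  public static void main(String[] args) {
    long minTxIdToKeep = 101;
    for (String name : Arrays.asList(
        "edits_0000000000000000001-0000000000000000100",
        "edits_0000000000000000101-0000000000000000200",
        "edits_inprogress_0000000000000000201",
        "edits_inprogress_0000000000000000050.empty")) {
      System.out.println(name + " purge=" + shouldPurge(name, minTxIdToKeep));
    }
  }
}
```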
      

      FJM:

        private static final Pattern EDITS_REGEX = Pattern.compile(
          NameNodeFile.EDITS.getName() + "_(\\d+)-(\\d+)");
        private static final Pattern EDITS_INPROGRESS_REGEX = Pattern.compile(
          NameNodeFile.EDITS_INPROGRESS.getName() + "_(\\d+)");
        private static final Pattern EDITS_INPROGRESS_STALE_REGEX = Pattern.compile(
            NameNodeFile.EDITS_INPROGRESS.getName() + "_(\\d+).*(\\S+)");
      ...
          List<EditLogFile> editLogs = matchEditLogs(files, true);
          for (EditLogFile log : editLogs) {
            if (log.getFirstTxId() < minTxIdToKeep &&
                log.getLastTxId() < minTxIdToKeep) {
              purger.purgeLog(log);
            }
          }
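
The FJM loop above can be sketched in isolation as follows. `FjmPurgeSketch`, its nested `EditLogFile`, and `selectForPurge` are hypothetical stand-ins, not the real Hadoop classes; the sketch returns the selected segments rather than calling `purger.purgeLog(log)`, and it models finalized segments only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; not the real Hadoop classes.
class FjmPurgeSketch {
  // Minimal model of a finalized edit log segment [firstTxId, lastTxId].
  static final class EditLogFile {
    private final long firstTxId;
    private final long lastTxId;

    EditLogFile(long firstTxId, long lastTxId) {
      this.firstTxId = firstTxId;
      this.lastTxId = lastTxId;
    }

    long getFirstTxId() { return firstTxId; }
    long getLastTxId() { return lastTxId; }
  }

  // Same condition as the FJM snippet: a segment is selected only if both
  // its first and last txids are below minTxIdToKeep.
  static List<EditLogFile> selectForPurge(List<EditLogFile> editLogs,
                                          long minTxIdToKeep) {
    List<EditLogFile> toPurge = new ArrayList<>();
    for (EditLogFile log : editLogs) {
      if (log.getFirstTxId() < minTxIdToKeep
          && log.getLastTxId() < minTxIdToKeep) {
        toPurge.add(log);
      }
    }
    return toPurge;
  }

  public static void main(String[] args) {
    List<EditLogFile> logs = Arrays.asList(
        new EditLogFile(1, 100),
        new EditLogFile(101, 200),   // straddles minTxIdToKeep = 150
        new EditLogFile(201, 300));
    for (EditLogFile log : selectForPurge(logs, 150)) {
      System.out.println("purge " + log.getFirstTxId() + "-" + log.getLastTxId());
    }
  }
}
```

A segment that straddles `minTxIdToKeep` is kept, because its last txid is not below the threshold.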
      

      I can see 2 differences:

      1. The regexes differ in how they match empty/corrupt in-progress files. The FJM pattern makes more sense to me.
      2. FJM verifies that both the start and end txIDs of a finalized edit file are old enough. This doesn't make sense, because the end txID is never smaller than the start txID, so checking the end txID alone is enough.
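
Both differences can be checked with a small experiment. The sketch below is illustrative only: the class and method names are hypothetical, it assumes `NameNodeFile.EDITS_INPROGRESS.getName()` returns `"edits_inprogress"`, it assumes FJM tries the plain in-progress pattern before the stale one, and the `_old` suffix is a made-up file name used only to show where the patterns diverge.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness; not part of Hadoop.
class PurgeDifferenceSketch {
  // QJM in-progress pattern, copied from the description.
  static final Pattern QJM_INPROGRESS =
      Pattern.compile("edits_inprogress_(\\d+)(?:\\..*)?");
  // FJM patterns, with the file name prefix written out (assumption).
  static final Pattern FJM_INPROGRESS =
      Pattern.compile("edits_inprogress_(\\d+)");
  static final Pattern FJM_INPROGRESS_STALE =
      Pattern.compile("edits_inprogress_(\\d+).*(\\S+)");

  static OptionalLong txId(Matcher m) {
    return m.matches()
        ? OptionalLong.of(Long.parseLong(m.group(1)))
        : OptionalLong.empty();
  }

  static OptionalLong qjmTxId(String name) {
    return txId(QJM_INPROGRESS.matcher(name));
  }

  // Assumption: the plain pattern is tried first, then the stale pattern.
  static OptionalLong fjmTxId(String name) {
    OptionalLong plain = txId(FJM_INPROGRESS.matcher(name));
    return plain.isPresent() ? plain : txId(FJM_INPROGRESS_STALE.matcher(name));
  }

  // Difference 2: FJM's two-sided check versus checking the end txid only.
  static boolean purgeFirstAndLast(long first, long last, long min) {
    return first < min && last < min;
  }

  static boolean purgeLastOnly(long last, long min) {
    return last < min;
  }

  public static void main(String[] args) {
    String[] names = {
        "edits_inprogress_0000000000000000101",
        "edits_inprogress_0000000000000000101.empty",
        "edits_inprogress_0000000000000000101_old" // made-up suffix
    };
    for (String name : names) {
      System.out.println(name + " qjm=" + qjmTxId(name) + " fjm=" + fjmTxId(name));
    }
    // For every segment with first <= last, both checks agree.
    for (long first = 0; first < 20; first++) {
      for (long last = first; last < 20; last++) {
        for (long min = 0; min < 25; min++) {
          if (purgeFirstAndLast(first, last, min) != purgeLastOnly(last, min)) {
            throw new AssertionError("checks disagree");
          }
        }
      }
    }
    System.out.println("two-sided and end-only checks agree when first <= last");
  }
}
```

Under these assumptions, both approaches accept dot-suffixed names such as `.empty`, while only the FJM stale pattern accepts a suffix that does not start with a dot.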

      Attachments

        1. HDFS-8303.2.patch
          1 kB
          István Fajth
        2. HDFS-8303.1.patch
          1 kB
          Zhe Zhang
        3. HDFS-8303.0.patch
          2 kB
          Zhe Zhang

            People

              pifta István Fajth
              zhz Zhe Zhang