  1. Hadoop Map/Reduce
  2. MAPREDUCE-5902

JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: jobhistoryserver
    • Labels: None

    Description

      1) The JobHistoryServer sometimes skips certain history files and never serves them as completed jobs.

      2) In addition to skipping these files, the JobHistoryServer does not log which files are being skipped, or why.

      So, in addition to determining why certain files are skipped (file name length does not appear to be the reason; rather, % characters appear to throw the JobHistoryServer filter off), we should log completed .jhist files that are present in the mr-history/tmp directory but are nevertheless skipped.

      Regarding the actual bug: skipping completed jhist files

      I think we will need an author of the JobHistoryServer to chime in on which job paths are actually valid. It appears that at least some characters, when present in a job name, cause the JobHistoryServer to fail to recognize a completed jhist file.

      Regarding logging
      It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see in the logs why files are being filtered out, i.e. whether it is due to filtering or visibility.

        private static List<FileStatus> scanDirectory(Path path, FileContext fc,
            PathFilter pathFilter) throws IOException {
          path = fc.makeQualified(path);
          List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
          RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
          while (fileStatusIter.hasNext()) {
            FileStatus fileStatus = fileStatusIter.next();
            Path filePath = fileStatus.getPath();
            if (fileStatus.isFile() && pathFilter.accept(filePath)) {
              jhStatusList.add(fileStatus);
            }
          }
          return jhStatusList;
        }
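
      To illustrate the kind of guarded logging meant here, below is a minimal, self-contained sketch. It is not a patch against HistoryFileManager: it uses java.nio.file and java.util.logging as stand-ins for Hadoop's FileContext, PathFilter and logger, and the class name ScanDirectorySketch and the log messages are hypothetical. The point is only that each rejection branch reports its own reason, and that the message is built only when debug-level logging is enabled.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the scanDirectory method quoted above.
// Predicate<Path> plays the role of Hadoop's PathFilter (an assumption
// made so the sketch runs without Hadoop on the classpath).
class ScanDirectorySketch {
    private static final Logger LOG =
        Logger.getLogger(ScanDirectorySketch.class.getName());

    static List<Path> scanDirectory(Path dir, Predicate<Path> pathFilter)
            throws IOException {
        List<Path> accepted = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    // Guarded: the message is only built when FINE is enabled.
                    if (LOG.isLoggable(Level.FINE)) {
                        LOG.fine("Skipping " + entry + ": not a regular file");
                    }
                    continue;
                }
                if (!pathFilter.test(entry)) {
                    if (LOG.isLoggable(Level.FINE)) {
                        LOG.fine("Skipping " + entry + ": rejected by path filter");
                    }
                    continue;
                }
                accepted.add(entry);
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jhs-scan");
        Files.createFile(dir.resolve("job_1.jhist"));
        Files.createFile(dir.resolve("job_1_conf.xml"));
        Files.createDirectory(dir.resolve("subdir"));
        List<Path> found = scanDirectory(
            dir, p -> p.getFileName().toString().endsWith(".jhist"));
        System.out.println(found.size() + " history file(s) accepted");
    }
}
```

      With the default java.util.logging configuration the FINE messages are suppressed, so the extra branches cost only the isLoggable check; raising the logger and handler level to FINE makes the skip reasons visible.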
      
      

      Reproducing

      I was able to reproduce this bug by writing a custom MapReduce job whose job name contained % characters. I have also seen it with a version of the Mahout ParallelALSFactorizationJob, whose name includes "-" characters that get replaced by "%2D" at some later stage in the job pipeline.
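
      As a rough illustration of why % is a delicate character here, the sketch below percent-encodes a job name and then escapes "-" as "%2D", which matches the replacement observed above. That the pipeline uses URL-style encoding at all is an assumption on my part, and the class and method names are hypothetical. The sketch does not claim to show the root cause; it only shows that an encoded name round-trips cleanly, while a string containing a raw % that is treated as already encoded cannot be decoded.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; requires Java 10+ for the Charset overloads.
class JobNameEncodingSketch {
    // Assumption: "-" separates fields in a history file name, so it is
    // escaped as "%2D" when it occurs inside the job name itself.
    static final String DELIMITER = "-";
    static final String DELIMITER_ESCAPE = "%2D";

    // Percent-encode the name, then escape the field delimiter.
    static String encodeJobName(String jobName) {
        String encoded = URLEncoder.encode(jobName, StandardCharsets.UTF_8);
        return encoded.replace(DELIMITER, DELIMITER_ESCAPE);
    }

    // Reverse both steps at once: "%2D" decodes back to "-".
    static String decodeJobName(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Returns false when the input contains a malformed % escape.
    static boolean decodesCleanly(String text) {
        try {
            URLDecoder.decode(text, StandardCharsets.UTF_8);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "ALS-job 50%";
        String encoded = encodeJobName(name);
        System.out.println(encoded);
        System.out.println(decodeJobName(encoded).equals(name));
        // A raw, unencoded name with a trailing % is not valid encoded input.
        System.out.println(decodesCleanly("progress 100%"));
    }
}
```

      If any stage of the pipeline decodes a name that was never encoded, or encodes one twice, names containing % would behave differently from plain names, which would be consistent with the behavior reported here.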

            People

              Assignee: Unassigned
              Reporter: jay vyas (jayunit100)
                Time Tracking

                  Estimated: 1h
                  Remaining: 1h
                  Logged: Not Specified