  1. Hadoop Map/Reduce
  2. MAPREDUCE-7233

MapReduce Input Path Should Ignore a Trailing '/*' at Job Submission

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: job submission, performance
    • Labels: None

    Description

      We run a public, shared Hadoop cluster that handles a large number of MR jobs from different departments.

       

      I found that job submission is very slow when the job's input path ends with "/*", for example "/my/path/*", whereas "/my/path" or "/my/path/" works fine.

       

      After reading the code, I think the problem lies in the split calculation.

       

      The FileInputFormat#singleThreadedListStatus() method first gets an array of FileStatus objects. If the input path ends with "/*", the result contains one FileStatus object for every file and directory in the input path. If the input path does not end with "/*", the result contains only one FileStatus object (the input path itself).

       

      The next step finds the LocatedFileStatus for each FileStatus object. However, only directory FileStatus objects trigger the LocatedFileStatus lookup (dfs.listPaths(), in batches), so the files matched directly by "/*" remain plain FileStatus objects.

       

      Finally, when the job splits are calculated, for example in FileInputFormat#getSplits(), any FileStatus that is not a LocatedFileStatus object requires a call to fs.getFileBlockLocations() to fetch its block locations. This can lead to a large number of RPC requests when the input path contains many files. CombineFileInputFormat does the same in the constructor of OneFileInfo.
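
      The difference between the two code paths described above can be put into rough numbers. The following is only a back-of-the-envelope model, not Hadoop code: the class name, the method names, and the batch size of 1000 entries per listing RPC are assumptions made for illustration, and it assumes that every entry matched by "/*" is a plain file.

```java
/**
 * Rough model of NameNode round trips during split calculation.
 * Illustration only: names and the batch size are assumptions, not Hadoop APIs.
 */
final class SplitRpcEstimate {

    /** Assumed number of entries returned per directory-listing RPC. */
    static final int ASSUMED_LISTING_BATCH = 1000;

    /** Input path "/my/path": resolve the path, then one batched located listing. */
    static long rpcsForDirectoryPath(long fileCount) {
        long resolveRpc = 1;
        // Ceiling division; even an empty directory costs one listing call.
        long listingRpcs = Math.max(1,
                (fileCount + ASSUMED_LISTING_BATCH - 1) / ASSUMED_LISTING_BATCH);
        return resolveRpc + listingRpcs;
    }

    /**
     * Input path "/my/path/*": the glob is assumed to cost the same as a listing,
     * but every matched file then needs its own block-location RPC.
     */
    static long rpcsForGlobPath(long fileCount) {
        return rpcsForDirectoryPath(fileCount) + fileCount;
    }

    public static void main(String[] args) {
        long files = 100_000;
        System.out.println("directory path: " + rpcsForDirectoryPath(files));
        System.out.println("glob path:      " + rpcsForGlobPath(files));
    }
}
```

      Under these assumptions the per-file block-location calls dominate: the cost of the "/*" form grows linearly with the number of files, while the directory form grows only with the number of listing batches.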

       

      As a result, some jobs take minutes or even hours to submit.

       

      I tried stripping the trailing "/*" from the input path before the code that gets the file status, but I have not confirmed whether this causes other problems.
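
      A minimal sketch of the kind of suffix stripping described above might look like the following. The class and method names are hypothetical and this is not the actual change tried against FileInputFormat; it only shows the string handling, and it does not settle whether the stripped path behaves identically in every case (for example when the glob matches subdirectories).

```java
/**
 * Hypothetical helper, for illustration only: removes a trailing "/*"
 * from an input path string before file statuses are fetched.
 */
final class InputPathNormalizer {

    static String stripTrailingGlob(String inputPath) {
        // Leave the bare root glob "/*" alone; stripping it would yield an empty path.
        if (inputPath.length() > 2 && inputPath.endsWith("/*")) {
            return inputPath.substring(0, inputPath.length() - 2);
        }
        // Paths without the suffix, or with "*" elsewhere, are returned unchanged.
        return inputPath;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingGlob("/my/path/*"));
        System.out.println(stripTrailingGlob("/my/path/"));
        System.out.println(stripTrailingGlob("/my/*/part"));
    }
}
```

      Only the exact "/*" suffix is touched, so globs in the middle of a path or more specific patterns such as "/my/path/*.log" keep their original meaning.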

      Attachments

        1. job submit.jpg
          74 kB
          Victor Zhang

          People

            Assignee: Unassigned
            Reporter: Victor Zhang (xiaoxigua)
            Votes: 0
            Watchers: 1
