Hadoop Map/Reduce: MAPREDUCE-2078

TraceBuilder is unable to generate traces when the job history path is given as a glob.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: tools/rumen
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I was trying to generate traces for MR job histories using TraceBuilder. However, it is unable to generate the traces when the job history path is given as a glob. It throws a FileNotFoundException even though the job history path exists.

      I provided the job history path as follows:

      hdfs://<<clustername>>/dir1/dir2/dir3///////

      Exception:

      java.io.FileNotFoundException: File does not exist:
      hdfs://<<clustername>>/dir1/dir2/dir3//////
      at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
      at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.<init>(TraceBuilder.java:88)
      at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:183)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:121)

      It's truncating the last slash in the path.

      1. mapreduce-2078-v1.5.patch (5 kB, Amar Kamat)
      2. mapreduce-2078-v1.2.patch (5 kB, Amar Kamat)

        Activity

        Amar Kamat added a comment -

        FileSystem provides a globStatus(Path) API that enumerates all the paths matched by a globbed path.

        The current TraceBuilder code does the following:

          for (int i = 2 + switchTop; i < args.length; ++i) {
            Path thisPath = new Path(args[i]);
            FileSystem fs = thisPath.getFileSystem(conf);
            // the argument is used literally, so a globbed path makes
            // getFileStatus() throw FileNotFoundException
            if (fs.getFileStatus(thisPath).isDirectory()) {
              FileStatus[] statuses = fs.listStatus(thisPath);
              for (FileStatus s : statuses) {
                // process the file
                // ...
              }
            }
          }
        

        This needs to be changed to first flatten the globbed paths passed as input. The suggested fix is:

          for (int i = 2 + switchTop; i < args.length; ++i) { // iterate over the input
            Path thisPath = new Path(args[i]);
            // get the filesystem specific to the input passed
            FileSystem fs = thisPath.getFileSystem(conf);

            // flatten the globbed file path
            FileStatus[] realStatuses = fs.globStatus(thisPath);

            // iterate over all the paths matched by the globbed input path
            for (FileStatus status : realStatuses) {
              // extract the actual (flat) path from the file status
              Path realPath = status.getPath();

              // now do what is done in trunk
              if (fs.getFileStatus(realPath).isDirectory()) {
                FileStatus[] statuses = fs.listStatus(realPath);
                for (FileStatus s : statuses) {
                  // process the file
                  // ...
                }
              }
            }
          }
        

        I ran TraceBuilder with this fix and now it works with globbed input paths.
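The "flatten first, then process each real path" idea can be tried without a Hadoop cluster. The following is a minimal local-filesystem analogue, assuming java.nio.file's glob PathMatcher as a stand-in for FileSystem.globStatus(Path); the class GlobInputSketch and its methods expandGlob and collectInputs are hypothetical names, not TraceBuilder code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local analogue of the TraceBuilder fix (not Hadoop code).
final class GlobInputSketch {

  // Stand-in for FileSystem.globStatus(): returns every entry under root
  // whose root-relative path matches the glob, in sorted order.
  static List<Path> expandGlob(Path root, String glob) throws IOException {
    PathMatcher matcher = root.getFileSystem().getPathMatcher("glob:" + glob);
    try (Stream<Path> entries = Files.walk(root)) {
      return entries
          .filter(p -> !p.equals(root))
          .filter(p -> matcher.matches(root.relativize(p)))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // Flatten the glob first, then treat each real path the way the pre-fix
  // loop treated a single argument: list a directory one level deep, and
  // take a plain file as-is.
  static List<Path> collectInputs(Path root, String glob) throws IOException {
    List<Path> inputs = new ArrayList<>();
    for (Path realPath : expandGlob(root, glob)) {
      if (Files.isDirectory(realPath)) {
        try (Stream<Path> children = Files.list(realPath)) {
          children.sorted().forEach(inputs::add);
        }
      } else {
        inputs.add(realPath);
      }
    }
    return inputs;
  }
}
```

For example, with a tree history/h1/{job_1, job_2} and history/h2/job_3, collectInputs(root, "history/h*") yields the three job files, just as a single non-globbed directory argument would have yielded its children.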

        Amar Kamat added a comment -

        The attached patch fixes the issue by first enumerating all the paths matched by the globbed input path and then processing each of them individually. test-patch and ant tests passed.

        Ranjit Mathew added a comment -

        Some minor comments:

        • In the comment on processInput() that reads "If the input is a file then its directly", change "its" to "it's" or "it is".
        • Just use plain old JavaDoc style comments for processInput().
        • In processInput(), you can now directly use the FileStatus object inStatus instead of calling fs.getFileStatus() again.
        Ravi Gummadi added a comment -

        Some comments on the patch:
        (1) I think FileSystem.createNewFile(inputPath1) is better/simpler than FsShell.run(new String[] {"-touchz", inputPath1.toString()}) in the test case.
        (2) The method name processInput() can be changed to something like processInputArgument() or buildInputHistoryPaths() for more clarity.
        (3) In FileInputFormat.listStatus(), the return value of fs.globStatus() is checked for null. Please check whether we need the same check here.
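Point (3) matters because FileSystem.globStatus() can return null: its Javadoc states that it returns null when the pattern has no glob and the path does not exist, and an empty array when a glob matches nothing. FileInputFormat.listStatus() reports those two cases with separate errors. The following is a minimal stand-alone sketch of that distinction; the class GlobResultCheck and method requireMatches are hypothetical, the exception types and the choice to throw immediately are assumptions, and the element type is generic so the sketch runs without Hadoop on the classpath.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical helper (not part of the patch): validates a
// globStatus()-style result before the caller iterates over it.
// T stands in for Hadoop's FileStatus.
final class GlobResultCheck {

  static <T> T[] requireMatches(String input, T[] matches) throws IOException {
    if (matches == null) {
      // No glob in the input and the literal path does not exist.
      throw new FileNotFoundException("Input path does not exist: " + input);
    }
    if (matches.length == 0) {
      // The input contained a glob, but nothing matched it.
      throw new IOException("Input pattern " + input + " matches 0 files");
    }
    return matches;
  }
}
```

Wrapping the call in the suggested fix, e.g. `for (FileStatus status : requireMatches(args[i], fs.globStatus(thisPath)))`, would turn a null result into a descriptive error instead of a NullPointerException from the enhanced for loop.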

        Amar Kamat added a comment -

        Attaching a new patch incorporating Ranjit's and Ravi's comments. test-patch passed.

        Ravi Gummadi added a comment -

        Patch looks fine to me.
        +1

        Amareshwari Sriramadasu added a comment -

        I just committed this. Thanks, Amar!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)


          People

          • Assignee:
            Amar Kamat
          • Reporter:
            Vinay Kumar Thota
          • Votes:
            0
          • Watchers:
            1
