Hadoop Map/Reduce
MAPREDUCE-1979

"Output directory already exists" error in gridmix when gridmix.output.directory is not defined

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: contrib/gridmix
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      "Output directory already exists" error is seen in gridmix when gridmix.output.directory is not defined. When gridmix.output.directory is not defined, then gridmix uses inputDir/gridmix/ as output path for gridmix run. Because gridmix is creating outputPath(in this case, inputDir/gridmix/) at the begining, the output path to generate-data-mapreduce-job(i.e. inputDir) already exists and becomes error from mapreduce.

      This outputPath needs to be created in either case (whether the user specifies it through gridmix.output.directory or gridmix falls back to inputDir/gridmix/), even though MapReduce automatically creates the output paths of its jobs (like mkdir -p). The reason is that gridmix must set 777 permissions on outputPath so that different users can create the output directories of their own MapReduce jobs within this gridmix run.
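      A minimal sketch of creating such a shared output directory, assuming a local POSIX filesystem as a stand-in for HDFS (SharedOutputDir and createSharedOutputDir are hypothetical names). The mode is set explicitly after creation because permissions requested at creation time are filtered through the process umask.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class SharedOutputDir {

    /** rwxrwxrwx, i.e. 0777: every user may create entries in the directory. */
    static final Set<PosixFilePermission> WORLD_WRITABLE =
        PosixFilePermissions.fromString("rwxrwxrwx");

    /**
     * Creates outputDir (and any missing parents, like mkdir -p), then sets the
     * leaf directory's mode with chmod semantics so the umask does not apply.
     * Parent directories keep their default permissions.
     */
    static Path createSharedOutputDir(Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        Files.setPosixFilePermissions(outputDir, WORLD_WRITABLE);
        return outputDir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = createSharedOutputDir(
            Files.createTempDirectory("gridmix-demo").resolve("gridmix"));
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(dir)));
    }
}
```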

      The same problem is also seen when gridmix.output.directory is defined as a relative path: gridmix then creates that relative path under ioPath/, which leads to the same issue.
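      To see why the relative case behaves like the default case, here is a hypothetical model of the path resolution described above: unset means ioPath/gridmix, and a relative value is resolved under ioPath. Treating an absolute value as used as-is is an assumption not stated in the report. Any result that lies under ioPath makes ioPath exist as soon as it is created.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputDirResolution {

    /**
     * Hypothetical model of choosing the output directory:
     * unset -> ioPath/gridmix; relative -> resolved under ioPath;
     * absolute -> returned unchanged (Path.resolve does this for absolute input).
     */
    static Path resolveOutputDir(Path ioPath, String configured) {
        if (configured == null || configured.isEmpty()) {
            return ioPath.resolve("gridmix");
        }
        return ioPath.resolve(configured);
    }

    /** True when creating outputDir up front would also create ioPath. */
    static boolean conflictsWithGenerate(Path ioPath, Path outputDir) {
        return outputDir.normalize().startsWith(ioPath.normalize());
    }

    public static void main(String[] args) {
        Path io = Paths.get("/data/io");
        for (String conf : new String[] {null, "out/run1", "/scratch/run1"}) {
            Path out = resolveOutputDir(io, conf);
            System.out.println(conf + " -> " + out + " conflict=" + conflictsWithGenerate(io, out));
        }
    }
}
```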

      1. 1979.patch
        0.9 kB
        Ravi Gummadi
      2. 1979.v1.1.patch
        5 kB
        Ravi Gummadi
      3. 1979.v1.2.patch
        5 kB
        Ravi Gummadi
      4. 1979.v1.3.patch
        7 kB
        Ravi Gummadi
      5. 1979.v1.patch
        5 kB
        Ravi Gummadi

        Activity

        Ravi Gummadi added a comment -

        Attaching a patch that removes the code creating the output directory. Creating it causes the issue reported above when it is inside inputDir (as in the case where gridmix.output.directory is not defined by the user and the -generate option tries to generate data in inputDir).

        Ravi Gummadi added a comment -

        With the earlier patch, TestGridmixSubmission was failing. Attaching a new patch with the correct fix. Also added a test case.

        Ravi Gummadi added a comment -

        Attaching a new patch that removes the special check for the existence of the default output path.
        Many more checks could be done, and they can be handled in a separate JIRA (as they are not relevant here). Raised MAPREDUCE-2006 to handle these validations.

        Ravi Gummadi added a comment -

        Attaching a new patch that changes the test case slightly, as suggested by Amar.

        Ravi Gummadi added a comment -

        outputPath needs to be created only so that its permissions can be set to 777, allowing all users (who launch MapReduce jobs within this gridmix run) to create their own output directories under outputPath through those jobs.

        Ravi Gummadi added a comment -

        Attaching a new patch with a small change to the list.toArray() call in the test case for better readability, as suggested by Amar.
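        The comment does not show the test code itself. As a hedged illustration only, this is the common readable idiom for turning a List<String> of command-line arguments into a String[]. The class name, the argument values and the "-generate <MiB> <iopath> <trace>" layout are assumptions, not taken from the Gridmix test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArgvFromList {

    /**
     * Builds a command line incrementally and converts it to the String[]
     * expected by a main()-style entry point. Values are hypothetical placeholders.
     */
    static String[] buildArgv(boolean generate, String ioPath, String trace) {
        List<String> argv = new ArrayList<>();
        if (generate) {
            argv.add("-generate");
            argv.add("100");
        }
        argv.add(ioPath);
        argv.add(trace);
        // The typed toArray(T[]) overload returns a String[] directly; the no-arg
        // toArray() returns an Object[], which cannot be cast to String[].
        return argv.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildArgv(true, "/data/io", "trace.json")));
    }
}
```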

        Ravi Gummadi added a comment -

        ant test and test-patch passed on my local machine.

        Amar Kamat added a comment -

        +1. Looks fine to me.

        Amareshwari Sriramadasu added a comment -

        I just committed this. Thanks Ravi !

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)


          People

          • Assignee: Ravi Gummadi
          • Reporter: Ravi Gummadi
          • Votes: 0
          • Watchers: 0
