Hadoop Map/Reduce / MAPREDUCE-2015

GridMix always throws a FileAlreadyExistsException even when the output directory does not exist in HDFS.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: contrib/gridmix
    • Labels:
      None

      Description

      Gridmix always throws a FileAlreadyExistsException even when the output directory does not exist in HDFS. I was launching Gridmix from the command line to generate data. Before launching, I made sure the output directory did not exist in HDFS by deleting the folder if it was already there. However, I still see the "output directory already exists" exception every time. Please see the attached logs for more information.

        Activity

        Amar Kamat added a comment -

        I think this is a duplicate of MAPREDUCE-1979.

        Vinay Kumar Thota added a comment -

        I don't think this bug is a duplicate of MAPREDUCE-1979, because I am defining the 'gridmix.output.directory' attribute when invoking Gridmix.

        Vinay Kumar Thota added a comment -

        It's a duplicate of MAPREDUCE-1979.

        Ravi Gummadi added a comment -

        Is gridmix.output.directory defined as a relative path? If so, this is the same problem reported in MAPREDUCE-1979, and the patch uploaded to MAPREDUCE-1979 solves it. Please validate this and, if this JIRA is indeed a duplicate of MAPREDUCE-1979, resolve it as a duplicate.
        If a relative path is specified for gridmix.output.directory, Gridmix also tries to create that relative path as ioPath/<relativePath>.

        Inna Balasanyan added a comment -

        Hi
        I am also getting this error but the path for gridmix.output.directory isn't relative. Inna

        Ravi Gummadi added a comment -

        Which version of Hadoop are you using? Does it already include the MAPREDUCE-1979 patch?

        Inna Balasanyan added a comment -

        Hi,
        Thanks for the quick reply. Maybe the problem really is the version, because I'm using Hadoop 1.0.1.

        Ravi Gummadi added a comment -

        Hadoop 1.0.1 should contain the fix for MAPREDUCE-1979. Please provide the Gridmix command line you used and the exception message.

        Inna Balasanyan added a comment -

        Hello
        This is the command line used
        bin/hadoop -classpath $JAR_CLASSPATH org.apache.hadoop.mapred.gridmix.Gridmix -Dgridmix.min.file.size=1 -Dgridmix.output.directory=hdfs://localhost:54310/user/hduser/outgrid -generate 1m hdfs://localhost:54310/user/hduser/outgrid hdfs://localhost:54310/user/hduser/rumen_output/job-trace.json

        And this is the output

        12/07/25 10:47:27 INFO gridmix.JobMonitor: Job submission failed notify if anyone is waiting org.apache.hadoop.mapreduce.Job@fd918a
        12/07/25 10:47:37 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser-78719438/.staging/job_local_0002
        12/07/25 10:47:37 ERROR security.UserGroupInformation: PriviledgedActionException as:innab cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/hduser/outgrid already exists
        12/07/25 10:47:37 ERROR gridmix.Gridmix: Startup failed
        org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:54310/user/hduser/outgrid already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at org.apache.hadoop.mapred.gridmix.Gridmix.writeInputData(Gridmix.java:118)
        at org.apache.hadoop.mapred.gridmix.Gridmix.start(Gridmix.java:283)
        at org.apache.hadoop.mapred.gridmix.Gridmix.runJob(Gridmix.java:263)
        at org.apache.hadoop.mapred.gridmix.Gridmix.access$000(Gridmix.java:55)
        at org.apache.hadoop.mapred.gridmix.Gridmix$1.run(Gridmix.java:217)
        at org.apache.hadoop.mapred.gridmix.Gridmix$1.run(Gridmix.java:215)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.gridmix.Gridmix.run(Gridmix.java:215)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.mapred.gridmix.Gridmix.main(Gridmix.java:390)
        12/07/25 10:47:37 INFO gridmix.Gridmix: Exiting...

        Thanks

        Ravi Gummadi added a comment -

        The command-line arguments are wrong. You should not give the same directory path for both gridmix.output.directory and <iopath> (both of them are ***/outgrid). Please provide different paths.
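
A minimal sketch of a pre-flight check for the point above, assuming a wrapper script is used to launch Gridmix. The helper names `normalize_path` and `paths_differ` are hypothetical and are not part of Gridmix or the hadoop CLI; the two example paths are the ones that appear in this thread.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Gridmix): refuse to launch when
# gridmix.output.directory and <iopath> name the same directory.

# Strip trailing slashes so ".../outgrid/" and ".../outgrid" compare equal.
normalize_path() {
    printf '%s\n' "$1" | sed 's:/*$::'
}

# Returns 0 when the two paths differ, 1 when they are the same.
paths_differ() {
    [ "$(normalize_path "$1")" != "$(normalize_path "$2")" ]
}

OUTPUT_DIR="hdfs://localhost:54310/user/hduser/out"
IO_PATH="hdfs://localhost:54310/user/hduser/outgrid"

if paths_differ "$OUTPUT_DIR" "$IO_PATH"; then
    echo "OK: output directory and iopath are distinct"
else
    echo "ERROR: output directory and iopath must not be the same" >&2
fi
```

This only compares the strings after trimming trailing slashes; it does not resolve relative paths or different spellings of the same filesystem URI.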

        Inna Balasanyan added a comment -

        Thanks. I have changed the paths; they are now different. But I am getting an error.

        12/07/25 11:30:46 INFO gridmix.SubmitterUserResolver: Current user resolver is SubmitterUserResolver
        12/07/25 11:30:46 WARN gridmix.Gridmix: Resource null ignored
        12/07/25 11:30:47 INFO gridmix.Gridmix: Submission policy is STRESS
        12/07/25 11:30:47 INFO gridmix.Gridmix: Generating 1.0m of test data...
        12/07/25 11:30:47 INFO util.NativeCodeLoader: Loaded the native-hadoop library
        12/07/25 11:30:47 INFO gridmix.Statistics: Not tracking job GRIDMIX_GENDATA as seq id is less than zero: -1
        12/07/25 11:30:52 INFO gridmix.JobMonitor: GRIDMIX_GENDATA (job_local_0001) success
        12/07/25 11:30:57 INFO gridmix.Gridmix: Changing the permissions for inputPath hdfs://localhost:54310/user/hduser/outgrid
        12/07/25 11:30:57 INFO gridmix.Gridmix: Done.
        12/07/25 11:30:57 INFO gridmix.FilePool: minFileSize 1
        12/07/25 11:30:57 INFO gridmix.FilePool: InnerDesc 1
        12/07/25 11:30:57 INFO gridmix.FilePool: InnerDesc 2 thisDir.getPath()hdfs://localhost:54310/user/hduser/outgrid
        12/07/25 11:30:57 INFO gridmix.FilePool: FilePool
        12/07/25 11:30:57 ERROR gridmix.Gridmix: Startup failed
        java.io.IOException: Found no satisfactory file in hdfs://localhost:54310/user/hduser/outgrid
        at org.apache.hadoop.mapred.gridmix.FilePool.refresh(FilePool.java:106)
        at org.apache.hadoop.mapred.gridmix.JobSubmitter.refreshFilePool(JobSubmitter.java:159)
        at org.apache.hadoop.mapred.gridmix.Gridmix.start(Gridmix.java:286)
        at org.apache.hadoop.mapred.gridmix.Gridmix.runJob(Gridmix.java:263)
        at org.apache.hadoop.mapred.gridmix.Gridmix.access$4(Gridmix.java:229)
        at org.apache.hadoop.mapred.gridmix.Gridmix$1.run(Gridmix.java:217)
        at org.apache.hadoop.mapred.gridmix.Gridmix$1.run(Gridmix.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.gridmix.Gridmix.run(Gridmix.java:215)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.mapred.gridmix.Gridmix.main(Gridmix.java:390)
        12/07/25 11:30:57 INFO gridmix.Gridmix: Exiting...

        I have read that this happens when the input data is relatively small, but I have set -Dgridmix.min.file.size to 1. Could you please help me?

        Ravi Gummadi added a comment -

        Can you manually check the sizes of the files created by Gridmix under the input path and see whether all of them are smaller than gridmix.min.file.size?
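
One way to do this check is to filter a directory listing by size. The sketch below assumes the 8-column `hadoop dfs -ls` layout (permissions, replication, owner, group, size, date, time, path) and assumes that the size column and gridmix.min.file.size are both in bytes. The function name `files_at_least` is hypothetical.

```shell
#!/bin/sh
# Sketch only: pick out entries in an `hadoop dfs -ls` listing that meet a
# minimum size. Assumed column layout:
#   permissions replication owner group size date time path

# Reads a listing on stdin and prints the path of every entry whose size
# (column 5, assumed to be bytes) is at least $1. Header lines such as
# "Found N items" have fewer than 8 fields and are skipped.
files_at_least() {
    awk -v min="$1" 'NF >= 8 && $5 ~ /^[0-9]+$/ && ($5 + 0) >= (min + 0) { print $8 }'
}

# Intended use against a live cluster (requires the hadoop CLI):
#   hadoop dfs -ls hdfs://localhost:54310/user/hduser/outgrid | files_at_least 1
```

An empty result would be consistent with the "Found no satisfactory file" error. Note that the listing is not recursive, and directories are reported with size 0, so they are filtered out for any minimum of 1 or more.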

        Inna Balasanyan added a comment -

        I checked. My JSON file is about 1.1 M, but after it starts, Gridmix generates a file in the input folder. It is named _SUCCESS, but it is empty. Then Gridmix reports an error and exits. I generated the JSON using Rumen.

        Ravi Gummadi added a comment -

        Please provide the command line again. There could still be something wrong there.

        Inna Balasanyan added a comment -

        Here is the command line

        bin/hadoop -classpath $JAR_CLASSPATH org.apache.hadoop.mapred.gridmix.Gridmix -Dgridmix.min.file.size=1 -Dgridmix.output.directory=hdfs://localhost:54310/user/hduser/out -generate 1m hdfs://localhost:54310/user/hduser/outgrid hdfs://localhost:54310/user/hduser/rumen_output/job-trace.json

        Thanks for help

        Ravi Gummadi added a comment -

        This looks fine to me.

        Can you check the output of
        hadoop dfs -ls hdfs://localhost:54310/user/hduser/outgrid ?

        Inna Balasanyan added a comment -

        -rw-rw-rw- 3 hduser supergroup 0 2012-07-25 14:24 /user/hduser/outgrid/_SUCCESS

        Nothing else

        Ravi Gummadi added a comment -

        This is unexpected when Gridmix says the GRIDMIX_GENDATA job succeeded.

        To debug further, please try with local filesystem based paths for output dir and iopath and see the behavior.

        Inna Balasanyan added a comment -

        I got the same result with the local filesystem: the same error.

        Inna Balasanyan added a comment -

        Sorry for disturbing you. I am blocked with this problem.

        Ravi Gummadi added a comment -

        Are you sure the input path does not already exist when you start running your command?

        Anyway, please send a mail with the problem description to the user mailing list. Others may have more insights.
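
A minimal sketch of how the "does the path already exist" question could be ruled out before each run, using the `hadoop fs -test -e` and `hadoop fs -rmr` shell commands available in Hadoop 1.x. The function name `ensure_absent` and the `HADOOP` override variable are hypothetical conveniences, not part of Gridmix.

```shell
#!/bin/sh
# Sketch only: make sure a path does not exist before launching Gridmix.
# HADOOP is a hypothetical override hook for the CLI name; it defaults to `hadoop`.
HADOOP="${HADOOP:-hadoop}"

# Removes $1 recursively if `hadoop fs -test -e` reports that it exists.
# WARNING: -rmr deletes data; double-check the path before using this.
ensure_absent() {
    if "$HADOOP" fs -test -e "$1"; then
        "$HADOOP" fs -rmr "$1"
    fi
}

# Intended use (requires the hadoop CLI and a reachable filesystem):
#   ensure_absent hdfs://localhost:54310/user/hduser/outgrid
```

Running this immediately before the Gridmix command removes any leftover directory from an earlier attempt, which narrows the problem down to what the current run itself creates.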

        Inna Balasanyan added a comment -

        Ok. Mail sent

        Brian Husted added a comment -

        Ravi Gummadi, Inna Balasanyan: I am running into the exact same issue with Gridmix on 1.2.1, where it generates no input data, only an empty _SUCCESS file. I get from Gridmix: GRIDMIX_GENERATE_INPUT_DATA (job_local641395695_0001) success

        Any pointers on how to resolve this issue?


          People

          • Assignee:
            Ranjit Mathew
            Reporter:
            Vinay Kumar Thota
          • Votes:
            0
            Watchers:
            6