Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.3
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels:
      None

      Description

      Our project, Pig, exposes FsShell functionality to our end users through a shell command. We want to use this command with no modifications to make sure that whether you work with HDFS through Hadoop or Pig you get identical semantics.

      The main concern that has been recently raised by our users is that there is no way to ignore certain failures that they consider to be benign, for instance, removing a non-existent directory.

      We have two requests related to this issue:

      (1) Meaningful error codes returned from FsShell (we use the Java class) so that we can take different actions on different errors.
      (2) Unix-like ways to tell a command to ignore certain failures. Here are the commands that we would like to be expanded or implemented:

      • rm -f
      • rmdir --ignore-fail-on-non-empty
      • mkdir -p
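
      For reference, the local Unix behavior that these three flags are modeled on can be sketched roughly as follows. This assumes GNU coreutils (the long rmdir option is a GNU extension) and a throwaway directory from mktemp; the variable names are illustrative only, and nothing here calls FsShell itself.

```shell
#!/bin/sh
# Sketch of the Unix semantics requested above, run against the local
# filesystem (not HDFS). Assumes GNU coreutils; all names are illustrative.
work=$(mktemp -d)

# rm -f: a missing operand is not treated as an error.
rm -f "$work/no-such-file"
rm_status=$?

# mkdir -p: creates missing parents and succeeds if the path already exists.
mkdir -p "$work/a/b"
mkdir -p "$work/a/b"
mkdir_status=$?

# rmdir --ignore-fail-on-non-empty: a non-empty directory is left in place
# and that particular failure is not reported.
touch "$work/a/b/file"
rmdir --ignore-fail-on-non-empty "$work/a/b"
rmdir_status=$?
[ -d "$work/a/b" ] && dir_state=present || dir_state=gone

echo "rm=$rm_status mkdir=$mkdir_status rmdir=$rmdir_status dir=$dir_state"

# Clean up the scratch directory.
rm -r -f "$work"
```

      In each case the flag suppresses one specific, benign failure rather than every error, which is the behavior being asked of FsShell here.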

        Issue Links

          Activity

          Daryn Sharp made changes -
          Fix Version/s 0.23.0 [ 12315569 ]
          Fix Version/s 0.24.0 [ 12317652 ]
          Harsh J added a comment -

          The Fix Version here should be 0.23.2. If this is right, Daryn, could you please make the change?

          0.24 does not exist as a version, and I am changing our JIRAs to reflect that.

          Thanks!

          Daryn Sharp made changes -
          Status: Open [ 1 ] → Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
          Daryn Sharp added a comment -

          The requested updates to commands are implemented. I believe the misunderstanding about rm flags is cleared up.

          Daryn Sharp added a comment -

          The existing output directory will be removed. But in case the directory does not exist, rmdir --ignore-fail-on-non-empty silently ignores this and continues with the MR job.

          No, that is not how the flag works on linux. The flag only ignores a failed delete on an existing directory that contains items.

          $ ls -d non-existent-dir
          ls: non-existent-dir: No such file or directory
          $ rmdir --ignore-fail-on-non-empty non-existent-dir; echo exit=$?
          rmdir: non-existent-dir: No such file or directory
          exit=1
          

          rm's -f flag does what you want: it will not return an error if the directory does not exist.

          Daniel Dai added a comment -

          An existing output directory will be silently ignored.

          The existing output directory will be removed. But in case the directory does not exist, rmdir --ignore-fail-on-non-empty silently ignores this and continues with the MR job.

          Daryn Sharp added a comment -

          [...] other users just want to make sure the output path is removed before launching the mapreduce job, they don't want to fail the script.

          I'm still a bit confused. If the user wants to make sure the output directory is removed prior to launching the MR job, I don't think they want rmdir --ignore-fail-on-non-empty $path. An existing output directory will be silently ignored. Pig will then launch an MR job that fails due to the existing directory that wasn't removed. I must be misunderstanding the explanation... I think the user wants to run rm -r -f $path?
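
          The distinction drawn here can be checked locally. The sketch below assumes GNU coreutils and uses a hypothetical output directory created under mktemp (the part-00000 file name is just a placeholder). It only illustrates why rm -r -f, rather than rmdir --ignore-fail-on-non-empty, appears to match the "clear the output path before the job" case; it does not exercise FsShell.

```shell
#!/bin/sh
# Hypothetical stand-in for a job's output directory on the local filesystem.
base=$(mktemp -d)
path="$base/output"
mkdir -p "$path"
touch "$path/part-00000"

# rmdir with the ignore flag reports success but leaves the directory behind.
rmdir --ignore-fail-on-non-empty "$path"
rmdir_status=$?
[ -d "$path" ] && after_rmdir=present || after_rmdir=gone

# rm -r -f removes the directory together with its contents.
rm -r -f "$path"
rm_status=$?
[ -d "$path" ] && after_rm=present || after_rm=gone

# Running it again on the now-missing path is still not an error.
rm -r -f "$path"
rerun_status=$?

echo "rmdir: status=$rmdir_status dir=$after_rmdir"
echo "rm:    status=$rm_status dir=$after_rm rerun=$rerun_status"

# Clean up the scratch directory.
rm -r -f "$base"
```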

          Daniel Dai added a comment -

          The use case is a user invoking rmdir from a Pig script. If an exception is thrown, the entire Pig script stops running. This might be the expected behavior for some users, but other users just want to make sure the output path is removed before launching the mapreduce job; they don't want the script to fail. Currently Pig has a native implementation for the second case, but it would be much better if FsShell provided this.
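
          The two behaviors described here have a rough analogue in a plain shell script, shown below as an illustration only. It uses the local rmdir as a stand-in for the command invoked from a script, with set -e playing the role of "an exception stops the script"; the variable names are hypothetical and this is not how Pig itself is implemented.

```shell
#!/bin/sh
# Illustration only: local rmdir stands in for the command run from a script.
base=$(mktemp -d)

# Strict mode: the subshell aborts at the first failing command,
# so the echo after rmdir is never reached.
( set -e; rmdir "$base/missing" 2>/dev/null; echo "not reached" )
strict_status=$?

# Tolerant mode: the failure is noted and the script carries on.
if rmdir "$base/missing" 2>/dev/null; then
  tolerant_result=removed
else
  tolerant_result=ignored
fi

echo "strict=$strict_status tolerant=$tolerant_result"

# Clean up the scratch directory.
rm -r -f "$base"
```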

          Daryn Sharp added a comment -

          I believe that all the features, except rmdir --ignore-fail-on-non-empty, are implemented. I'm having difficulty understanding the use case for the ignore flag. When would the user want to remove a directory but not care if it wasn't removed?

          Arun C Murthy made changes -
          Fix Version/s 0.24.0 [ 12317652 ]
          Fix Version/s 0.23.0 [ 12315569 ]
          Daryn Sharp made changes -
          Fix Version/s 0.23.0 [ 12315569 ]
          Daryn Sharp made changes -
          Link This issue incorporates HADOOP-6385 [ HADOOP-6385 ]
          Daryn Sharp made changes -
          Link This issue incorporates HDFS-639 [ HDFS-639 ]
          Daryn Sharp made changes -
          Link This issue is part of HADOOP-7176 [ HADOOP-7176 ]
          Daryn Sharp made changes -
          Assignee Daryn Sharp [ daryn ]
          Chris Douglas made changes -
          Project: Hadoop HDFS [ 12310942 ] → Hadoop Common [ 12310240 ]
          Key: HDFS-1784 → HADOOP-7209
          Affects Version/s 0.20.3 [ 12314812 ]
          Affects Version/s 0.20.3 [ 12314814 ]
          Chris Douglas made changes -
          Field Original Value New Value
          Project: Hadoop Map/Reduce [ 12310941 ] → Hadoop HDFS [ 12310942 ]
          Key: MAPREDUCE-2404 → HDFS-1784
          Affects Version/s 0.20.3 [ 12314814 ]
          Affects Version/s 0.20.3 [ 12314813 ]
          Olga Natkovich created issue -

            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Olga Natkovich
            • Votes:
              0
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved:
