Hadoop Map/Reduce
  MAPREDUCE-2493

New API FileOutputFormat does not honour user-specified OutputCommitter

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.0, 0.24.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      o.a.h.mapreduce.lib.output.FileOutputFormat always uses the default FileOutputCommitter and ignores any user-specified OutputCommitter.
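      The reported behaviour can be modelled with a small, self-contained sketch. Everything in it is an assumption made for illustration: the types are stand-ins rather than the Hadoop classes, and the key `example.output.committer.class` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types for illustration only; these are NOT the Hadoop classes.
final class IgnoredCommitterSketch {
    interface Committer {}
    static final class DefaultFileCommitter implements Committer {}
    static final class UserCommitter implements Committer {}

    // Hypothetical configuration key, used only in this sketch.
    static final String COMMITTER_KEY = "example.output.committer.class";

    // Models the reported behaviour: the format always builds its default
    // committer and never consults the configuration.
    static Committer committerFromFormat(Map<String, String> conf) {
        return new DefaultFileCommitter();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(COMMITTER_KEY, UserCommitter.class.getName());

        // The user's setting has no effect on the committer that is chosen.
        Committer chosen = committerFromFormat(conf);
        System.out.println(chosen.getClass().getSimpleName());
    }
}
```

      In this model the user's setting is stored in the configuration but never read, which is the symptom the issue describes.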

      1. MAPREDUCE-2493-1.patch
        6 kB
        Bhallamudi Venkata Siva Kamesh
      2. MAPREDUCE-2493.patch
        5 kB
        Bhallamudi Venkata Siva Kamesh

        Issue Links

          Activity

          Bhallamudi Venkata Siva Kamesh added a comment -

          Hi Robert,
          Thanks for looking into this issue. I found the need for a pluggable output committer associated with FileOutputFormat (FOF) while I was fixing MAPREDUCE-3130. I fixed MAPREDUCE-3130 by associating MAPREDUCE-3471 with FOF. I have attached the way I fixed this.

          I have upmerged the previously submitted patch. Please provide your feedback.

          If this is not the proper way to fix it, I am happy to have it fixed another way.

          Robert Joseph Evans added a comment -

          Bh.V.S.Kamesh,

          I am not an expert on the code, but I thought that this was by design. In the older APIs the configs for output format and output committer were separate, but the committer is tied quite closely to the output format. If I am outputting to a DB using a DB Output Format, I now have to set two configs instead of just one to make this work. What is more, I may need to play some odd games to make a DB output committer work at all so that I can commit or roll back the results, something that the current DB output format does not implement. This comes at the expense of making it more difficult to override the OutputCommitter, but in my experience the FileOutputCommitter is not really designed to be subclassed in a clean, extensible way.

          That being said, I am fine with adding the ability to override the output committer through a configuration on the newer API; I am just not sure that this is the proper way to do it. I have not had time to really think it through. At a minimum, please upmerge the patch. It no longer applies.

          Bhallamudi Venkata Siva Kamesh added a comment -

          Can someone please review the attached patch?

          Bhallamudi Venkata Siva Kamesh added a comment -

          Here I am attaching a solution against trunk. Please review it.

          Bhallamudi Venkata Siva Kamesh added a comment -

          When we use the new API, the OutputCommitter associated with FileOutputFormat is always FileOutputCommitter,
          whereas for the old API the OutputCommitter is read from the conf object.

              if (useNewApi) {
                if (LOG.isDebugEnabled()) {
                  LOG.debug("using new api for output committer");
                }
                // New API: the committer is obtained from the OutputFormat itself.
                outputFormat =
                  ReflectionUtils.newInstance(taskContext.getOutputFormatClass(), job);
                committer = outputFormat.getOutputCommitter(taskContext);
              } else {
                // Old API: the committer is read from the job configuration.
                committer = conf.getOutputCommitter();
              }
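
          One possible shape for a configuration-driven override on the new-API path is sketched below. It is not the attached patch: the types are stand-ins rather than Hadoop classes, and the key `example.output.committer.class` and the method `select` are hypothetical names chosen for illustration. The idea is to consult the configuration first and fall back to the output format's own committer when nothing is set.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types for illustration only; these are NOT the Hadoop classes.
final class CommitterOverrideSketch {
    interface Committer {}
    static final class DefaultFileCommitter implements Committer {}
    static final class UserCommitter implements Committer {}

    // Hypothetical configuration key, used only in this sketch.
    static final String COMMITTER_KEY = "example.output.committer.class";

    // Use the configured committer class when one is set; otherwise fall
    // back to the committer the output format would normally supply.
    static Committer select(Map<String, String> conf) {
        String className = conf.get(COMMITTER_KEY);
        if (className == null) {
            return new DefaultFileCommitter();
        }
        try {
            return Class.forName(className)
                    .asSubclass(Committer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create committer " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(select(conf).getClass().getSimpleName());

        conf.put(COMMITTER_KEY, UserCommitter.class.getName());
        System.out.println(select(conf).getClass().getSimpleName());
    }
}
```

          Falling back to the format's committer keeps the single-config case working for formats whose committer is tightly coupled to them, which is the concern raised earlier in the thread.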
          

            People

            • Assignee:
              Bhallamudi Venkata Siva Kamesh
              Reporter:
              Sharad Agarwal
            • Votes:
              0
              Watchers:
              8
