  1. Hadoop Map/Reduce
  2. MAPREDUCE-7366

FileOutputCommitter Enable Concurrent Writes

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.3.1
    • Fix Version/s: None
    • Component/s: mrv2

    Description

      Is it possible to make `PENDING_DIR_NAME` configurable?
      That would enable concurrent writes to the same location. Currently, if two Spark processes write to the same destination, one of them fails.

      Current:

       public static final String PENDING_DIR_NAME = "_temporary";

      Proposed:

      PENDING_DIR_NAME = conf.get("mapreduce.fileoutputcommitter.pending.dir", "_temporary");

      Here is a custom committer that does this: https://gist.github.com/ismailsimsek/33c55d8e1fcfc79160483c38a978edbd
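
      A minimal sketch of the idea, not the actual Hadoop code: it uses a plain `java.util.Map` as a stand-in for Hadoop's `Configuration`, and the class and helper names (`PendingDirSketch`, `pendingDirName`, `jobAttemptPath`) are hypothetical. The key `mapreduce.fileoutputcommitter.pending.dir` is the one proposed in this issue and is assumed not to exist in Hadoop 3.3.1. The path layout (`<output>/<pending dir>/<app attempt id>`) mirrors how `FileOutputCommitter` places job attempt directories under `_temporary`.

```java
import java.util.Map;

public class PendingDirSketch {
    // Key proposed in this issue (hypothetical, not an existing Hadoop setting).
    static final String PENDING_DIR_KEY = "mapreduce.fileoutputcommitter.pending.dir";
    // Today's hard-coded value of FileOutputCommitter.PENDING_DIR_NAME.
    static final String DEFAULT_PENDING_DIR = "_temporary";

    // Resolve the pending directory name, falling back to the current default.
    static String pendingDirName(Map<String, String> conf) {
        return conf.getOrDefault(PENDING_DIR_KEY, DEFAULT_PENDING_DIR);
    }

    // Build the job attempt path: <output>/<pending dir>/<app attempt id>.
    static String jobAttemptPath(String outputPath, Map<String, String> conf, int appAttemptId) {
        return outputPath + "/" + pendingDirName(conf) + "/" + appAttemptId;
    }

    public static void main(String[] args) {
        String out = "/data/out";
        // Two jobs with default settings share one staging directory and collide.
        System.out.println(jobAttemptPath(out, Map.of(), 0));
        System.out.println(jobAttemptPath(out, Map.of(), 0));
        // Giving each job its own pending directory keeps their staging files apart.
        System.out.println(jobAttemptPath(out, Map.of(PENDING_DIR_KEY, "_temporary_jobA"), 0));
        System.out.println(jobAttemptPath(out, Map.of(PENDING_DIR_KEY, "_temporary_jobB"), 0));
    }
}
```

      With the default, both jobs stage under the same `_temporary` directory, so one job's cleanup can delete files the other still needs. Distinct names per job avoid that overlap, provided each job picks a unique value.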

            People

              Assignee: Unassigned
              Reporter: simsek ismail
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h