Cassandra / CASSANDRA-3840

Use java.io.tmpdir as default output location for BulkRecordWriter

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.0
    • Component/s: None

      Description

      BulkRecordWriter writes to the directory named by the property mapreduce.output.bulkoutputformat.localdir if it is set, and otherwise falls back to the value of mapred.local.dir.
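
      The lookup order described above can be sketched roughly as follows. This is an illustration only, not the actual BulkRecordWriter code: a plain Map stands in for the Hadoop job configuration, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lookup order described in this issue.
// A plain Map stands in for the Hadoop job configuration (assumption).
class LocalDirLookupSketch {
    static final String OUTPUT_LOCATION = "mapreduce.output.bulkoutputformat.localdir";
    static final String FALLBACK_LOCATION = "mapred.local.dir";

    // Returns the dedicated property if set, otherwise the fallback property.
    static String outputLocation(Map<String, String> conf) {
        String dir = conf.get(OUTPUT_LOCATION);
        return dir != null ? dir : conf.get(FALLBACK_LOCATION);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FALLBACK_LOCATION, "/dir1,/dir2,/dir3");
        // Only the fallback is set, so the whole list is returned as-is.
        System.out.println(outputLocation(conf));
    }
}
```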

      However, on a typical production system, mapred.local.dir is set to a comma-separated list of directories. BulkOutputFormat treats the whole list as a single directory name, so it ends up writing to silly paths such as

      /dir1/,dir2,/dir3,KeySpaceName/CFName

      This has two effects:

      1) The directory is not removed when the job finishes, leading to disk space management issues.

      2) If a new job is run against the same keyspace and column family, it tries to load the old data along with the new data.
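
      To make the path problem concrete, here is a small sketch of what happens when a directory list is used as if it were one directory. The way the path is joined here is an assumption for illustration, and the names are hypothetical; the real code may build the path differently.

```java
// Hypothetical illustration: a comma-separated directory list used as a single parent path.
class BulkPathSketch {
    // Joins the configured location with keyspace and column family (assumed layout).
    static String outputPath(String localDir, String keyspace, String columnFamily) {
        return localDir + "/" + keyspace + "/" + columnFamily;
    }

    // A simple guard: a configured value containing a comma is a list, not a directory.
    static boolean looksLikeDirList(String localDir) {
        return localDir.contains(",");
    }

    public static void main(String[] args) {
        String localDir = "/dir1,/dir2,/dir3"; // typical multi-disk mapred.local.dir value
        // prints /dir1,/dir2,/dir3/KeySpaceName/CFName
        System.out.println(outputPath(localDir, "KeySpaceName", "CFName"));
        // prints true
        System.out.println(looksLikeDirList(localDir));
    }
}
```

      Because the resulting path does not depend on the job or task attempt, a later job for the same keyspace and column family resolves to the same location, which is consistent with the second effect listed above.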

      It would be better to default to System.getProperty("java.io.tmpdir"), which Hadoop sets to an attempt-specific temporary directory that is cleaned up after the job finishes. See http://hadoop.apache.org/common/docs/current/mapred_tutorial.html, under "Directory Structure".
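
      A minimal sketch of the proposed default, assuming the same lookup as today with only the fallback changed. As before, a plain Map stands in for the Hadoop job configuration and the class and method names are hypothetical; this is not the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed behaviour: fall back to java.io.tmpdir
// instead of mapred.local.dir when the dedicated property is not set.
class TmpDirDefaultSketch {
    static final String OUTPUT_LOCATION = "mapreduce.output.bulkoutputformat.localdir";

    static String outputLocation(Map<String, String> conf) {
        String dir = conf.get(OUTPUT_LOCATION);
        // java.io.tmpdir is a standard JVM system property.
        return dir != null ? dir : System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        // With nothing configured, the JVM temporary directory is used.
        System.out.println(outputLocation(new HashMap<>()));
    }
}
```

      An explicitly configured mapreduce.output.bulkoutputformat.localdir still takes precedence, so existing setups that set it would be unaffected by this change in default.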

            People

            • Assignee: Erik Forsberg (forsberg)
            • Reporter: Erik Forsberg (forsberg)
            • Reviewer: Brandon Williams
