[MAPREDUCE-7029] FileOutputCommitter is slow on filesystems lacking recursive delete - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.8.2
Fix Version/s: 3.1.0, 2.10.0
Component/s: None
Labels: None
Environment:

Google Cloud Storage (with the GCS connector: https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs) for HCFS compatibility.

FileOutputCommitter algorithm v2.

Running on Google Compute Engine with Java 8, Debian 8, Hadoop 2.8.2, Spark 2.2.0.

Hadoop Flags: Reviewed
Release Note:

MapReduce jobs that output to filesystems without direct support for recursive delete can set mapreduce.fileoutputcommitter.task.cleanup.enabled=true to have each task delete its intermediate work directory rather than waiting for the ApplicationMaster to clean up at the end of the job. This can significantly speed up the cleanup phase for large jobs on such filesystems.


Description

I ran a Spark job that outputs thousands of Parquet files (i.e., it has thousands of reducers), and it hung for several minutes in the driver after all tasks were complete. Here is a very simple repro of the job (to be run in a spark-shell):

spark.range(1L << 20).repartition(1 << 14).write.save("gs://some/path")

Spark actually calls into MapReduce's FileOutputCommitter. Job commit (specifically cleanupJob()) recursively deletes the job temporary directory, which is something like "gs://some/path/_temporary". If I understand correctly, on HDFS this would be O(1), but on GCS (and every HCFS I know of), it requires a full file tree walk. Deleting tens of thousands of objects in GCS takes several minutes.

I propose that commitTask() recursively delete its task attempt temp directory (something like "gs://some/path/_temporary/attempt1/task1"). On HDFS, this is O(1) per task, so it adds very little overhead per task. On GCS (and other HCFSs), it parallelizes deletion of the job temp directory across tasks.
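To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the attached patch and does not use any Hadoop API. It assumes a local temp directory as a stand-in for an object store without native recursive delete, a simplified hypothetical layout (_temporary/task_N/part-M), and a small thread pool as a stand-in for tasks finishing concurrently. The class name TaskCleanupSketch and its methods are made up for illustration. It counts how many entries the final job-level cleanup still has to delete one by one, with and without per-task cleanup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TaskCleanupSketch {

    /** Deletes a tree entry by entry (a full tree walk) and returns the number of entries deleted. */
    static int deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }

    /** Creates jobTemp/task_N/part-M files (hypothetical layout) and returns the task directories. */
    static List<Path> createTaskDirs(Path jobTemp, int tasks, int filesPerTask) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            Path dir = Files.createDirectories(jobTemp.resolve("task_" + t));
            for (int f = 0; f < filesPerTask; f++) {
                Files.createFile(dir.resolve("part-" + f));
            }
            dirs.add(dir);
        }
        return dirs;
    }

    /** Runs one simulated job and returns how many entries the job-level cleanup had to delete. */
    static int run(boolean taskCleanup, int tasks, int filesPerTask) throws Exception {
        Path out = Files.createTempDirectory("committer-sketch");
        Path jobTemp = out.resolve("_temporary");
        List<Path> taskDirs = createTaskDirs(jobTemp, tasks, filesPerTask);

        if (taskCleanup) {
            // Each "task" deletes its own directory; the pool models tasks running in parallel.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            try {
                List<Future<Integer>> pending = new ArrayList<>();
                for (Path dir : taskDirs) {
                    pending.add(pool.submit(() -> deleteTree(dir)));
                }
                for (Future<Integer> f : pending) {
                    f.get();
                }
            } finally {
                pool.shutdown();
            }
        }

        // Job-level cleanup: a single caller walks whatever is left under _temporary.
        int jobLevelDeletes = deleteTree(jobTemp);
        Files.delete(out);
        return jobLevelDeletes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("job-level deletes without task cleanup: " + run(false, 8, 3));
        System.out.println("job-level deletes with task cleanup:    " + run(true, 8, 3));
    }
}
```

In this sketch the total number of deletes is the same either way. What changes is who issues them: without task cleanup, a single caller deletes every entry at the end of the job, while with task cleanup that final step only has to remove the nearly empty _temporary directory.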

With the attached patch, the repro above went from taking ~10 minutes to taking ~5 minutes, and task time did not significantly change.

Side note: I found this issue with Spark, but I assume it applies to a MapReduce job with thousands of reducers as well.

Attachments

MAPREDUCE-7029-branch-2.005.patch, 12/Jan/18 21:29, 7 kB, Karthik Palaniappan
MAPREDUCE-7029-branch-2.005.patch, 16/Jan/18 21:04, 7 kB, Jason Darrell Lowe
MAPREDUCE-7029-branch-2.004.patch, 10/Jan/18 23:56, 7 kB, Karthik Palaniappan
MAPREDUCE-7029.005.patch, 12/Jan/18 21:29, 7 kB, Karthik Palaniappan
MAPREDUCE-7029.004.patch, 10/Jan/18 23:57, 7 kB, Karthik Palaniappan
MAPREDUCE-7029.003.patch, 10/Jan/18 17:59, 7 kB, Karthik Palaniappan
MAPREDUCE-7029.002.patch, 01/Jan/18 00:49, 3 kB, Karthik Palaniappan
MAPREDUCE-7029.001.patch, 28/Dec/17 00:03, 1 kB, Karthik Palaniappan

People

Assignee: Karthik Palaniappan

Reporter: Karthik Palaniappan

Votes: 0

Watchers: 6

Dates

Created: 28/Dec/17 00:02

Updated: 23/Jan/18 16:59

Resolved: 17/Jan/18 16:32