[MAPREDUCE-6992] Race for temp dir in LocalDistributedCacheManager.java - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

When localizing distributed cache files in "local" mode, LocalDistributedCacheManager.java chooses a "unique" directory based on a millisecond time stamp. When running code with some parallelism, it's possible to run into this.

The error message looks like

bq. java.io.FileNotFoundException: jenkins/mapred/local/1508958341829_tmp does not exist

I ran into this in Impala's data loading. There, we run a HiveServer2 which runs in MapReduce. If multiple queries are submitted simultaneously to the HS2, they conflict on this directory. Googling found that StreamSets ran into something very similar looking at https://issues.streamsets.com/browse/SDC-5473.

I believe the buggy code is (link: https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L94)

    // Generating unique numbers for FSDownload.
    AtomicLong uniqueNumberGenerator =
        new AtomicLong(System.currentTimeMillis());

Notably, a similar code path uses an actual random number generator (LocalJobRunner.java, https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java#L912).

  public String getStagingAreaDir() throws IOException {
    Path stagingRootDir = new Path(conf.get(JTConfig.JT_STAGING_AREA_ROOT,
        "/tmp/hadoop/mapred/staging"));
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    String user;
    randid = rand.nextInt(Integer.MAX_VALUE);

Attachments

Issue Links

duplicates

MAPREDUCE-6441 Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

Resolved

Activity

People

Assignee:: Unassigned

Reporter:: Philip Martin

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 26/Oct/17 18:14

Updated:: 26/Oct/17 18:45

Resolved:: 26/Oct/17 18:45