Hadoop Map/Reduce / MAPREDUCE-2390

JobTracker and TaskTrackers fail with a misleading error if one of the mapreduce.cluster.dir directories has unusable permissions or is unavailable.

    Details

    • Tags:
      jobtracker, tasktracker

      Description

      To reproduce, set the mapred.local.dir property to a few directories. Before starting the JT, set the permissions of one of these directories to 'd---------', and then start the JT/TT. The JT, although it tries to ignore this directory, fails with an odd and misleading message claiming that its configured address is in use.

      Fixing the permission clears this issue!
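
The reproduction steps above can be approximated outside Hadoop. The following is a minimal sketch, not Hadoop code: the class name `LocalDirProbe` and its methods are hypothetical, and it only assumes (from the log below) that startup tries to create a `toBeDeleted` subdirectory in each local dir. It needs a POSIX file system, and when run as root the permission check is bypassed, so nothing is reported as unusable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.List;

public class LocalDirProbe {

    /** Returns the directories in which a "toBeDeleted" subdirectory cannot be created. */
    static List<Path> findUnusable(List<Path> localDirs) {
        List<Path> unusable = new ArrayList<>();
        for (Path dir : localDirs) {
            try {
                Files.createDirectories(dir.resolve("toBeDeleted"));
            } catch (IOException | SecurityException e) {
                unusable.add(dir);
            }
        }
        return unusable;
    }

    /** Creates two stand-in local dirs, strips all permissions from the second, and probes both. */
    static List<Path> reproduce() {
        try {
            Path root = Files.createTempDirectory("mapred-local");
            Path good = Files.createDirectory(root.resolve("1"));
            Path bad = Files.createDirectory(root.resolve("2"));
            // Equivalent of `chmod 000`, which shows up as 'd---------' in a listing.
            Files.setPosixFilePermissions(bad, PosixFilePermissions.fromString("---------"));
            return findUnusable(List.of(good, bad));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Unusable local dirs: " + reproduce());
    }
}
```

A startup path that wanted to tolerate a bad directory could log the returned list and continue with the remaining directories, which is what the "Ignored." behaviour discussed later in this issue appears to do.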

      This was also reported on the mailing list by Ted Yu quite a few months ago, but I had forgotten to file a bug for it here. It still seems to happen. A log is attached below.

      2011-03-17 00:40:32,321 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: java.io.IOException: Cannot create toBeDeleted in /home/hack/.tmplocalz/2
              at org.apache.hadoop.util.MRAsyncDiskService.<init>(MRAsyncDiskService.java:86)
              at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2189)
              at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2022)
              at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:276)
              at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:268)
              at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4712)
      
      2011-03-17 00:40:33,322 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
      2011-03-17 00:40:33,322 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
      2011-03-17 00:40:33,322 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
      2011-03-17 00:40:33,322 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
      2011-03-17 00:40:33,322 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
      2011-03-17 00:40:33,350 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as hack
      2011-03-17 00:40:33,351 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to localhost/127.0.0.1:8021 : Address already in use
              at org.apache.hadoop.ipc.Server.bind(Server.java:227)
              at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:314)
              at org.apache.hadoop.ipc.Server.<init>(Server.java:1411)
              at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:510)
              at org.apache.hadoop.ipc.RPC.getServer(RPC.java:471)
              at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2112)
              at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2022)
              at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:276)
              at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:268)
              at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4712)
      Caused by: java.net.BindException: Address already in use
              at sun.nio.ch.Net.bind(Native Method)
              at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
              at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
              at org.apache.hadoop.ipc.Server.bind(Server.java:225)
              ... 9 more
      
      2011-03-17 00:40:33,352 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: 
      /************************************************************
      SHUTDOWN_MSG: Shutting down JobTracker at QDuo/127.0.0.1
      ************************************************************/
      

      The list conversation in context, at search-hadoop.com:
      http://search-hadoop.com/m/FzN7iqreL/problem+starting+cdh3b2+jobtracker&subj=problem+starting+cdh3b2+jobtracker

      I'll try to investigate and post the exact problem / solution soon.
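
One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: in both stack traces the RPC server is created at JobTracker.java:2112 and the local dir check fails later, at JobTracker.java:2189, in the same constructor. If the first attempt leaves its listening socket open, a retry would hit "Address already in use" on its own port. The sketch below illustrates that pattern only; `RetryBindDemo` and `FakeTracker` are hypothetical names and none of this is Hadoop code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class RetryBindDemo {

    // Sockets left open by failed constructions. Holding them here stands in
    // for a server object that was started but never stopped.
    static final List<ServerSocket> leaked = new ArrayList<>();

    /** Hypothetical tracker whose constructor binds its port first and validates its dirs later. */
    static class FakeTracker {
        FakeTracker(int port) throws IOException {
            ServerSocket rpc = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
            leaked.add(rpc);
            // Simulated local dir failure; note that the socket is not closed.
            throw new IOException("Cannot create toBeDeleted in <local dir>");
        }
    }

    /** Tries to construct the tracker several times and records the exception type of each failure. */
    static List<String> startWithRetries(int port, int attempts) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            try {
                new FakeTracker(port);
            } catch (IOException e) {
                errors.add(e.getClass().getSimpleName());
            }
        }
        return errors;
    }

    /** Asks the OS for a currently free loopback port. */
    static int findFreePort() {
        try (ServerSocket probe = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return probe.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(startWithRetries(findFreePort(), 2));
        for (ServerSocket s : leaked) {
            s.close();
        }
    }
}
```

In this sketch the first attempt fails with the real cause and the second with a bind error, which matches the order of the WARN and FATAL entries in the log. Closing the socket in the constructor's failure path, or validating the directories before binding, would keep the original error visible.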

        Activity

        Harsh J created issue -
        Todd Lipcon added a comment -

        Is this reproducible on trunk too? If it's only in CDH, this should go on the Cloudera JIRA.

        Harsh J added a comment -

        MAPREDUCE-1382 fixes this, but what's odd is that:

        JT says this:

        2011-06-12 00:00:34,924 WARN org.apache.hadoop.util.MRAsyncDiskService: Cannot create toBeDeleted in /Users/harshchouraria/Work/installs/temp-space/mapred/local1. Ignored.
        

        TT does this instead:

        2011-06-12 00:00:35,980 WARN org.apache.hadoop.util.DiskChecker: Incorrect permissions were set on /Users/harshchouraria/Work/installs/temp-space/mapred/local1, expected: rwxr-xr-x, while actual: ---------. Fixing...
        

        It may be worth making the JT do this too, if it's going to use mapred.local.dir (or whatever its new alias is) after all.
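
The check-and-fix behaviour that the TT log line describes could look roughly like the sketch below. This is an illustration under assumptions, not the DiskChecker implementation: `LocalDirPermissionFixer` and `ensurePermissions` are hypothetical names, and the code relies only on the standard `java.nio.file` POSIX permission calls, so it will not work on non-POSIX file systems.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class LocalDirPermissionFixer {

    /**
     * Compares the directory's permissions with the expected ones (for example "rwxr-xr-x")
     * and resets them if they differ. Returns true if a fix was applied.
     */
    static boolean ensurePermissions(Path dir, String expected) {
        try {
            Set<PosixFilePermission> wanted = PosixFilePermissions.fromString(expected);
            Set<PosixFilePermission> actual = Files.getPosixFilePermissions(dir);
            if (actual.equals(wanted)) {
                return false;
            }
            System.err.printf("Incorrect permissions on %s, expected: %s, actual: %s. Fixing...%n",
                    dir, expected, PosixFilePermissions.toString(actual));
            // Succeeds only if the current user owns the directory (or is root).
            Files.setPosixFilePermissions(dir, wanted);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mapred-local");
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("---------"));
        System.out.println("fixed: " + ensurePermissions(dir, "rwxr-xr-x"));
    }
}
```

Whether the JT should silently repair permissions or just skip the directory is a design choice; repairing hides a misconfiguration, while skipping reduces the usable local dirs.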

        Harsh J added a comment -

        Resolving as a duplicate of MAPREDUCE-1382 (which is fixed in 0.22).

        Harsh J made changes -
        Field          | Original Value | New Value
        Status         | Open [ 1 ]     | Resolved [ 5 ]
        Release Note   |                | Ignore bad locations in mapreduce.cluster.dir / mapred.local.dir directories in a proper fashion.
        Fix Version/s  |                | 0.22.0 [ 12314184 ]
        Resolution     |                | Duplicate [ 3 ]

          People

          • Assignee:
            Harsh J
            Reporter:
            Harsh J