Hadoop Common / HADOOP-1232

Datanode did not get removed from blockMap when a datanode was down

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.12.3
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      After a datanode shut down, the following exception was thrown when a job tried to open a file with blocks on that datanode. It looks as though the datanode was removed from NetworkTopology but not from the blockMap.

      org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.IllegalArgumentException: Unexpected non-existing data node: /xxx/yyy:50010
      at org.apache.hadoop.net.NetworkTopology.checkArgument(NetworkTopology.java:379)
      at org.apache.hadoop.net.NetworkTopology.getDistance(NetworkTopology.java:396)
      at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser$1.compare(FSNamesystem.java:3161)
      at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser$1.compare(FSNamesystem.java:3160)
      at java.util.Arrays.mergeSort(Arrays.java:1270)
      at java.util.Arrays.sort(Arrays.java:1210)
      at java.util.Collections.sort(Collections.java:159)
      at org.apache.hadoop.dfs.FSNamesystem$ReplicationTargetChooser.sortByDistance(FSNamesystem.java:3159)
      at org.apache.hadoop.dfs.FSNamesystem.open(FSNamesystem.java:549)
      at org.apache.hadoop.dfs.NameNode.open(NameNode.java:250)
      at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)

      at org.apache.hadoop.ipc.Client.call(Client.java:471)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
      at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:511)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:498)
      at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:207)
      at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.open(DistributedFileSystem.java:129)
      at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:110)
      at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330)
      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245)
      at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:54)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:139)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
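
      The suspected failure mode can be illustrated with a small, self-contained sketch. This is not the Hadoop code: the class StaleBlockMapSketch, its fields, its method names, and the toy distance values are all hypothetical, chosen only to mirror the shape of the trace above (a distance comparator that rejects nodes missing from the topology, called while sorting a block's replica holders). It assumes the bug is what the description guesses, namely that a dead node is dropped from the topology while the block map still lists it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for namenode state; NOT the real FSNamesystem.
class StaleBlockMapSketch {
    // Nodes currently known to the (simplified) topology.
    private final Set<String> topology = new HashSet<>();
    // Block id -> datanodes believed to hold a replica.
    private final Map<String, List<String>> blockMap = new HashMap<>();

    public void addReplica(String block, String node) {
        topology.add(node);
        blockMap.computeIfAbsent(block, k -> new ArrayList<>()).add(node);
    }

    // Inconsistent removal: the block map keeps the dead node (the suspected state).
    public void removeFromTopologyOnly(String node) {
        topology.remove(node);
    }

    // Consistent removal: the node disappears from both structures.
    public void removeEverywhere(String node) {
        topology.remove(node);
        for (List<String> holders : blockMap.values()) {
            holders.remove(node);
        }
    }

    // Like the argument check in the trace: nodes unknown to the topology are rejected.
    public int distance(String reader, String node) {
        if (!topology.contains(node)) {
            throw new IllegalArgumentException(
                "Unexpected non-existing data node: " + node);
        }
        return reader.equals(node) ? 0 : 2; // toy distances, assumed for illustration
    }

    // Returns a block's replica holders sorted by distance from the reader.
    public List<String> open(String block, String reader) {
        List<String> holders =
            new ArrayList<>(blockMap.getOrDefault(block, new ArrayList<>()));
        holders.sort((a, b) ->
            Integer.compare(distance(reader, a), distance(reader, b)));
        return holders;
    }
}
```

      In this sketch, calling open after removeFromTopologyOnly throws for any block that still lists the dead node alongside another replica, and it keeps throwing on every call, which matches the persistent behaviour reported in the comments. Calling removeEverywhere instead leaves open working with the surviving replicas.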

        Activity

        Hairong Kuang created issue -
        Hairong Kuang made changes -
        Field Original Value New Value
        Fix Version/s 0.13.0 [ 12312348 ]
        Affects Version/s 0.12.3 [ 12312403 ]
        Component/s dfs [ 12310710 ]
        Raghu Angadi added a comment -

        Do you know whether this was a one-time exception or a persistent one (i.e., it happens every time we try to open the file)?

        Hairong Kuang added a comment -

        The datanode persistently stayed in the blockMap after it was shut down.

        Doug Cutting made changes -
        Fix Version/s 0.13.0 [ 12312348 ]
        dhruba borthakur added a comment -

        This portion of the code has gone away as of the 0.13 release. It is very likely that you won't see this problem with 0.13 and later releases.

        Raghu Angadi added a comment -

        Haven't seen this in a long while.

        Raghu Angadi made changes -
        Resolution Cannot Reproduce [ 5 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Owen O'Malley made changes -
        Component/s dfs [ 12310710 ]
        Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Resolved     469d 5h 7m              1                 Raghu Angadi    22/Jul/08 00:56
        Resolved -> Closed   56d 17h 27m             1                 Doug Cutting    16/Sep/08 18:24

          People

          • Assignee:
            Unassigned
            Reporter:
            Hairong Kuang
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
