Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      java.io.IOException: java.lang.NullPointerException
      	at org.apache.hadoop.hdfs.server.datanode.FSDataset.updateReplicaUnderRecovery(FSDataset.java:2089)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.updateReplicaUnderRecovery(DataNode.java:1598)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:516)
      	...
      

      The problem can be reproduced by calling updateReplicaUnderRecovery(blockid) with blockid not in the volumeMap.
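
      The failure mode described above can be illustrated with a minimal sketch. It is not the actual FSDataset code: the class ReplicaRecoverySketch, the Replica record, and the methods updateUnguarded and updateGuarded are hypothetical names, and the volumeMap here is just a HashMap keyed by block id. It only shows why a lookup for an id that is not in the map ends in a NullPointerException, and how a null check could turn that into a descriptive IOException.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the datanode's block-id -> replica map; not the real FSDataset.
public class ReplicaRecoverySketch {
    /** Minimal replica record (assumption); the real replica classes carry far more state. */
    static final class Replica {
        long numBytes;
        Replica(long numBytes) { this.numBytes = numBytes; }
    }

    private final Map<Long, Replica> volumeMap = new HashMap<>();

    void addReplica(long blockId, long numBytes) {
        volumeMap.put(blockId, new Replica(numBytes));
    }

    /** Unguarded: dereferences the lookup result, so an unknown id raises NullPointerException. */
    long updateUnguarded(long blockId, long newLength) {
        Replica replica = volumeMap.get(blockId);
        replica.numBytes = newLength; // NPE here when blockId is not in the map
        return replica.numBytes;
    }

    /** Guarded: reports the missing replica as a descriptive IOException instead. */
    long updateGuarded(long blockId, long newLength) throws IOException {
        Replica replica = volumeMap.get(blockId);
        if (replica == null) {
            throw new IOException("Replica not found for block id " + blockId);
        }
        replica.numBytes = newLength;
        return replica.numBytes;
    }

    public static void main(String[] args) {
        ReplicaRecoverySketch dataset = new ReplicaRecoverySketch();
        dataset.addReplica(1L, 1024L);

        try {
            dataset.updateUnguarded(42L, 512L); // block id 42 was never added
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup failed with NullPointerException");
        }

        try {
            dataset.updateGuarded(42L, 512L);
        } catch (IOException e) {
            System.out.println("guarded lookup failed with: " + e.getMessage());
        }
    }
}
```

      Whether the committed patch uses exactly this kind of check is not stated in this issue; the sketch only mirrors the reproduction step given in the description.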

      1. h676_20091005.patch
        2 kB
        Tsz Wo Nicholas Sze
      2. updateRUR.patch
        5 kB
        Konstantin Shvachko

          Activity

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #47 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/47/)

          Konstantin Shvachko added a comment -

          I just committed this.

          Konstantin Shvachko added a comment -

          > Is the patch missing DataNode changes?
          Changes in DataNode were not required because I renamed FSDataset.updateReplica() to updateReplicaUnderRecovery(), so the DataNode call remains the same, but it actually calls another method.

          Konstantin Shvachko added a comment -

          Checked the logs.
          Failure of TestDataTransferProtocol is related to HDFS-668, when updatePipeline() from the client comes after addBlock() from data-node. In this case the client was trying to close the file after pipeline failure and this was going on forever, because we don't have a limit for close retries.
          Failure of TestBlockUnderConstruction is reported in HDFS-682.
          The rest seems to be good.

          Suresh Srinivas added a comment -

          After discussing with Konstantin, I understand that the problem is fixed by changes in FSDataset.updateReplicaUnderRecovery(). +1 for the patch.

          Suresh Srinivas added a comment -

          Is the patch missing DataNode changes?

          Giridharan Kesavan added a comment -

          The patch build on Hudson had been running for more than a day, so I had to kill it.
          http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.nedt/58/
          Please check the console log for details:
          http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/58/console

          Konstantin Shvachko added a comment -

          This should fix the problem, although I still don't know what Nicholas's test case was. I just called the method with a block id that does not exist.

          Tsz Wo Nicholas Sze added a comment -

          > Also could you please give more details on how to reproduce the error.

          I have updated the description.

          Tsz Wo Nicholas Sze added a comment -

          It turns out that the Datanode should not call FSDataset.updateReplicaUnderRecovery(..). It should call FSDataset.updateReplica(..), which checks for null and other potential problems. See also HDFS-658.

          Tsz Wo Nicholas Sze added a comment -

          See also this.

          Tsz Wo Nicholas Sze added a comment -

          h676_20091005.patch: copied the code from h627_20090924.patch posted in HDFS-627.

          Konstantin Shvachko added a comment -

          I don't think we should file bugs as subtasks of HDFS-265. We should just link them to the main jira.
          Also, could you please give more details on how to reproduce the error?


            People

            • Assignee:
              Konstantin Shvachko
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0
              Watchers:
              1
