Hadoop HDFS / HDFS-2573

TestFiDataXceiverServer is failing, not testing OOME

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 2.0.0-alpha
    • Fix Version/s: 0.22.0
    • Component/s: datanode, test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      TestFiDataXceiverServer is failing occasionally. It turns out also that the test is not testing the desired condition, because OOME is not caught in DataXceiverServer.run()

      1. HDFS-2573.patch
        5 kB
        Konstantin Boudnik
      2. HDFS-2573.patch
        6 kB
        Uma Maheswara Rao G

        Issue Links

          Activity

          Konstantin Shvachko added a comment -

          The test was introduced in HDFS-2451 to test the behavior of DataXceiverServer when an OutOfMemoryError (OOME) is thrown. There were several versions of the test using mocking and aspects. With the current version I see that DataXceiverServer.run() does not catch the OOME thrown as a fault injection.

          What happens here is:
          new Daemon(datanode.threadGroup,
          new DataXceiver(s, datanode, this)).start();
          This creates a new thread, which then invokes DataXceiver.run() with fault injection and throws OOME as expected. But the OOME is thrown in the new thread, not in the thread that DataXceiverServer.run() is running in, and the new thread does not catch it.
          I clearly see dispatchUncaughtException() being called when I run it in a debugger.

          I propose to remove the test.
          I think we tried as hard as we could, but couldn't reasonably reproduce the scenario.
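
The behavior described in this comment can be reproduced outside Hadoop. The following is a minimal sketch, not the DataXceiverServer code: the class name ChildThreadErrorDemo and the method parentCaughtError() are hypothetical, and a plain Thread stands in for Hadoop's Daemon. It shows that an OutOfMemoryError thrown inside a child thread's run() never reaches a catch block in the thread that started it.

```java
public class ChildThreadErrorDemo {

    /**
     * Starts a child thread whose body throws an OutOfMemoryError and
     * reports whether the parent's catch block ever saw that error.
     */
    static boolean parentCaughtError() {
        boolean caughtInParent = false;
        try {
            Thread child = new Thread(() -> {
                // Stand-in for the fault injected into DataXceiver.run().
                throw new OutOfMemoryError("injected fault");
            });
            // Suppress the default stack-trace printing for this demo.
            child.setUncaughtExceptionHandler((t, e) -> { });
            child.start();
            child.join();
        } catch (OutOfMemoryError e) {
            // Only reached if the error surfaces in the parent thread.
            caughtInParent = true;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return caughtInParent;
    }

    public static void main(String[] args) {
        System.out.println("caught in parent: " + parentCaughtError());
    }
}
```

The error ends up in the child thread's uncaught-exception handling, which matches the dispatchUncaughtException() call seen in the debugger, so the parent's catch block is never entered.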

          Konstantin Boudnik added a comment -

          I agree with your analysis, Konstantin: the test works as expected; however, an OOME thrown in threadB isn't caught in threadA, which started it.
          I have removed all AOP code from the test and manually added OOME throwing to the DataXceiver#run method, and I see exactly the same behavior.

          The only modification I can think of in this case is to throw OOME before the new DataXceiver thread is spawned, but this defeats the purpose of the test completely. We can exclude it for now in the hope of finding a better solution later, or just remove it completely. Either way is fine with me.

          Konstantin Boudnik added a comment -

          I think there are a couple of approaches here:

          • override the datanode.threadGroup#uncaughtException method
          • inject a bit more code into DataXceiver so a listener can be registered and notified when OOME is thrown in run().

          Both are doable; the latter is much simpler.
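
As an illustration of the first approach, here is a minimal sketch of a ThreadGroup subclass that overrides uncaughtException() and records what its threads throw. The names CapturingThreadGroupDemo, CapturingThreadGroup, and runFaultyWorker() are hypothetical and are not part of Hadoop; how such a group would be wired into the datanode is left open, since that depends on how datanode.threadGroup is created.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CapturingThreadGroupDemo {

    /** Thread group that records the first uncaught Throwable of its threads. */
    static class CapturingThreadGroup extends ThreadGroup {
        final AtomicReference<Throwable> firstUncaught = new AtomicReference<>();

        CapturingThreadGroup(String name) {
            super(name);
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            // Record the throwable instead of delegating to the default handler.
            firstUncaught.compareAndSet(null, e);
        }
    }

    /** Runs a worker that throws OOME and returns what the group captured. */
    static Throwable runFaultyWorker() {
        CapturingThreadGroup group = new CapturingThreadGroup("dataXceiverGroup");
        Thread worker = new Thread(group, () -> {
            // Stand-in for the fault injected into DataXceiver.run().
            throw new OutOfMemoryError("injected fault");
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return group.firstUncaught.get();
    }

    public static void main(String[] args) {
        System.out.println("captured: " + runFaultyWorker());
    }
}
```

A test could then assert on the captured throwable after the worker finishes. This only observes the error in the worker thread; it does not make the server loop itself catch it.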

          Konstantin Shvachko added a comment -

          I was thinking along these lines, but it's not exactly the use case we were trying to model.
          Well, not everything is meant to be tested. Let's remove it.

          Konstantin Boudnik added a comment -

          Good riddance.

          Konstantin Boudnik added a comment -

          This ticket reverts the test code introduced in HDFS-2452.

          Konstantin Boudnik added a comment -

          The patch to remove the test has been attached. However, the problem isn't in the test code, which is completely appropriate. The problem is the untestable implementation of parts of DataXceiverServer, and that is what actually needs fixing.

          Konstantin Shvachko added a comment -

          Agreed.
          +1 for the patch.

          Konstantin Boudnik added a comment -

          I have committed it.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #111 (See https://builds.apache.org/job/Hadoop-Hdfs-22-branch/111/)
          HDFS-2573. TestFiDataXceiverServer is failing, not testing OOME. Contributed by Konstantin Boudnik.

          cos : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204279
          Files :

          • /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.22/hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/DataXceiverAspects.aj
          • /hadoop/common/branches/branch-0.22/hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/TestFiDataXceiverServer.java
          Uma Maheswara Rao G added a comment -

          Thanks for the patch, Cos. This needs to be corrected in trunk as well. Here is a patch for trunk.

          Konstantin Boudnik added a comment -

          Thanks, Uma. I will commit it to trunk later today.

          Konstantin Boudnik added a comment -

          Also committed to trunk.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1332 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1332/)
          HDFS-2573. TestFiDataXceiverServer is failing, not testing OOME. Contributed by Konstantin Boudnik.

          cos : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204781
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/DataXceiverAspects.aj
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/TestFiDataXceiverServer.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1383 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1383/)
          HDFS-2573. TestFiDataXceiverServer is failing, not testing OOME. Contributed by Konstantin Boudnik.

          cos : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204781
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/DataXceiverAspects.aj
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/TestFiDataXceiverServer.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1310 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1310/)
          HDFS-2573. TestFiDataXceiverServer is failing, not testing OOME. Contributed by Konstantin Boudnik.

          cos : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204781
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/DataXceiverAspects.aj
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/TestFiDataXceiverServer.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #871 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/871/)
          HDFS-2573. TestFiDataXceiverServer is failing, not testing OOME. Contributed by Konstantin Boudnik.

          cos : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204781
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/DataXceiverAspects.aj
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/TestFiDataXceiverServer.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #905 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/905/)
          HDFS-2573. TestFiDataXceiverServer is failing, not testing OOME. Contributed by Konstantin Boudnik.

          cos : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204781
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/DataXceiverAspects.aj
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/aop/org/apache/hadoop/hdfs/server/datanode/TestFiDataXceiverServer.java

            People

            • Assignee:
              Konstantin Boudnik
              Reporter:
              Konstantin Shvachko
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:

                Development