  1. Hadoop HDFS
  2. HDFS-10720

Fix intermittent test failure of TestDataNodeErasureCodingMetrics


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.0.0-alpha1
    • None
    • None

    Description

      The test picks the wrong datanode to corrupt. Instead of choosing a datanode that appears in the block locations, it simply fetches a datanode from the cluster by index, and that datanode may not be one of the datanodes in the block locations.

          byte[] indices = lastBlock.getBlockIndices();
          //corrupt the first block
          DataNode toCorruptDn = cluster.getDataNodes().get(indices[0]);
      

      For example, suppose the list returned by cluster.getDataNodes() is indexed as 0->Dn1, 1->Dn2, 2->Dn3, 3->Dn4, 4->Dn5, 5->Dn6, 6->Dn7, 7->Dn8, 8->Dn9, 9->Dn10.

      Assume the datanodes in the block locations are Dn2, Dn3, Dn4, Dn5, Dn6, Dn7, Dn8, Dn9, Dn10. In the failing scenario, the test selects cluster.getDataNodes().get(0), which is Dn1. Dn1 is not part of the block locations, so corrupting it does not trigger ECWork and the test fails.

      Ideally, the test should pick a datanode from the block locations and corrupt that one, which will trigger ECWork.
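
      The difference between the two selection strategies can be illustrated with a minimal, self-contained sketch. It does not use Hadoop classes: Node, buildCluster, locationPorts, pickByBlockIndex and pickFromLocations are hypothetical stand-ins, and identifying a datanode by its IPC port is an assumption made for the illustration. In the real test, the equivalent would presumably be resolving one of the block's reported locations to a DataNode in the MiniDFSCluster rather than indexing into cluster.getDataNodes().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class CorruptTargetSketch {
    /** Hypothetical stand-in for a datanode, identified here by an IPC port. */
    public static final class Node {
        public final String name;
        public final int ipcPort;
        public Node(String name, int ipcPort) {
            this.name = name;
            this.ipcPort = ipcPort;
        }
    }

    /** Builds Dn1..DnN with made-up ports 50001..5000N. */
    public static List<Node> buildCluster(int n) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            nodes.add(new Node("Dn" + i, 50000 + i));
        }
        return nodes;
    }

    /** Ports of the nodes at cluster positions [from, to), used as the block locations. */
    public static List<Integer> locationPorts(List<Node> cluster, int from, int to) {
        List<Integer> ports = new ArrayList<>();
        for (Node n : cluster.subList(from, to)) {
            ports.add(n.ipcPort);
        }
        return ports;
    }

    /** Buggy selection: treats a block index as a position in the cluster's node list. */
    public static Node pickByBlockIndex(List<Node> cluster, byte[] blockIndices) {
        return cluster.get(blockIndices[0]);
    }

    /** Intended selection: take the first block location and resolve it in the cluster. */
    public static Optional<Node> pickFromLocations(List<Node> cluster, List<Integer> ports) {
        int port = ports.get(0);
        return cluster.stream().filter(n -> n.ipcPort == port).findFirst();
    }

    public static void main(String[] args) {
        List<Node> cluster = buildCluster(10);
        // Block locations are Dn2..Dn10; Dn1 holds nothing for this block group.
        List<Integer> locations = locationPorts(cluster, 1, 10);
        byte[] indices = {0, 1, 2, 3, 4, 5, 6, 7, 8};

        Node wrong = pickByBlockIndex(cluster, indices);
        Node right = pickFromLocations(cluster, locations).orElseThrow(IllegalStateException::new);

        System.out.println("by block index: " + wrong.name
                + ", in locations: " + locations.contains(wrong.ipcPort));
        System.out.println("from locations: " + right.name
                + ", in locations: " + locations.contains(right.ipcPort));
    }
}
```

      Selecting from the block locations guarantees that the corrupted datanode actually stores part of the block group, regardless of the order in which the cluster lists its datanodes.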

      Attachments

        1. HDFS-10720-03.patch
          4 kB
          Rakesh Radhakrishnan
        2. HDFS-10720-02.patch
          5 kB
          Rakesh Radhakrishnan
        3. HDFS-10720-01.patch
          3 kB
          Rakesh Radhakrishnan
        4. HDFS-10720-00.patch
          3 kB
          Rakesh Radhakrishnan

        Activity

          People

            rakeshr Rakesh Radhakrishnan
            rakeshr Rakesh Radhakrishnan
            Votes:
            0
            Watchers:
            3
