Hadoop Common / HADOOP-1224

"Browse the filesystem" link pointing to a dead data-node

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.3
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      On the NameNode status web page, the "Browse the filesystem" link can point to a dead data-node.
      This happens because FSNamesystem.randomDataNode() selects a random node from the list of
      all nodes rather than only among live nodes.

      1. DFSBrowsingDeadNode_v1.0.patch
        1 kB
        Enis Soztutar
      2. DFSBrowsingDeadNode_v1.1.patch
        1 kB
        Enis Soztutar
      3. DFSBrowsingDeadNode_v1.2.patch
        0.7 kB
        Enis Soztutar

        Activity

        Enis Soztutar added a comment -

        This patch:
        1. changes randomDataNode() so that it skips dead nodes and decommissioned nodes. It starts with a random data node and checks the data nodes sequentially until a live node is found.
        2. changes the return type of getDatanodeByIndex() from DatanodeInfo to DatanodeDescriptor.
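
        The selection strategy in item 1 can be sketched in Java as below. This is a minimal illustration, not the committed Hadoop code: NodeEntry and RandomLiveNode are hypothetical stand-ins for DatanodeDescriptor and FSNamesystem, liveness is assumed to be precomputed and passed in as a predicate, and the wrap-around at the end of the list is an assumption about how "checks the data nodes sequentially" is bounded.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical stand-in for DatanodeDescriptor: only the fields this sketch needs.
final class NodeEntry {
    final String host;
    final int infoPort;
    final boolean alive; // assumption: liveness was already computed elsewhere

    NodeEntry(String host, int infoPort, boolean alive) {
        this.host = host;
        this.infoPort = infoPort;
        this.alive = alive;
    }
}

final class RandomLiveNode {
    /**
     * Hypothetical helper: picks a random starting index, then scans the list
     * sequentially (wrapping around) until a node passing isLive is found.
     * Returns "host:infoPort", or null if no node is live.
     */
    static String randomDataNode(List<NodeEntry> nodes, Predicate<NodeEntry> isLive, Random rand) {
        int size = nodes.size();
        if (size == 0) {
            return null;
        }
        int start = rand.nextInt(size);
        for (int i = 0; i < size; i++) {
            NodeEntry d = nodes.get((start + i) % size);
            if (d != null && isLive.test(d)) {
                return d.host + ":" + d.infoPort;
            }
        }
        return null; // every node is dead or decommissioned
    }
}
```

        Scanning from a random offset with wrap-around visits each node at most once, so the loop always terminates and returns null when no node is live. The trade-off is that the choice is not uniform: a live node that follows a run of dead ones is picked more often.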

        Hadoop QA added a comment -

        +1 http://issues.apache.org/jira/secure/attachment/12355503/DFSBrowsingDeadNode_v1.0.patch applied and successfully tested against trunk revision r528230.
        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/43/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/43/console

        dhruba borthakur added a comment -

        +1. Code looks good.

        Tom White added a comment -

        I've just committed this. Thanks Enis!

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #60 (see http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/60/).

        Konstantin Shvachko added a comment -

        Now it's even worse: we select ONLY dead nodes.

            if (d != null && !d.isDecommissioned() && isDatanodeDead(d) &&
                !d.isDecommissionInProgress()) {
              return d.getHost() + ":" + d.getInfoPort();
            }

        Did anybody ever actually try to click the link?
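
        For comparison, the check with the missing negation restored might look like the sketch below. DnView, browseTarget(), and the 10-minute heartbeat expiry are hypothetical stand-ins chosen for illustration, not Hadoop's actual classes or defaults; only the shape of the condition mirrors the quoted code.

```java
// Hypothetical stand-in for the DatanodeDescriptor fields used by the check.
final class DnView {
    final String host;
    final int infoPort;
    final boolean decommissioned;
    final boolean decommissionInProgress;
    final long lastHeartbeatMillis;

    DnView(String host, int infoPort, boolean decommissioned,
           boolean decommissionInProgress, long lastHeartbeatMillis) {
        this.host = host;
        this.infoPort = infoPort;
        this.decommissioned = decommissioned;
        this.decommissionInProgress = decommissionInProgress;
        this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
}

final class BrowseCheck {
    // Assumed expiry interval for this sketch only; not Hadoop's configured value.
    static final long HEARTBEAT_EXPIRE_MILLIS = 10L * 60 * 1000;

    // A node counts as dead if its last heartbeat is older than the expiry interval.
    static boolean isDatanodeDead(DnView d, long nowMillis) {
        return d.lastHeartbeatMillis < nowMillis - HEARTBEAT_EXPIRE_MILLIS;
    }

    /** Returns "host:infoPort" if d may be used for the browse link, else null. */
    static String browseTarget(DnView d, long nowMillis) {
        // The quoted code lacked the "!" before isDatanodeDead(d),
        // so it accepted only dead nodes.
        if (d != null && !d.decommissioned && !isDatanodeDead(d, nowMillis)
                && !d.decommissionInProgress) {
            return d.host + ":" + d.infoPort;
        }
        return null;
    }
}
```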

        Enis Soztutar added a comment -

        This patch applies to the current trunk (534354). It fixes the forgotten negation in the isDatanodeDead() check and introduces a new function, isDatanodeLive().

        Hadoop QA added a comment -

        +1 http://issues.apache.org/jira/secure/attachment/12356611/DFSBrowsingDeadNode_v1.1.patch applied and successfully tested against trunk revision r534234.
        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/100/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/100/console

        Tom White added a comment -

        Thanks Enis. Have you manually tested this latest patch as Konstantin suggests?

        Enis Soztutar added a comment -

        Finally, I was able to test the patch.
        I manually set up a cluster with 2 DNs and one NN.
        After intentionally killing one DN, or decommissioning one DN, browsing worked as expected. Sorry for the previously untested version.

        Konstantin Shvachko added a comment -

        It is confusing if a data-node can be neither dead nor alive. In your patch,
        isDatanodeDead() is not equivalent to !isDatanodeLive().
        The patch should merely add "!", IMO.

        Tom White added a comment -

        > The patch should merely add "!" imo.

        +1

        Enis Soztutar added a comment -

        The confusing thing here, IMO, is that the admin status of the datanode can be NORMAL, DECOMMISSIONED or DECOMMISSION_IN_PROGRESS, and if the admin state is NORMAL, the node can be either dead or "not dead". So from the end user's perspective, a data node can be in one of four states: live, dead, decommissioned or decommission_in_progress. Thus isDatanodeDead() is not equivalent to !isDatanodeLive().
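
        The four user-visible states described above can be modeled as in the following hypothetical Java sketch. The enum and method names are illustrative (the constant names follow the wording of this comment and may not match Hadoop's own spelling), and heartbeat expiry is passed in as a boolean.

```java
// Admin states as described in the comment; names are illustrative only.
enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

// The four states a user can observe.
enum VisibleState { LIVE, DEAD, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

final class NodeStates {
    static VisibleState classify(AdminState admin, boolean heartbeatExpired) {
        switch (admin) {
            case DECOMMISSIONED:
                return VisibleState.DECOMMISSIONED;
            case DECOMMISSION_IN_PROGRESS:
                return VisibleState.DECOMMISSION_IN_PROGRESS;
            default: // NORMAL: only here does heartbeat liveness decide dead vs. live
                return heartbeatExpired ? VisibleState.DEAD : VisibleState.LIVE;
        }
    }

    static boolean isDead(AdminState admin, boolean heartbeatExpired) {
        return classify(admin, heartbeatExpired) == VisibleState.DEAD;
    }

    static boolean isLive(AdminState admin, boolean heartbeatExpired) {
        return classify(admin, heartbeatExpired) == VisibleState.LIVE;
    }
}
```

        In this model a decommissioned or decommissioning node is neither dead nor live, which is exactly the asymmetry Konstantin objected to. The committed one-line fix keeps the existing structure instead: the quoted condition checks the decommission states separately and only negates isDatanodeDead().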

        Enis Soztutar added a comment -

        One-line patch that adds the "!".

        Konstantin Shvachko added a comment -

        +1

        Doug Cutting added a comment -

        I just committed this. Thanks, Enis!

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #82 (see http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/82/).

          People

          • Assignee: Enis Soztutar
          • Reporter: Konstantin Shvachko
          • Votes: 0
          • Watchers: 0
