Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.16.2
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      10 node cluster + 1 namenode

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed 'du' command to run in a separate thread so that it does not block the user.

      Description

      I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
      Unfortunately, we're seeing datanode timeout issues. In previous versions we often saw in the namenode web UI that the "last contact" value for one or two datanodes went from the usual 0-3 sec to ~200-300 sec before dropping back to 0.

      This causes mild discomfort but the big problems appear when all nodes do this at once, as happened a few times after the upgrade.
      It was suggested that this could be due to namenode garbage collection, but looking at the gc log output it doesn't seem to be the case.
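
The release note for this issue describes the fix as running 'du' in a separate thread so that callers are not blocked. The following is a minimal sketch of that idea, not the code from the attached patches: a background thread refreshes a cached usage figure and readers only ever see the cached value. The class name `NonBlockingDiskUsage`, its methods, the refresh interval, and the use of `du -sk` are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a cached disk-usage value refreshed by a background thread. */
class NonBlockingDiskUsage {
    private final LongSupplier measure;    // slow measurement, e.g. a call to 'du'
    private final long refreshIntervalMs;  // pause between two measurements
    private final AtomicLong usedBytes = new AtomicLong(0L);
    private volatile boolean running = false;
    private Thread refresher;

    NonBlockingDiskUsage(LongSupplier measure, long refreshIntervalMs) {
        this.measure = measure;
        this.refreshIntervalMs = refreshIntervalMs;
    }

    /** Starts the refresher thread; calling it again while running is a no-op. */
    synchronized void start() {
        if (running) {
            return;
        }
        running = true;
        refresher = new Thread(() -> {
            while (running) {
                try {
                    usedBytes.set(measure.getAsLong());
                } catch (RuntimeException e) {
                    // Measurement failed: keep serving the last known value.
                }
                try {
                    Thread.sleep(refreshIntervalMs);
                } catch (InterruptedException e) {
                    return; // shutdown() was called
                }
            }
        }, "du-refresher");
        refresher.setDaemon(true);
        refresher.start();
    }

    /** Stops the refresher thread. */
    synchronized void shutdown() {
        running = false;
        if (refresher != null) {
            refresher.interrupt();
        }
    }

    /** Returns the last measured value immediately; never waits for 'du'. */
    long getUsed() {
        return usedBytes.get();
    }

    /** Runs 'du -sk path' and converts the reported 1024-byte units to bytes. */
    static long runDu(String path) {
        try {
            Process p = new ProcessBuilder("du", "-sk", path)
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line = r.readLine();
                p.waitFor();
                if (line == null) {
                    throw new IllegalStateException("no output from du");
                }
                return Long.parseLong(line.trim().split("\\s+")[0]) * 1024L;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, a caller such as a heartbeat loop would construct the object once, for example with `() -> NonBlockingDiskUsage.runDu(dir)` as the measurement and an interval of its choosing, call `start()`, and then read `getUsed()` on every heartbeat. The trade-off is that the reported figure can be stale by up to one refresh interval plus the time 'du' takes to run.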

      1. du-nonblocking-v6-trunk.patch
        7 kB
        Johan Oskarsson
      2. du-nonblocking-v5-trunk.patch
        6 kB
        Johan Oskarsson
      3. du-nonblocking-v4-trunk.patch
        6 kB
        Johan Oskarsson
      4. du-nonblocking-v2-trunk.patch
        4 kB
        Johan Oskarsson
      5. du-nonblocking-v1.patch
        8 kB
        Johan Oskarsson
      6. hadoop-hadoop-datanode-new.log
        6.15 MB
        Johan Oskarsson
      7. hadoop-hadoop-datanode-new.out
        60 kB
        Johan Oskarsson
      8. hadoop-hadoop-namenode-master2.out
        25 kB
        Johan Oskarsson
      9. hadoop-hadoop-datanode.out
        114 kB
        Johan Oskarsson

        Issue Links

          Activity

          Johan Oskarsson created issue -
          Johan Oskarsson made changes -
          Field Original Value New Value
          Attachment hadoop-hadoop-datanode.out [ 12379856 ]
          Johan Oskarsson made changes -
          Attachment hadoop-hadoop-namenode-master2.out [ 12379857 ]
          Johan Oskarsson made changes -
          Attachment hadoop-hadoop-datanode-new.out [ 12379871 ]
          Johan Oskarsson made changes -
          Attachment hadoop-hadoop-datanode-new.log [ 12379873 ]
          Raghu Angadi made changes -
          Fix Version/s 0.18.0 [ 12312972 ]
          Fix Version/s 0.16.3 [ 12313092 ]
          Johan Oskarsson made changes -
          Attachment du-nonblocking-v1.patch [ 12381254 ]
          Johan Oskarsson made changes -
          Attachment du-nonblocking-v2-trunk.patch [ 12381779 ]
          Johan Oskarsson made changes -
          Attachment du-nonblocking-v4-trunk.patch [ 12381956 ]
          Johan Oskarsson made changes -
          Attachment du-nonblocking-v5-trunk.patch [ 12382039 ]
          Johan Oskarsson made changes -
          Affects Version/s 0.16.4 [ 12313132 ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Johan Oskarsson [ johanoskarsson ]
          Affects Version/s 0.16.3 [ 12313092 ]
          Hadoop Flags [Reviewed]
          Johan Oskarsson made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Johan Oskarsson made changes -
          Attachment du-nonblocking-v6-trunk.patch [ 12382371 ]
          Johan Oskarsson made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.16.4 [ 12313132 ]
          Affects Version/s 0.16.3 [ 12313092 ]
          Raghu Angadi made changes -
          Resolution Fixed [ 1 ]
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Release Note  DU class runs the 'du' command in a separate thread so that it does not block the user. DataNode might miss heartbeats on large nodes otherwise.
          Robert Chansler made changes -
          Release Note  DU class runs the 'du' command in a separate thread so that it does not block the user. DataNode might miss heartbeats on large nodes otherwise. Changed 'du' command to run in a separate thread so that it does not block the user.
          Nigel Daley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Raghu Angadi made changes -
          Link This issue is related to HADOOP-4584 [ HADOOP-4584 ]
          Owen O'Malley made changes -
          Component/s dfs [ 12310710 ]

            People

            • Assignee:
              Johan Oskarsson
              Reporter:
              Johan Oskarsson
            • Votes:
              1
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved:
