Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.16.2
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      10 node cluster + 1 namenode

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed the 'du' command to run in a separate thread so that it does not block the user.

      Description

      I recently upgraded to 0.16.2 from 0.15.2 on our 10 node cluster.
      Unfortunately we're seeing datanode timeout issues. In previous versions we've often seen in the namenode web UI that one or two datanodes' "last contact" goes from the usual 0-3 seconds to ~200-300 before it drops back down to 0 again.

      This causes mild discomfort but the big problems appear when all nodes do this at once, as happened a few times after the upgrade.
      It was suggested that this could be due to namenode garbage collection, but looking at the gc log output it doesn't seem to be the case.

      1. hadoop-hadoop-namenode-master2.out
        25 kB
        Johan Oskarsson
      2. hadoop-hadoop-datanode-new.out
        60 kB
        Johan Oskarsson
      3. hadoop-hadoop-datanode-new.log
        6.15 MB
        Johan Oskarsson
      4. hadoop-hadoop-datanode.out
        114 kB
        Johan Oskarsson
      5. du-nonblocking-v6-trunk.patch
        7 kB
        Johan Oskarsson
      6. du-nonblocking-v5-trunk.patch
        6 kB
        Johan Oskarsson
      7. du-nonblocking-v4-trunk.patch
        6 kB
        Johan Oskarsson
      8. du-nonblocking-v2-trunk.patch
        4 kB
        Johan Oskarsson
      9. du-nonblocking-v1.patch
        8 kB
        Johan Oskarsson

        Issue Links

          Activity

          Johan Oskarsson added a comment -

          Example client exception:

          08/04/10 10:09:27 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.0.5.76:50010
          08/04/10 10:09:27 INFO dfs.DFSClient: Abandoning block blk_-5192866954303337577
          08/04/10 10:09:27 INFO dfs.DFSClient: Waiting to find target node: 10.0.5.70:50010

          Datanode log output from around the same time:

          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as 10.0.5.76:50010
          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is 10.0.5.76:50010
          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: PacketResponder blk_-9149681676832791404 2 Exception java.io.EOFException
          at java.io.DataInputStream.readFully(DataInputStream.java:180)
          at java.io.DataInputStream.readLong(DataInputStream.java:399)
          at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1822)
          at java.lang.Thread.run(Thread.java:619)

          2008-04-10 10:09:15,758 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_-9149681676832791404 terminating
          2008-04-10 10:09:16,793 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_1511572447827516117 received exception java.net.SocketTimeoutException: Read timed out
          2008-04-10 10:09:16,793 ERROR org.apache.hadoop.dfs.DataNode: 10.0.5.70:50010:DataXceiver: java.net.SocketTimeoutException: Read timed out
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at java.net.SocketInputStream.read(SocketInputStream.java:182)
          at java.io.DataInputStream.readByte(DataInputStream.java:248)
          at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:324)
          at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:346)
          at org.apache.hadoop.io.Text.readString(Text.java:413)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1117)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
          at java.lang.Thread.run(Thread.java:619)

          2008-04-10 10:09:23,895 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-7442302015902809712 src: /10.0.5.74:55546 dest: /10.0.5.74:50010
          2008-04-10 10:09:23,896 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
          2008-04-10 10:09:23,927 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-7442302015902809712 of size 902 from /10.0.5.74
          2008-04-10 10:09:23,928 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-7442302015902809712 terminating
          2008-04-10 10:09:23,937 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_4161972554165500020 src: /10.0.11.7:41256 dest: /10.0.11.7:50010
          2008-04-10 10:09:23,968 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as
          2008-04-10 10:09:23,969 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is
          2008-04-10 10:09:24,034 INFO org.apache.hadoop.dfs.DataNode: Received block blk_4161972554165500020 of size 95 from /10.0.11.7
          2008-04-10 10:09:24,034 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 2 for block blk_4161972554165500020 terminating
          2008-04-10 10:09:27,664 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 got response for connect ack from downstream datanode with firstbadlink as 10.0.5.76:50010
          2008-04-10 10:09:27,664 INFO org.apache.hadoop.dfs.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is 10.0.5.76:50010
          2008-04-10 10:09:27,665 INFO org.apache.hadoop.dfs.DataNode: PacketResponder blk_-5192866954303337577 2 Exception java.io.EOFException
          at java.io.DataInputStream.readFully(DataInputStream.java:180)
          at java.io.DataInputStream.readLong(DataInputStream.java:399)
          at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1822)
          at java.lang.Thread.run(Thread.java:619)

          Johan Oskarsson added a comment -

          Stacktrace of a datanode that lost contact with the namenode. Not sure it helps but doesn't hurt.

          Johan Oskarsson added a comment -

          Namenode stacktrace

          Johan Oskarsson added a comment -

          Found 68 of these when doing a stackdump in a datanode that had lost contact with the namenode.

          "org.apache.hadoop.dfs.DataNode$DataXceiver@5b02a6" daemon prio=10 tid=0x7192bc00 nid=0x5bc6 waiting for monitor entry [0x70cee000..0x70cef0c0]
          java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.hadoop.dfs.FSDataset.getFile(FSDataset.java:867)
          - waiting to lock <0x77ce9360> (a org.apache.hadoop.dfs.FSDataset)
          at org.apache.hadoop.dfs.FSDataset.isValidBlock(FSDataset.java:795)
          at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:614)
          at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1995)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
          at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
          at java.lang.Thread.run(Thread.java:619)
          Raghu Angadi added a comment -

          > Found 68 of these when doing a stackdump in a datanode that had lost contact with the namenode.

          Could you attach the full stack trace of this datanode?

          Johan Oskarsson added a comment -

          As requested

          Raghu Angadi added a comment -

          While you are at it, could you attach the log (.log file) for this datanode as well? The log file shows what activity is going on now. Also, note the approximate time the stack trace was taken.

          My observation is that you are writing a lot of blocks... and the datanode that looks blocked is blocked while listing all the blocks on the native filesystem. It does this every hour when it sends block reports. So far nothing suspicious other than heavy write traffic and slow disks. Check iostat on the machine. What is the hardware like?

          The two main threads from the DataNode are:

          1. locks 0x782f1348 :
            "DataNode: [/var/storage/1/dfs/data,/var/storage/2/dfs/data,/var/storage/3/dfs/data,/var/storage/4/dfs/data]" daemon prio=10 tid=0x72409000
             nid=0x44f6 runnable [0x71d8a000..0x71d8aec0]
               java.lang.Thread.State: RUNNABLE
                    at java.io.UnixFileSystem.list(Native Method)
                    at java.io.File.list(File.java:973)
                    at java.io.File.listFiles(File.java:1051)
                    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:153)
                    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:149)
                    at org.apache.hadoop.dfs.FSDataset$FSDir.getBlockInfo(FSDataset.java:149)
                    at org.apache.hadoop.dfs.FSDataset$FSVolume.getBlockInfo(FSDataset.java:368)
                    at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:434)
                    - locked <0x782f1348> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
                    at org.apache.hadoop.dfs.FSDataset.getBlockReport(FSDataset.java:781)
                    at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:642)
                    at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2431)
                    at java.lang.Thread.run(Thread.java:619)
            
          2. locked 0x77ce9360 and waiting on 0x782f1348
            "org.apache.hadoop.dfs.DataNode$DataXceiver@101f287" daemon prio=10 tid=0x71906400 nid=0x5a93 waiting for monitor entry [0x712de000..0x712defc0]
               java.lang.Thread.State: BLOCKED (on object monitor)
            	at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:665)
            	- waiting to lock <0x782f1348> (a org.apache.hadoop.dfs.FSDataset$FSVolumeSet)
            	- locked <0x77ce9360> (a org.apache.hadoop.dfs.FSDataset)
            	at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1995)
            	at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
            	at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
            	at java.lang.Thread.run(Thread.java:619)
            
          3. most other threads are waiting on 0x77ce9360
          Raghu Angadi added a comment -

          Is there any change in load on the cluster compared to when it was running 0.15?

          Johan Oskarsson added a comment -

          Here's the log file for that machine.
          Hardware: 8 cores @ 1.86 GHz, 8 GB RAM, 4x750 GB 7200 rpm disks, with gigabit Ethernet connections.

          Example iostat from a node that just lost contact with the namenode.

          avg-cpu:  %user   %nice    %sys %iowait   %idle
                    18.88    0.00    1.60    0.99   78.53

          Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
          sda          0.55   0.72  4.77  4.42  837.81 1831.38  418.91  915.69   290.43     0.26  28.17   7.28   6.69
          sdb          0.46   0.87  4.42  3.92  771.67 1443.65  385.84  721.83   265.93     0.05   5.43   7.87   6.56
          sdc          0.54   0.59  4.59  3.72  823.05 1676.57  411.52  838.28   300.87     0.11  13.06   7.92   6.58
          sdd          0.45   0.63  4.32  3.51  756.46 1433.20  378.23  716.60   279.92     0.29  36.78   7.95   6.22

          Raghu Angadi added a comment - - edited

          Johan, you can enclose the formatted text inside {noformat} ... {noformat} so that it is easier to read.

          Raghu Angadi added a comment -

          What is the exact iostat command you ran? Is it an average over the last 5 seconds or so, or an overall average?

          From the data

          2008-04-10 09:23:08,667 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 497572 blocks got processed in 381086 msecs
          

          A few things about this:

          • You have around 500K blocks, most of them very small (verification time is very short). This is an order of magnitude more than our datanodes have here at Yahoo.
          • The above log message says it took 6.5 min for the block report. Most of this time, I would think, is spent listing the files in the local directories.
          • Such a large number of blocks should cause a similar problem with 0.15 as well. Is anything different that you can think of?

          Though the DataNode should handle a large number of blocks better, I don't think this is a blocker for the 0.16.3 release (unless iostat shows something else). Do you agree?

          Btw, are you planning to have a large number of small blocks? It's going to limit NameNode scalability.

          Raghu Angadi added a comment -

          I am currently marking this for 0.18. 0.16.3 is close to being released, and it looks like any fix for this would not be trivial. Also, it does not seem like a regression yet. But if that turns out to be different, we can make this a candidate for 0.16.4 or 0.17.

          Johan Oskarsson added a comment -

          Here's the second output from iostat -x 30 from a datanode that just lost contact.

          avg-cpu:  %user   %nice    %sys %iowait   %idle
                    94.26    0.00    0.83    0.00    4.91
          
          Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
          sda          0.40   0.20 137.45  5.10 2031.86 2680.44  1015.93  1340.22    33.06     1.93   13.53   6.72  95.73
          sdb          0.23   0.03  0.33  1.20  145.02  552.22    72.51   276.11   454.87     0.02   10.22   4.78   0.73
          sdc          0.50   0.03  0.70  8.76  307.10 7137.72   153.55  3568.86   786.69     4.70  496.94   6.90   6.53
          sdd          0.47   0.07  0.83  0.53  315.63    4.83   157.81     2.42   234.56     0.06   44.39  10.73   1.47
          

          I'm aware that we have quite a lot of small files; it's an issue we're working on. I guess we'll have to ramp up the priority.
          Your suggestion that the block reports are causing this makes sense. I'll merge a few of these directories of small files and see if it improves.

          Perhaps the difference between 0.15 and 0.16 was just that we hit it quite hard after the upgrade, as data had queued up.
          I'm going to see how it behaves today.

          Johan Oskarsson added a comment -

          We've reduced the number of blocks per node to about 150,000, but we're still seeing the same problem as before.

          Raghu Angadi added a comment -

          Can you check for messages similar to "[...] BlockReport of 497572 blocks got processed in 381086 msecs" on such datanodes?

          Raghu Angadi added a comment -

          One thing I am not sure about yet is why your sda is so much busier than the other disks... somehow all the native filesystem metadata falls on one disk? Another possibility is swapping, but the size of each request is very small (~512 bytes).

          Raghu Angadi added a comment -

          > "DataNode: [/var/storage/1/dfs/data,/var/storage/2/dfs/data,/var/storage/3/dfs/data,/var/storage/4/dfs/data]"
          Is each of /var/storage/[1-4] mounted on a different disk, or are they all on sda?

          Johan Oskarsson added a comment -

          All those directories are on separate disks.

          One thing I didn't think about was that even though we merged a lot of files to reduce the number of blocks, there would still be tons of files in dfs/data/previous, since I had not run -finalizeUpgrade.
          So if I'm not mistaken, the du command would still go through all of those, taking quite some time. I've now finalized the upgrade and we're seeing better performance: BlockReport of 147083 blocks got processed in 25432 msecs

          Although having the datanodes lose contact with the namenode because they're checking disk usage seems like quite a serious bug to me.

          Not sure why sda is busier, although that is where the logs are located.

          Raghu Angadi added a comment -

          Yes, it looks like DU goes through previous as well... So instead of both the block report and DU causing problems, now it is just DU.

          Can you clarify whether "datanodes lose contact" means the NameNode actually marks them "dead"?

          > Although having the datanodes lose contact with the namenode because it's checking disk usage seems like quite a serious bug to me.

          I agree. Doing these in the background without blocking normal DataNode functions takes a little bit of restructuring. We should keep this jira open.

          > not sure why sda is more busy, although that is where the logs are located
          This might help your situation. If you find more info, please inform us.

          Johan Oskarsson added a comment -

          Created a simple patch that makes DU run the shell command in a new thread and never block on getUsage(). It does change the behavior a bit, but in most cases it shouldn't be a problem.
          I haven't had a chance to test this on a real cluster yet; it works in my local testing though.

          Unfortunately it turns out this only solves part of the problem. Block reports are still an issue.
          That's harder to solve, though; I'm not sure how to restructure that. Ideas?
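
          To make the approach concrete for readers of this thread, here is a minimal, hypothetical sketch of the idea: a background thread refreshes a cached value so callers never wait on the shell command. The class and method names are illustrative and are not taken from the attached patch.

          import java.io.BufferedReader;
          import java.io.File;
          import java.io.IOException;
          import java.io.InputStreamReader;

          // Sketch only: run 'du -sk <dir>' in a background thread and serve a cached value.
          class CachedDu implements Runnable {
            private final File dir;
            private final long intervalMs;
            private volatile long usedBytes;

            CachedDu(File dir, long intervalMs) throws IOException {
              this.dir = dir;
              this.intervalMs = intervalMs;
              this.usedBytes = runDu();            // populate once so the first caller sees a real value
              Thread t = new Thread(this, "du-refresh");
              t.setDaemon(true);
              t.start();
            }

            long getUsed() { return usedBytes; }   // returns the cached value; never blocks on du

            public void run() {
              while (true) {
                try {
                  Thread.sleep(intervalMs);
                  usedBytes = runDu();
                } catch (InterruptedException e) {
                  return;
                } catch (IOException e) {
                  // keep the previous value and retry on the next interval
                }
              }
            }

            private long runDu() throws IOException {
              Process p = Runtime.getRuntime().exec(new String[] {"du", "-sk", dir.getPath()});
              BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
              try {
                return Long.parseLong(r.readLine().trim().split("\\s+")[0]) * 1024L;
              } finally {
                r.close();
              }
            }
          }

          The actual attached patch works against Hadoop's own DU/Shell classes rather than invoking du directly like this.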

          Raghu Angadi added a comment -

          Regarding block reports, one option is not to do the fs scan at all.

          Doug Cutting added a comment -

          > Block reports are still an issue.

          Will incremental block reports (HADOOP-1079) resolve this?

          Raghu Angadi added a comment -

          > Will incremental block reports (HADOOP-1079) resolve this?
          If it avoids scanning the file system, yes. From the above jira it is not clear whether it will avoid comparing the in-memory map with the on-disk blocks.

          Even a change as simple as commenting out the fs scan would be enough for this problem, I think.

          Johan Oskarsson added a comment -

          If I understand this correctly the purpose of a block report is to let the namenode know if a block mysteriously goes missing on the data node.

          Would it make sense to have this as part of the block verification mechanism described in HADOOP-2012? Is it already?
          That way the block report could always be sent from memory and wouldn't have to hit the disk.

          Of course, just commenting out the fs scan as Raghu suggests would also work; how big an issue is blocks going missing?

          Johan Oskarsson added a comment -

          Shall we fix the DU issue first? It seems easier. Could someone review my patch, please?
          Then we can create another ticket for the block report problem.

          Raghu Angadi added a comment -

          Regarding the patch,
          > It does change the behavior a bit, but in most cases it shouldn't be a problem.

          I haven't looked at it properly yet; could you describe what the change in behavior is? Also, I'm not sure why it needs to change the Shell stuff. Could the desired behavior for DF be implemented in the DF class?

          Doug Cutting added a comment -

          > Also not sure why it needs to change Shell stuff. [ ... ]

          I think this was because Johan wanted to make DF implement Runnable, and there was a conflict, since Shell already has a method named 'run'. But this changes public APIs incompatibly, and is thus not a good approach.

          Johan, perhaps instead we could define a nested class in DF.java that extends Thread and overrides run() there, or implements Runnable, if you prefer.

          Also, the default interval is dfs.blockreport.intervalMsec, which seems rather long. It should really be related to the heartbeat interval, no? Moreover, we shouldn't use a DFS parameter in a generic FS class. So the default interval should be something safe, perhaps hardwired to 10 minutes or so, and DFS should override that when it constructs a DF, if it needs to. Does that make sense?

          And perhaps the thread should run 'df' first, then sleep, so that values are available to clients sooner?

          Finally, and most important, what evidence do you have that DU is in fact causing problems? It is run very infrequently, not strictly synchronized with block reports but rather triggered by heartbeats. A slow DU would thus result in a delayed heartbeat. None of the stack traces above indicate that DU is blocking other activities of the datanode.

          Johan Oskarsson added a comment -

          Cleaned up the patch a bit, now only changes DU.java + the test

          Johan Oskarsson added a comment -

          Doug: you're right about the Runnable/run() bit; just as you wrote that, I adapted the patch as you suggested.

          I agree about the interval; I'll change it.
          This patch is for DU; DF returns so quickly that it shouldn't cause an issue.

          In the DU constructor the command is run once so that we get values straight away. I thought this would be better since then we know for sure there are correct values in there once the object is created.

          I'll try to recreate the situation to produce good evidence, but off the top of my head DU is used to decide what volume to write to in writeToBlock in FSDataset, so it causes problems with writing blocks if it takes too long. We've seen quite a lot of this.
          As you say, it doesn't run that often, but often enough to cause us problems.

          Doug Cutting added a comment -

          This addresses the first of my above concerns, but not the other three.

          Also, does that test in fact succeed? It seems to me that the thread will not yet have had a chance to update its usage count before the assertEquals(), no?

          Doug Cutting added a comment -

          Oops. I posted my previous comment before I saw your last comment. You're right, I was confusing DU and DF. And I missed the refresh in the ctor. Sorry! That resolves most of my concerns. Since this is called much more frequently than I was thinking, it makes sense for it not to be synchronous. I think my only remaining concern is the default interval, which you've said you'd address. Thanks!

          Raghu Angadi added a comment -

          It was my mistake to say 'DF' where I meant 'DU'.

          Patch looks good. A couple of comments:

          • It need not disallow an interval of zero. In that case, you could just not start the thread and invoke run() as before. Since DU is a utility, it is used (or could be used) outside the DataNode.
          • Persistent thread: in the normal case, since the thread does work only once in a while, it could be created only when it needs to run. This would be an improvement; I don't mean it as a hard requirement for this patch. It is more in line with the previous behaviour, since if getUsed() is not called there is no penalty.

          Are you using this (or the previous) patch in your environment?

          This certainly improves DN stability with a large number of blocks. We still need to keep in mind that DU has a very noticeable penalty. Say it takes around 10 min (as in your case)... then it implies that 15% of the time the DN will be extremely I/O starved. This will have a very noticeable effect on I/O-intensive applications.

          Johan Oskarsson added a comment -

          Updated the patch with the suggestions from Doug and Raghu.
          The interval defaults to 10 min. If the incoming interval is 0, the previous behavior is used.

          Findbugs doesn't like that I start a thread in the constructor, but as far as I know it's the only way without adding a start method to the class, and I assume you don't want to change the interface.
          This passes all the tests and checkstyle on my local machine. I also added a bunch of javadoc.

          Raghu: yes, I am using a previous patch on our cluster; no problems so far.
          I'm not sure what you mean by not having a permanent thread. How would we update the value without blocking in getUsed() in that case? You say it's not a hard requirement; I hope you can accept the patch anyway.

          Raghu Angadi added a comment -

          > I'm not sure what you mean by not having a permanent thread. How would we update the value without blocking on getUsed in that case?

          This thread will stay idle most of the time. So we could start a thread inside getUsed() (and possibly in the other accessor methods, if the interval has passed) and make the thread exit after running du. This is no less accurate than the current implementation. Or you could schedule a periodic task using a Java 'Executor'. Even if you do keep the persistent thread, could you add a comment, if you agree that it need not be persistent... we might implement that later.
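
          For illustration, a rough sketch of the Executor-based alternative using java.util.concurrent; the class and field names are hypothetical and this is not code from any attached patch.

          import java.io.File;
          import java.util.concurrent.Executors;
          import java.util.concurrent.ScheduledExecutorService;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.atomic.AtomicLong;

          // Sketch only: periodic refresh via a scheduler instead of a dedicated per-DU thread.
          class ScheduledUsage {
            private final AtomicLong used = new AtomicLong();
            private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

            ScheduledUsage(final File dir, long intervalMs) {
              used.set(dirSize(dir));                          // initial value so getUsed() is valid immediately
              scheduler.scheduleWithFixedDelay(new Runnable() {
                public void run() { used.set(dirSize(dir)); }  // refresh happens off the caller's thread
              }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
            }

            long getUsed() { return used.get(); }              // never blocks on the refresh

            void shutdown() { scheduler.shutdownNow(); }

            // stand-in for running 'du': walks the directory tree instead
            private static long dirSize(File f) {
              if (f.isFile()) return f.length();
              long total = 0;
              File[] children = f.listFiles();
              if (children != null) {
                for (File c : children) total += dirSize(c);
              }
              return total;
            }
          }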

          Regarding the patch:

          1. 'lock' is not required. You can synchronize on DU.this.
          2. Also, DURefreshThread should either be a static class or not keep a reference to the DU.
          Johan Oskarsson added a comment -

          1. Changed
          2. Removed the reference to DU and call DU.this.run() instead.

          Scheduling it with an Executor would still have a running thread that keeps track of the scheduling, no? ScheduledThreadPoolExecutor, for example.
          The other option was starting a thread inside getUsed() and then returning the old value while du runs? Wouldn't that cause problems where getUsed() is called very infrequently? Perhaps this is never an issue?

          Anyway, I added the comment about improving this with a non-permanent thread; in the most common case I guess that would work fine.

          As mentioned, this patch will cause one new Findbugs warning (starting a thread in the constructor), which is hard to avoid without adding a new method to start it and thereby changing the public interface.

          Raghu Angadi added a comment -

          > Scheduling it with an Executor would still have a running thread that keeps track of the scheduling, no?
          That is implementation-dependent; the JavaDoc does not say. There might be just one thread that handles many such tasks. At least we let the Executors implementation optimize it.

          DU looks like a simple utility to users. If they create these multiple times, whenever they need one, they will be surprised to see threads hanging around. In that sense, it might be better to add a "start()" method so that it is explicit to them that a thread might be started and it needs to be shut down.

          +1 overall.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382039/du-nonblocking-v5-trunk.patch
          against trunk revision 656270.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2467/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2467/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2467/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          Actually, requiring start() is not so bad. Also, the DataNode could invoke shutdown() inside its own shutdown. It will also remove the Findbugs warning.

          Johan Oskarsson added a comment -

          The patch failed, my bad; I ran all my tests with Java 6, so I didn't catch the @Override annotations that Eclipse likes to put in. They don't work well with Java 5.

          Raghu: Sure, I can add a start method if you guys want one.

          Johan Oskarsson added a comment -

          Updated the patch with start and shutdown methods. If start isn't invoked, the previous behavior of running on demand is used.

          I'm having issues running the unit tests on my machine on a clean trunk (TestDatanodeBlockScanner fails), so I hope this one passes all the tests.
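
          For reference, the start()/shutdown() shape described here looks roughly like the following sketch; the class name and the 10-minute constant are placeholders, and the actual du invocation is elided.

          // Sketch only: no background thread is created unless the caller opts in via start().
          class UsageRefresher {
            private volatile long used;
            private volatile boolean running;
            private Thread refreshThread;          // null until start() is called

            public synchronized void start() {
              if (refreshThread != null) return;   // idempotent
              running = true;
              refreshThread = new Thread(new Runnable() {
                public void run() {
                  while (running) {
                    used = refresh();
                    try { Thread.sleep(600000L); } catch (InterruptedException e) { return; }
                  }
                }
              }, "usage-refresh");
              refreshThread.setDaemon(true);
              refreshThread.start();
            }

            public synchronized void shutdown() {
              running = false;
              if (refreshThread != null) refreshThread.interrupt();
            }

            public long getUsed() {
              if (refreshThread == null) {         // start() never called: fall back to refreshing on demand
                used = refresh();
              }
              return used;
            }

            private long refresh() { return 0L; }  // placeholder for actually running 'du'
          }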

          Raghu Angadi added a comment -

          > TestDatanodeBlockScanner fails,..

          Please update your trunk; a fix for this was committed yesterday. If it still fails on your machine, let us know.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382371/du-nonblocking-v6-trunk.patch
          against trunk revision 658862.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2512/console

          This message is automatically generated.

          Raghu Angadi added a comment -

          +1. Thanks for the multiple iterations. If a user does not call start(), the interval given will be silently ignored... in that sense, the interval could be an argument to start(). But I will commit the v6 patch for now.

          Raghu Angadi added a comment -

          I just committed this. Thanks Johan!

          Hudson added a comment -

          Integrated in Hadoop-trunk #500 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/500/ )

          Jason added a comment -

          I propose an alternative solution for this.
          If the block information was managed by an inotify task (on Linux/Solaris), or the Windows equivalent (whose name I forget), the datanode could be informed each time a file in the dfs tree is created, updated, or deleted.

          With this information being delivered, it could maintain an accurate block map with only one full scan of the datanode blocks, at start time.

          With this algorithm the data nodes will be able to scale to a much larger number of blocks.
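
          As a purely illustrative sketch (not part of any patch here), this is roughly what event-driven directory watching looks like with the java.nio.file WatchService, which later JDKs provide as a portable front end to mechanisms like inotify:

          import java.nio.file.FileSystems;
          import java.nio.file.Path;
          import java.nio.file.Paths;
          import java.nio.file.WatchEvent;
          import java.nio.file.WatchKey;
          import java.nio.file.WatchService;
          import static java.nio.file.StandardWatchEventKinds.*;

          // Hypothetical watcher: after one full scan at startup, the block map could be kept
          // current from create/modify/delete events instead of periodic directory rescans.
          public class BlockDirWatcher {
            public static void main(String[] args) throws Exception {
              Path dataDir = Paths.get(args[0]);                   // e.g. a dfs data directory
              WatchService watcher = FileSystems.getDefault().newWatchService();
              dataDir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);

              while (true) {
                WatchKey key = watcher.take();                     // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                  // a real datanode would update its in-memory block map here
                  System.out.println(event.kind() + ": " + event.context());
                }
                if (!key.reset()) break;                           // directory no longer accessible
              }
            }
          }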

          The other thing is that the way the synchronized blocks on FSDataset.FSVolumeSet are held totally aggravates this bug in 0.18.1.

          The jason@attributor.com address will be going away shortly, I will be switching to jason.hadoop@gmail.com in the next little bit.

          Johan Oskarsson added a comment -

          Sounds very interesting. I suggest opening a new ticket describing the solution in more detail so the discussion can continue there instead of in this closed ticket.


            People

            • Assignee:
              Johan Oskarsson
            • Reporter:
              Johan Oskarsson
            • Votes:
              1
            • Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved:
