Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-916

Rewrite DFSOutputStream to use a single thread with NIO

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      The DFS write pipeline code has some really hairy multi-threaded synchronization. There have been a lot of bugs produced by this (HDFS-101, HDFS-793, HDFS-915, tens of others) since it's very hard to understand the message passing, lock sharing, and interruption properties. The reason for the multiple threads is to be able to simultaneously send and receive. If instead of using multiple threads, it used nonblocking IO, I think the whole thing would be a lot less error prone.

      I think we could do this in two halves: one half is the DFSOutputStream. The other half is BlockReceiver. I opened this JIRA first as I think it's simpler (only one TCP connection to deal with, rather than an up and downstream)

      Opinions? Am I crazy? I would like to see some agreement on the idea before I spend time writing code.

        Issue Links

          Activity

          Hide
          Doug Cutting added a comment -

          Seems like this might be a candiate for netty (http://www.jboss.org/netty/) or MINA?

          Show
          Doug Cutting added a comment - Seems like this might be a candiate for netty ( http://www.jboss.org/netty/ ) or MINA?
          Hide
          Todd Lipcon added a comment -

          Yes, I'd probably want to use Netty. I haven't given it enough research yet to decide, but I read up a bit on it earlier this week and it seems pretty reasonable. I don't want to start a religious war between different NIO frameworks here, though Let's first answer: are people interested in seeing and reviewing a patch that is essentially a rewrite of DFSOutputStream?

          Show
          Todd Lipcon added a comment - Yes, I'd probably want to use Netty. I haven't given it enough research yet to decide, but I read up a bit on it earlier this week and it seems pretty reasonable. I don't want to start a religious war between different NIO frameworks here, though Let's first answer: are people interested in seeing and reviewing a patch that is essentially a rewrite of DFSOutputStream?
          Hide
          Karthik K added a comment -

          Agree +1 on this . Being one of the most important parts - better readability would go a long way here.

          Show
          Karthik K added a comment - Agree +1 on this . Being one of the most important parts - better readability would go a long way here.
          Hide
          stack added a comment -

          A gallant undertaking: +1. I volunteer testing and review.

          Show
          stack added a comment - A gallant undertaking: +1. I volunteer testing and review.
          Hide
          Eli Collins added a comment -

          +1

          Show
          Eli Collins added a comment - +1
          Hide
          Jeff Hammerbacher added a comment -

          +1

          Show
          Jeff Hammerbacher added a comment - +1
          Hide
          dhruba borthakur added a comment -

          In principle, I am fine with a rewrite of DFSOutputStream. The only tricky part is "stability", especially because the code has been baked for a while. Will it be possible for you to structure the code change in such a way that the old implementation continues to work while the new implementation is pluggable?

          BTW, this does not propose any changes to the wire protocol, isn't it?

          When this idea was discussed earlier, it was proposed that the DFSClient could use RPC (Avro?) to send/receive data from the datanode(s), instead of using the raw socket implementation that it currently does. This might help in reducing complexity too.

          Show
          dhruba borthakur added a comment - In principle, I am fine with a rewrite of DFSOutputStream. The only tricky part is "stability", especially because the code has been baked for a while. Will it be possible for you to structure the code change in such a way that the old implementation continues to work while the new implementation is pluggable? BTW, this does not propose any changes to the wire protocol, isn't it? When this idea was discussed earlier, it was proposed that the DFSClient could use RPC (Avro?) to send/receive data from the datanode(s), instead of using the raw socket implementation that it currently does. This might help in reducing complexity too.
          Hide
          Todd Lipcon added a comment -

          Dhruba: regarding stability, that's actually the goal of this JIRA. If performance improves it would be a nice side effect, but it's not what I'm after. Even though the current code is "baked" we continue to find bugs in it - it's grown a lot in complexity since it was originally written, and I don't think many people really understand the code paths at this point.

          BTW, this does not propose any changes to the wire protocol, isn't it?

          I think we have two options for a rewrite:

          a) Rewrite using the existing protocol and make wire compatibility an explicit goal. I think this could be done in the short term.
          b) Rewrite using Avro with an entirely new protocol. I think this is a good idea eventually, but I'm certain it's going to require a lot of back and forth work between the two code bases. Thus it will take much longer to get done.

          I think the lessons we learn with (a) could be later applied to (b), especially with regard to NIO frameworks. So, I'm hoping to implement (a) first while spending some cycles on the side working with the Avro guys to make sure the RPC implementation will fit our needs.

          Will it be possible for you to structure the code change in such a way that the old implementation continues to work while the new implementation is pluggable

          I hope so - HDFS-914 would help here. The other option is to do this as a branch. Given that I'm aiming to maintain wire compatibility, it would be possible to soak test it by switching over just a few DNs or clients in a large cluster.

          Show
          Todd Lipcon added a comment - Dhruba: regarding stability, that's actually the goal of this JIRA. If performance improves it would be a nice side effect, but it's not what I'm after. Even though the current code is "baked" we continue to find bugs in it - it's grown a lot in complexity since it was originally written, and I don't think many people really understand the code paths at this point. BTW, this does not propose any changes to the wire protocol, isn't it? I think we have two options for a rewrite: a) Rewrite using the existing protocol and make wire compatibility an explicit goal. I think this could be done in the short term. b) Rewrite using Avro with an entirely new protocol. I think this is a good idea eventually, but I'm certain it's going to require a lot of back and forth work between the two code bases. Thus it will take much longer to get done. I think the lessons we learn with (a) could be later applied to (b), especially with regard to NIO frameworks. So, I'm hoping to implement (a) first while spending some cycles on the side working with the Avro guys to make sure the RPC implementation will fit our needs. Will it be possible for you to structure the code change in such a way that the old implementation continues to work while the new implementation is pluggable I hope so - HDFS-914 would help here. The other option is to do this as a branch. Given that I'm aiming to maintain wire compatibility, it would be possible to soak test it by switching over just a few DNs or clients in a large cluster.
          Hide
          dhruba borthakur added a comment -

          I agree with your view. let's make HDFS-914 a prerequisite for this one so that the new code can be made pluggable and one can continue to use the old implementation of DFSOutputStream is he/she so desires.

          Show
          dhruba borthakur added a comment - I agree with your view. let's make HDFS-914 a prerequisite for this one so that the new code can be made pluggable and one can continue to use the old implementation of DFSOutputStream is he/she so desires.
          Hide
          Todd Lipcon added a comment -

          Linking in the refactor JIRA. Will do that one first and then come back here to work on plan A above.

          Show
          Todd Lipcon added a comment - Linking in the refactor JIRA. Will do that one first and then come back here to work on plan A above.
          Hide
          Todd Lipcon added a comment -

          Patch is up on HDFS-914 - hopefully we can move quickly on that, since it'll have to be redone if the code moves under it.

          Show
          Todd Lipcon added a comment - Patch is up on HDFS-914 - hopefully we can move quickly on that, since it'll have to be redone if the code moves under it.
          Show
          Jeff Hammerbacher added a comment - https://issues.apache.org/jira/browse/HDFS-281

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              1 Vote for this issue
              Watchers:
              22 Start watching this issue

              Dates

              • Created:
                Updated:

                Development