Hadoop HDFS
  HDFS-1060 Append/flush should support concurrent "tailer" use case
  HDFS-1058

reading from file under construction fails if reader beats writer to DN for new block

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: None
    • Component/s: datanode, hdfs-client
    • Labels: None

      Description

      If there is a writer and concurrent reader, the following can occur:

      • The writer allocates a new block from the NN.
      • The reader calls getBlockLocations.
      • The reader connects to the DN and calls getReplicaVisibleLength.
      • The writer still has not talked to the DN, so the DN doesn't know about the block and throws an error (see the sketch below).
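
      For context, a minimal sketch of the writer/tailer pattern in question, using only the public FileSystem API. The path and sizes here are illustrative, and the race window (between the writer's block allocation at the NN and its first contact with the DN) is narrow, so this will not reproduce the failure deterministically:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataInputStream;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class TailerSketch {
        public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          Path p = new Path("/tmp/under-construction.log");

          // Writer: keeps the file open and under construction, flushing as it goes.
          FSDataOutputStream out = fs.create(p);
          out.write(new byte[1024]);
          out.hflush();  // make written bytes visible to concurrent readers

          // Concurrent "tailer": open() fetches block locations from the NN, then the client
          // asks a DN for the visible length of the last, under-construction block. If the
          // writer has just allocated a new block at the NN but not yet contacted the DN,
          // the DN throws ReplicaNotFoundException and the read fails.
          FSDataInputStream in = fs.open(p);
          byte[] buf = new byte[1024];
          int n = in.read(buf);
          System.out.println("tailer read " + n + " bytes");

          in.close();
          out.close();
        }
      }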

        Activity

        Thanh Do added a comment -

        I think this is a design decision (stated in the append design document at HDFS-265).
        Here we trade performance for consistency.

        Todd Lipcon added a comment -

        I think the fix for this is simple enough.

        If a client calls getReplicaVisibleLength on a DN, and the DN replies with a ReplicaNotFoundException, then either:
        a) the writer just hasn't started writing the block yet,
        or b) the reading client has stale block locations.

        In order to separate (a) from (b) I think we can iterate through the DNs, and see if all of the DNs have the same response. If they do, and the LocatedBlocks indicated a 0 length, then I think we can safely just act the same as if they returned length 0. Stale block locations are impossible (or at least very very unlikely) since we just called getBlockLocations from the NN.

        My only question is whether we need to actually iterate through all of the DNs, or if we can just return 0 immediately on receiving ReplicaNotFoundException from the primary. I think going through all of them is actually important, because there may have been a concurrent pipeline recovery, in which case the old primary may have deleted the now-invalidated replica (i.e. we can't distinguish between not having gotten the block yet and having gotten and already deleted the block).

        Does this sound right? I will work on a patch if it does.
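
        A rough sketch of that client-side handling follows. The fetchVisibleLength helper is a hypothetical stand-in for the per-DN getReplicaVisibleLength RPC; this is not the committed DFSClient code, just the idea described in the comment above:

        import java.io.IOException;
        import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
        import org.apache.hadoop.hdfs.protocol.LocatedBlock;
        import org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException;

        abstract class BlockLengthSketch {

          // Stand-in for the per-DN RPC; in practice the exception arrives wrapped in a
          // RemoteException and has to be unwrapped before it can be matched here.
          abstract long fetchVisibleLength(DatanodeInfo dn, LocatedBlock block) throws IOException;

          long readBlockLength(LocatedBlock block) throws IOException {
            int replicaNotFound = 0;
            for (DatanodeInfo dn : block.getLocations()) {
              try {
                return fetchVisibleLength(dn, block);   // this DN knows the replica
              } catch (ReplicaNotFoundException e) {
                replicaNotFound++;                      // this DN hasn't heard of the block (yet)
              } catch (IOException e) {
                // network or other failure: fall through and try the next location
              }
            }
            // Every DN reported "replica not found". If the NN also reported a zero-length
            // block, the writer simply hasn't reached the DNs yet: treat the visible length as 0.
            if (replicaNotFound == block.getLocations().length && block.getBlockSize() == 0) {
              return 0;
            }
            throw new IOException("Cannot obtain block length for " + block);
          }
        }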

        Todd Lipcon made changes -
        Parent: HDFS-1060 [ 12459815 ]
        Issue Type: Bug [ 1 ] → Sub-task [ 7 ]
        Todd Lipcon added a comment -

        Error looks something like this:

        Caused by: java.io.IOException: Cannot obtain block length for LocatedBlock{blk_-6324096824609457750_1001; getBlockSize()=0; corrupt=false; offset=0; locs=[127.0.0.1:36194]}

        2010-03-21 16:46:49,825 DEBUG hdfs.DFSClient (DFSInputStream.java:readBlockLength(152)) - Faild to getReplicaVisibleLength from datanode 127.0.0.1:36194 for block blk_-6324096824609457750_1001
        org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for blk_-6324096824609457750_1001

        Todd Lipcon created issue -

          People

          • Assignee: Unassigned
          • Reporter: Todd Lipcon
          • Votes: 0
          • Watchers: 13
