Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.0.2-alpha
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The following issues in DFSInputStream are addressed in this JIRA:
      1. A read may not retry enough times in some cases, causing an early failure.
      Assume the following call sequence:

       
      readWithStrategy()
        -> blockSeekTo()
        -> readBuffer()
           -> reader.doRead()
           -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
              -> blockSeekTo()
                 -> chooseDataNode()
                    -> block missing: clears deadNodes and picks currentNode again
              seekToNewSource() returns false
           readBuffer() re-throws the exception and quits the loop
      readWithStrategy() gets the exception and may fail the read call before it has retried MaxBlockAcquireFailures times.
      

      2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while it is still in use by another thread, so some read threads may never quit. Changing failures to a local variable solves this issue.

      3. If the local datanode is added to deadNodes, it is not removed from deadNodes when the DN comes back alive. We need a way to remove the local datanode from deadNodes once it becomes live again.
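
The race in item 2 can be illustrated with a minimal, single-threaded sketch. It is not the code from the attached patches: the class names, the `Attempt` interface, and the limit of 3 are all hypothetical, and the "other thread" is simulated by resetting the shared counter from inside the failing attempt so that the behaviour is deterministic.

```java
import java.io.IOException;

public class FailureCounterSketch {
    // Assumed retry limit for this sketch; in HDFS the limit is configurable.
    static final int MAX_FAILURES = 3;

    /** Hypothetical stand-in for one attempt to fetch a block; not an HDFS API. */
    interface Attempt {
        boolean tryOnce();
    }

    /** Variant with the counter in a field, shared by every caller of read(). */
    static class SharedCounterReader {
        int failures = 0;

        void read(Attempt attempt) throws IOException {
            while (!attempt.tryOnce()) {
                if (++failures >= MAX_FAILURES) {
                    throw new IOException("Could not obtain block");
                }
            }
            failures = 0; // a successful read resets the counter for everyone
        }
    }

    /** Variant with the counter in a local variable, private to each read() call. */
    static class LocalCounterReader {
        void read(Attempt attempt) throws IOException {
            int failures = 0;
            while (!attempt.tryOnce()) {
                if (++failures >= MAX_FAILURES) {
                    throw new IOException("Could not obtain block");
                }
            }
        }
    }

    /** Counts attempts of a failing read while the shared counter is reset `resets` times. */
    static int attemptsWithSharedCounter(int resets) {
        SharedCounterReader reader = new SharedCounterReader();
        int[] attempts = {0};
        try {
            reader.read(() -> {
                attempts[0]++;
                if (attempts[0] <= resets) {
                    reader.failures = 0; // simulates another thread's successful read
                }
                return false; // this read never succeeds
            });
        } catch (IOException expected) {
            // the read eventually gives up once the resets stop
        }
        return attempts[0];
    }

    /** Counts attempts of a failing read when the counter is local to the call. */
    static int attemptsWithLocalCounter() {
        int[] attempts = {0};
        try {
            new LocalCounterReader().read(() -> {
                attempts[0]++;
                return false; // this read never succeeds
            });
        } catch (IOException expected) {
            // gives up after MAX_FAILURES attempts
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("shared counter, 10 resets: " + attemptsWithSharedCounter(10) + " attempts");
        System.out.println("local counter: " + attemptsWithLocalCounter() + " attempts");
    }
}
```

With the shared field, the failing read keeps retrying for as long as resets keep arriving, which is how a read thread can fail to quit. With the local variable, the read stops after MAX_FAILURES attempts regardless of what other callers do.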

      1. TestDFSInputStream.java
        4 kB
        Binglin Chang
      2. HDFS-4273.patch
        26 kB
        Binglin Chang
      3. HDFS-4273-v2.patch
        26 kB
        Binglin Chang
      4. HDFS-4273.v3.patch
        26 kB
        Binglin Chang
      5. HDFS-4273.v4.patch
        27 kB
        Binglin Chang
      6. HDFS-4273.v5.patch
        27 kB
        Binglin Chang
      7. HDFS-4273.v6.patch
        27 kB
        Binglin Chang
      8. HDFS-4273.v7.patch
        29 kB
        Binglin Chang
      9. HDFS-4273.v8.patch
        27 kB
        Binglin Chang

        Issue Links

          Activity

          Binglin Chang created issue -
          Binglin Chang made changes -
          Field Original Value New Value
          Attachment TestDFSInputStream.java [ 12556103 ]
          Binglin Chang made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Binglin Chang made changes -
          Attachment HDFS-4273.patch [ 12556273 ]
          Binglin Chang made changes -
          Attachment HDFS-4273-v2.patch [ 12556369 ]
          Binglin Chang made changes -
          Affects Version/s 3.0.0 [ 12320356 ]
          Affects Version/s 2.0.3-alpha [ 12323274 ]
          Target Version/s 2.0.3-alpha [ 12323274 ]
          Binglin Chang made changes -
          Target Version/s 2.0.3-alpha [ 12323274 ] 3.0.0, 2.0.3-alpha [ 12320356, 12323274 ]
          Binglin Chang made changes -
          Affects Version/s 2.0.2-alpha [ 12322472 ]
          Affects Version/s 3.0.0 [ 12320356 ]
          Affects Version/s 2.0.3-alpha [ 12323274 ]
          Binglin Chang made changes -
          Attachment HDFS-4273.v3.patch [ 12562449 ]
          Binglin Chang made changes -
          Attachment HDFS-4273.v4.patch [ 12614872 ]
          Binglin Chang made changes -
          Attachment HDFS-4273.v5.patch [ 12615064 ]
          Binglin Chang made changes -
          Attachment HDFS-4273.v6.patch [ 12619271 ]
          Binglin Chang made changes -
          Attachment HDFS-4273.v7.patch [ 12621571 ]
          Binglin Chang made changes -
          Summary Problem in DFSInputStream read retry logic may cause early failure Fix some issue in DFSInputstream
          Binglin Chang made changes -
          Description Assume the following call sequence:
          {noformat}
          readWithStrategy()
            -> blockSeekTo()
            -> readBuffer()
               -> reader.doRead()
               -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
                  -> blockSeekTo()
                     -> chooseDataNode()
                        -> block missing: clears deadNodes and picks currentNode again
                  seekToNewSource() returns false
               readBuffer() re-throws the exception and quits the loop
          readWithStrategy() gets the exception and may fail the read call before it has retried MaxBlockAcquireFailures times.
          {noformat}
          Some issues with this logic:
          1. seekToNewSource() is broken because it may clear deadNodes in the middle of a retry.
          2. The variable "int retries=2" in readWithStrategy seems to conflict with MaxBlockAcquireFailures; should it be removed?
          Binglin Chang made changes -
          Attachment HDFS-4273.v8.patch [ 12621932 ]
          Binglin Chang made changes -
          Link This issue is related to HDFS-6022 [ HDFS-6022 ]
          Allen Wittenauer made changes -
          Labels BB2015-05-TBR
          Masatake Iwasaki made changes -
          Labels BB2015-05-TBR
          Masatake Iwasaki made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Target Version/s 3.0.0, 2.0.3-alpha [ 12320356, 12323274 ] 2.0.3-alpha, 3.0.0 [ 12323274, 12320356 ]
          Resolution Won't Fix [ 2 ]

            People

            • Assignee:
              Binglin Chang
              Reporter:
              Binglin Chang
            • Votes:
              0
              Watchers:
              16
