Hadoop HDFS
HDFS-1907

BlockMissingException upon concurrent read and write: reader was doing file position read while writer is doing write without hflush

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: hdfs-client
    • Labels:
      None
    • Environment:

      Run on a real cluster. Using the latest 0.23 build.

    • Hadoop Flags:
      Reviewed

      Description

      BlockMissingException is thrown under this test scenario:
      Two different processes perform concurrent r/w on the same file: one reads while the other writes.

      • The writer keeps writing to the file.
      • The reader repeatedly does a positioned read from the beginning of the file to the visible end of the file.

      The reader is basically doing:
      byteRead = in.read(currentPosition, buffer, 0, byteToReadThisRound);
      where currentPosition = 0, buffer is a byte array, and byteToReadThisRound = 1024*10000.

      Usually it does not fail right away. I have to read, close the file, and re-open the same file a few times to trigger the problem. I'll post a test program to reproduce this problem after I've cleaned up my current test program a bit.
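      The access pattern above can be sketched as a local-filesystem analogue. The sketch below uses java.io.RandomAccessFile rather than the HDFS FSDataInputStream API, so it illustrates the writer-appends / reader-reopens-and-does-positioned-reads loop but will not by itself reproduce the HDFS bug; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ConcurrentReadWriteSketch {
    static final int CHUNKS = 100;
    static final int CHUNK_SIZE = 1024;

    // Starts a writer thread that keeps appending, while the caller
    // repeatedly re-opens the file and does positioned reads from offset 0
    // to the currently visible end. Returns the bytes read on the final pass.
    static long readWhileWriting() throws Exception {
        File f = File.createTempFile("concurrent-rw", ".dat");
        f.deleteOnExit();

        Thread writer = new Thread(() -> {
            try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
                byte[] chunk = new byte[CHUNK_SIZE];
                for (int i = 0; i < CHUNKS; i++) {
                    out.seek(out.length());   // writer keeps appending
                    out.write(chunk);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        byte[] buffer = new byte[10 * CHUNK_SIZE];
        long totalRead = 0;
        // Like the reporter's loop: read, close, re-open, read again --
        // keep going until the writer is done and a full pass saw everything.
        while (writer.isAlive() || totalRead < (long) CHUNKS * CHUNK_SIZE) {
            try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
                long visibleLen = in.length();   // "visible end of file"
                long pos = 0;
                totalRead = 0;
                while (pos < visibleLen) {       // positioned read loop
                    in.seek(pos);
                    int n = in.read(buffer, 0,
                            (int) Math.min(buffer.length, visibleLen - pos));
                    if (n < 0) break;
                    pos += n;
                    totalRead += n;
                }
            }
        }
        writer.join();
        return totalRead;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("final pass read " + readWhileWriting() + " bytes");
    }
}
```

      In the HDFS case, the read/close/re-open cycle matters because the positioned-read path fetches block locations that can go stale while the writer is still appending, which is what the attached patches address in the hdfs-client component.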

      1. HDFS-1907.patch
        1 kB
        John George
      2. HDFS-1907-2.patch
        3 kB
        John George
      3. HDFS-1907-3.patch
        3 kB
        John George
      4. HDFS-1907-4.patch
        2 kB
        John George
      5. HDFS-1907-5.patch
        2 kB
        John George
      6. HDFS-1907-5.patch
        2 kB
        John George


          Activity

          CW Chung created issue -
          CW Chung made changes -
          Description edited to redact the IP addresses in the attached error message. The (redacted) error msg:
          =========================
          11/05/08 19:05:48 WARN hdfs.DFSClient: Failed to connect to /xxx.xx.xx.xxx:1004 for file /tmp/N/909NF for block
          BP-1632719171-xxx.xx.xx.xxx-1303748685682:blk_-8940328094159486414_3653:java.io.IOException: Got error for
          OP_READ_BLOCK, self=/xxx.xx.xx.xxx:36405, remote=/xxx.xx.xx.xxx:1004, for file /tmp/N/909NF, for pool
          BP-1632719171-xxx.xx.xx.xxx-1303748685682 block -8940328094159486414_3653
              at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:398)
              at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:631)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
              at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:51)
              at TAppend.readUntilVisibleEnd(TAppend.java:441)
              at TAppend.readUntilEnd(TAppend.java:474)
              at TAppend.testReadOnly(TAppend.java:956)
              at TAppend.main(TAppend.java:1215)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
          ....
          ##### Caught Exception in testReadOnly while reading.
          java.io.IOException: Exception caught in readUntilVisibleEnd: Reader currentOffset = 0 ; totalByteRead =0 ; latest byteRead = 0 ; visibleLen= 67824640 ; byteLeftToRead = 67824640 ; bufferLen = 10240000 ; chunkNumber= 0 ; input pos = 0 ; byteToReadThisRound = 10240000 ; Filename = /tmp/N/909NF, ReadParam - CurrentPostion=0, offset=0, size=10240000
              at TAppend.readUntilVisibleEnd(TAppend.java:457)
              at TAppend.readUntilEnd(TAppend.java:474)
              at TAppend.testReadOnly(TAppend.java:956)
              at TAppend.main(TAppend.java:1215)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
              at java.lang.reflect.Method.invoke(Method.java:597)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

          Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
          BP-1632719171-xxx.xx.xx.xxx-1303748685682:blk_-8940328094159486414_3653 file=/tmp/N/909NF
              at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:570)
              at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:618)
              at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:704)
              at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:51)
              at TAppend.readUntilVisibleEnd(TAppend.java:441)
              ... 8 more
          Environment changed from the detailed build info:

          $ hadoop version
          Hadoop 0.22.0.1105090202
          Subversion git://hadoopre5.corp.sk1.yahoo.com/home/y/var/builds/thread2/workspace/Cloud-HadoopCommon-0.22-Secondary -r 3c23e43f9e262e7843e4287436429fad3224b0f7
          Compiled by hadoopqa on Mon May 9 02:13:09 PDT 2011
          From source with checksum 90b5fc469fd7a1fa0ba22db893423fed

          to: Run on a real cluster. Using the latest 0.23 build.
          CW Chung made changes -
          Description edited to move the error message above out of the description, leaving the text shown in the Description section.
          Tsz Wo Nicholas Sze made changes -
          Assignee John George [ johnvijoe ]
          John George made changes -
          Attachment HDFS-1907.patch [ 12481036 ]
          John George made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          John George made changes -
          Attachment HDFS-1907-2.patch [ 12481145 ]
          John George made changes -
          Attachment HDFS-1907-3.patch [ 12481225 ]
          John George made changes -
          Attachment HDFS-1907-4.patch [ 12481269 ]
          John George made changes -
          Attachment HDFS-1907-5.patch [ 12481302 ]
          Attachment HDFS-1907-5.patch [ 12481303 ]
          Tsz Wo Nicholas Sze made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.23.0 [ 12315571 ]
          Resolution Fixed [ 1 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HDFS-2029 [ HDFS-2029 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              John George
              Reporter:
              CW Chung
            • Votes:
              0
              Watchers:
              2
