Hadoop HDFS / HDFS-3152

Reading consistency for all readers


    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2, 0.21.0
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Labels:


      I hit an exception when trying to seek to the latest length of a file that another client was still writing. The message was "Cannot seek after EOF". I had taken the seek target from a previous input stream and was opening a new stream to read the data appended since then. The exception means the target offset was beyond the file length reported to the new stream.
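      The failing sequence can be modeled roughly as follows. This is a simplified sketch, not the actual DFSClient code: the hypothetical `ReaderStream` class snapshots the file length when it is opened, and its `seek` rejects any target beyond that snapshot, which is the condition behind the "Cannot seek after EOF" message. The lengths 100 and 80 are made-up values.

```python
class ReaderStream:
    """Hypothetical model of a reader opened on a file that is still being written."""

    def __init__(self, visible_length: int) -> None:
        # Length snapshot taken when the stream is opened; later appends
        # by the writer are not reflected here.
        self.visible_length = visible_length
        self.pos = 0

    def seek(self, target: int) -> None:
        # Any target beyond the snapshot is rejected, as in the reported error.
        if target > self.visible_length:
            raise IOError("Cannot seek after EOF")
        self.pos = target


# The first stream saw 100 bytes; the reader remembers that offset.
first = ReaderStream(visible_length=100)
first.seek(100)
saved_offset = first.pos

# A second stream reports a shorter length (e.g. taken from a lagging
# replica), so resuming from the saved offset fails.
second = ReaderStream(visible_length=80)
try:
    second.seek(saved_offset)
except IOError as err:
    print(err)  # Cannot seek after EOF
```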

      In my opinion, the confirmed visible file length should be the total size of the completed blocks (as recorded by the NameNode) plus the acknowledged (replied) size of the last block at the last DataNode in the write pipeline.
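      A minimal sketch of the proposed length calculation, assuming the reader knows the finalized block sizes from the NameNode and the byte count held by each DataNode in the last block's pipeline. The function name `visible_length` and its inputs are hypothetical. The reasoning for using the last DataNode: data flows down the pipeline, so every upstream replica holds at least as many bytes as the last one.

```python
from typing import Sequence


def visible_length(completed_block_sizes: Sequence[int],
                   pipeline_bytes: Sequence[int]) -> int:
    """Sketch of the proposed confirmed visible length.

    completed_block_sizes: sizes of finalized blocks, as known to the NameNode.
    pipeline_bytes: bytes of the last block held by each DataNode, in pipeline
    order (first entry = first DataNode, last entry = last DataNode).
    """
    # Data flows down the pipeline, so the last DataNode holds the fewest
    # bytes; every replica upstream has at least that much.
    confirmed_last_block = pipeline_bytes[-1] if pipeline_bytes else 0
    return sum(completed_block_sizes) + confirmed_last_block


# Two finalized 64 MB blocks plus a partially written last block in which
# the first DataNode is ahead of the last one.
MB = 1024 * 1024
print(visible_length([64 * MB, 64 * MB], [5000, 4800, 4500]))  # 134222228
```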

      There are two separate questions here: 1. How to obtain a confirmed visible file length that is the same for all readers. 2. For each reader, how to choose the best DataNode for a given block.

      The existing code mixes these two concerns. The NameNode sorts block locations to favor local reads (e.g. HBase or local MapReduce tasks) and returns the DataNodes in random order for readers outside the cluster. DFSClient then takes the length of the last block from the first DataNode in that list. This is the key point: the client may obtain a 'dirty' file length from whichever DataNode the NameNode happened to list first for the last block. The client also always reads each block's content from the first DataNode in its list.
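      The effect of mixing the two concerns can be sketched with made-up replica sizes. In this hypothetical model, `sorted_locations` is a simplified stand-in for the NameNode's ordering (co-located DataNode first, otherwise shuffled), `length_mixed` takes the last block's length from the first listed DataNode as described above, and `length_separated` always uses the last DataNode in the pipeline regardless of which replica the reader will read from.

```python
import random
from typing import Dict, List

# Hypothetical replicas of the last block: DataNode name -> bytes it holds.
# Pipeline order is dn1 -> dn2 -> dn3, so dn1 is ahead of dn3.
REPLICAS: Dict[str, int] = {"dn1": 5000, "dn2": 4800, "dn3": 4500}
PIPELINE: List[str] = ["dn1", "dn2", "dn3"]


def sorted_locations(reader_host: str, rng: random.Random) -> List[str]:
    """Simplified model of NameNode ordering: a co-located DataNode comes
    first; otherwise the order is shuffled."""
    locs = list(REPLICAS)
    if reader_host in locs:
        locs.remove(reader_host)
        rng.shuffle(locs)
        return [reader_host] + locs
    rng.shuffle(locs)
    return locs


def length_mixed(locations: List[str]) -> int:
    # Behavior described in the report: ask the first listed DataNode.
    return REPLICAS[locations[0]]


def length_separated() -> int:
    # Separated approach: use the last DataNode in the pipeline, independent
    # of which replica this reader will read from.
    return REPLICAS[PIPELINE[-1]]


rng = random.Random(0)
for reader in ["dn1", "dn3", "remote-client"]:
    locs = sorted_locations(reader, rng)
    print(reader, "reads from", locs[0],
          "mixed length:", length_mixed(locs),
          "separated length:", length_separated())
```

      With these values, a reader on dn1 sees 5000 bytes and a reader on dn3 sees 4500 bytes under the mixed approach, while the separated approach reports 4500 bytes to every reader.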

      Should we separate these two concerns?




            • Assignee: Denny Ye (dennyy)
            • Votes: 0
            • Watchers: 4


              • Created: