Hadoop HDFS / HDFS-3152

Reading consistency for all readers


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2, 0.21.0
    • Fix Version/s: None
    • Component/s: hdfs-client

    Description

      I hit an exception when seeking to the latest length of a file that another client was still writing: "Cannot seek after EOF". I obtained the seek target from a previous input stream and am now trying to read the file incrementally, so the target exceeds the file length the client currently sees.

      In my opinion, the confirmed visible file length should come from the completed blocks (as reported by the NameNode) plus the number of bytes acknowledged by the last DataNode of the write pipeline for the last block.

      There are two separate concerns here: 1. How to obtain the confirmed visible file length for all readers. 2. For each reader, how to pick the best DataNode for a given block.

      Actually, the existing code mixes these two concerns together. The NameNode sorts block locations to favor local reading (HBase or local MapReduce; a random DataNode for remote readers), and DFSClient queries the first DataNode of the last block for its length. Pay attention to this point: the client may obtain a 'dirty' file length from the first DataNode of the last block that the NameNode returned, and it always reads each block's content from that block's first DataNode.

      Should we split these two cases?
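      The length proposal above can be sketched as follows. This is a minimal illustration, not the actual DFSClient code: `BlockInfo`, `complete`, and `ackedByPipeline` are hypothetical stand-ins for HDFS internals, used only to show how a "confirmed visible length" would be assembled from completed blocks plus the last pipeline node's acknowledged bytes.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the "confirmed visible length" idea: finalized blocks count at
// their NameNode-reported size; an under-construction last block contributes
// only the bytes acknowledged by the LAST DataNode in its write pipeline.
// All types here are hypothetical stand-ins for HDFS internals.
public class VisibleLengthSketch {

    static final class BlockInfo {
        final long numBytes;          // size reported by the NameNode
        final boolean complete;       // finalized vs. under construction
        final long[] ackedByPipeline; // bytes acked by each DataNode, in pipeline order

        BlockInfo(long numBytes, boolean complete, long[] ackedByPipeline) {
            this.numBytes = numBytes;
            this.complete = complete;
            this.ackedByPipeline = ackedByPipeline;
        }
    }

    /** Completed blocks plus bytes acknowledged by the last DataNode of the pipeline. */
    static long confirmedVisibleLength(List<BlockInfo> blocks) {
        long length = 0;
        for (BlockInfo b : blocks) {
            if (b.complete) {
                length += b.numBytes;
            } else {
                // The last DataNode in the pipeline has acknowledged the
                // fewest (or equal) bytes, so its count is safe for every reader.
                length += b.ackedByPipeline[b.ackedByPipeline.length - 1];
            }
        }
        return length;
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = Arrays.asList(
            new BlockInfo(64L * 1024 * 1024, true, null),   // one finalized block
            // last block: first DN has acked 900000 bytes, last DN only 800000
            new BlockInfo(1000000, false, new long[]{900000, 850000, 800000}));
        System.out.println(confirmedVisibleLength(blocks));
    }
}
```

      Querying the first DataNode instead (as the current code does) would return 900000 acknowledged bytes for the last block here, a length the rest of the pipeline has not yet confirmed, which is exactly the 'dirty' length described above.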

Attachments

Activity

People

    Assignee: Unassigned
    Reporter: Denny Ye (dennyy)
    Votes: 0
    Watchers: 4

Dates

    Created:
    Updated: