  1. Hadoop HDFS
  2. HDFS-3219

Disambiguate "visible length" in the code and docs

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      Per HDFS-2288, there are two definitions of "visible length", or rather we're using the same name for two different things:

      1. The HDFS-265 design doc, which defines it as a property of the replica:

      visible length is the "number of bytes that have been acknowledged by the downstream DataNodes". It is replica (not block) specific, meaning it can be different for different replicas at a given time. In the document it is called BA (bytes acknowledged), compared to BR (bytes received).

      2. The definition in HDFS-814 and DFSClient#getVisibleLength, which define it as a property of a file:

      The visible length is the length that all datanodes in the pipeline contain at least such amount of data. Therefore, these data are visible to the readers.

      According to this definition, the visible length of a file is the minimum of the visible lengths of all replicas of the last block. It's a static property set on open, i.e. it is not updated when a writer calls hflush. Also, DFSInputStream#readBlockLength returns the visible length of the first replica it finds, so it seems possible (though unlikely) that in a failure scenario it could return a length longer than what all replicas have.
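      The gap between the two behaviours can be sketched in a few lines. This is only an illustration of the reasoning above, not HDFS code: the class VisibleLengthSketch, both method names, and the sample byte counts are hypothetical, and the list stands in for the per-replica visible lengths (BA) of the last block.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HDFS.
final class VisibleLengthSketch {

    // File-level reading (definition 2): the smallest visible length
    // among all replicas of the last block.
    static long minAcrossReplicas(List<Long> replicaVisibleLengths) {
        if (replicaVisibleLengths.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        long min = Long.MAX_VALUE;
        for (long len : replicaVisibleLengths) {
            min = Math.min(min, len);
        }
        return min;
    }

    // "First replica found" behaviour: take whatever the first
    // replica in the list reports, without consulting the others.
    static long firstReplicaFound(List<Long> replicaVisibleLengths) {
        if (replicaVisibleLengths.isEmpty()) {
            throw new IllegalArgumentException("no replicas");
        }
        return replicaVisibleLengths.get(0);
    }

    public static void main(String[] args) {
        // Made-up per-replica visible lengths for the last block.
        List<Long> lengths = Arrays.asList(120L, 100L, 90L);
        System.out.println(minAcrossReplicas(lengths)); // prints 90
        System.out.println(firstReplicaFound(lengths)); // prints 120
    }
}
```

      With these made-up values, the first-replica answer (120) exceeds the minimum across replicas (90), which is the kind of mismatch described above.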

      This has caused confusion in a number of other JIRAs. We should update the design doc and Javadoc, and perhaps rename DFSClient#getVisibleLength etc., to disambiguate the two meanings.

            People

              Assignee: Unassigned
              Reporter: Eli Collins (eli2)