  Hadoop Common / HADOOP-508

Random seeks using FSDataInputStream can leave the internal buffer in an invalid state, so that subsequent reads return wrong data


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.2
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels:
      None

      Description

      Some of my applications using Hadoop DFS receive wrong data after certain random seeks. After some investigation, I believe (without having looked at the source code of java.io.BufferedInputStream) that the problem comes down to the method read(byte[] b, int off, int len). When it is called with a length at least as large as the internal buffer, it reads directly into the external buffer and bypasses the internal buffer, but it does not invalidate the internal buffer by setting the variable 'count' to 0. As a result, a subsequent seek to an offset that lies within one internal buffer size before the 'position' of the PositionCache places the current position inside the internal buffer, which still contains outdated data from elsewhere in the file.
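      The Java sketch below reproduces the described mechanism against the JDK's own java.io.BufferedInputStream, whose read(byte[], int, int) does bypass its internal buffer when the buffer is exhausted and the requested length is at least the buffer size. The classes SeekableBytes (a stand-in for the PositionCache), SeekBuffer and FixedSeekBuffer, and a seek() that reuses the buffer whenever the target lies in [end - count, end), are hypothetical and modeled on the description above, not taken from Hadoop's source. The fix shown is one possible approach and not necessarily what hadoop-508.patch does.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StaleBufferDemo {

    /** Hypothetical stand-in for the underlying positioned stream (the PositionCache). */
    static class SeekableBytes extends InputStream {
        private final byte[] data;
        private int pos;

        SeekableBytes(byte[] data) { this.data = data; }

        long getPos() { return pos; }

        void seek(long desired) { pos = (int) desired; }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (pos >= data.length) return -1;
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public int available() { return data.length - pos; }
    }

    /** Hypothetical buffered wrapper: seek() reuses buf if the target is in [end - count, end). */
    static class SeekBuffer extends BufferedInputStream {
        final SeekableBytes cache;

        SeekBuffer(SeekableBytes cache, int size) {
            super(cache, size);
            this.cache = cache;
        }

        synchronized void seek(long desired) {
            long end = cache.getPos();
            long start = end - count; // assumes buf holds the 'count' bytes just before 'end'
            if (desired >= start && desired < end) {
                pos = (int) (desired - start); // serve the next read from buf
            } else {
                count = 0;
                pos = 0;
                cache.seek(desired);
            }
        }
    }

    /** One possible fix (an assumption, not necessarily the attached patch). */
    static class FixedSeekBuffer extends SeekBuffer {
        FixedSeekBuffer(SeekableBytes cache, int size) { super(cache, size); }

        @Override
        public synchronized int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (len >= buf.length) {
                // super.read may have bypassed buf, so buf may no longer hold the bytes just
                // before cache.getPos(). Drop buf and rewind the underlying stream to the
                // logical position so that any unread buffered bytes are read again later.
                long logical = cache.getPos() - (count - pos);
                count = 0;
                pos = 0;
                cache.seek(logical);
            }
            return n;
        }
    }

    /** Reads 4 + 4 bytes through buf, then 16 bytes that bypass it, then seeks back to 20. */
    static int byteAfterSeek(SeekBuffer in) {
        try {
            in.read(new byte[4], 0, 4);   // fills buf with bytes 0..7, consumes 0..3
            in.read(new byte[4], 0, 4);   // consumes 4..7, buf is now exhausted
            in.read(new byte[16], 0, 16); // len >= buf size: bytes 8..23 skip buf
            in.seek(20);                  // buggy case: stale count 8 makes 20 look buffered
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] sampleData() {
        byte[] data = new byte[64];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i; // byte value == offset
        return data;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + byteAfterSeek(new SeekBuffer(new SeekableBytes(sampleData()), 8)));
        System.out.println("fixed: " + byteAfterSeek(new FixedSeekBuffer(new SeekableBytes(sampleData()), 8)));
    }
}
```

      Running main prints "buggy: 4" (the stale byte from offset 4 still sitting in the internal buffer) and "fixed: 20". The sketched fix rewinds the underlying stream instead of only zeroing count, because BufferedInputStream.read may refill the internal buffer later in the same call and leave unread bytes in it; discarding those without repositioning would silently skip data.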

        Attachments

        1. hadoop-508.patch
          4 kB
          Milind Barve


            People

            • Assignee:
              milindb Milind Barve
              Reporter:
              ckunz Christian Kunz
            • Votes:
              0
              Watchers:
              0
