Hadoop Common / HADOOP-9667

SequenceFile: Reset keys and values when syncing to a place before the header

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      There seems to be a bug in the SequenceFile#sync function. Thanks to Christopher Ng for this report:

          /** Seek to the next sync mark past a given position.*/
          public synchronized void sync(long position) throws IOException {
            if (position+SYNC_SIZE >= end) {
              seek(end);
              return;
            }
      
            if (position < headerEnd) {
              // seek directly to first record
              in.seek(headerEnd);                                         <====
              // should this not call seek() (i.e. this.seek) instead?
              // note the sync marker "seen" in the header
              syncSeen = true;
              return;
            }
      

      The problem is that when you sync to the start of a compressed file,
      noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
      triggered. When you subsequently call next() you're potentially getting
      keys from the buffer, which still contains keys from the previous position
      in the file.
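
      A minimal reproduction sketch (the class name, path, and record counts
      below are illustrative, not taken from the report): write a block-compressed
      SequenceFile that spans several compressed blocks, read partway into it,
      then sync back to a position before the header and read again.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.SequenceFile;
          import org.apache.hadoop.io.SequenceFile.CompressionType;
          import org.apache.hadoop.io.Text;

          public class SyncBeforeHeaderRepro {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Hypothetical local path; any writable location works.
              Path path = new Path("/tmp/block-compressed.seq");

              // Write a block-compressed SequenceFile large enough to span
              // several compressed blocks.
              try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                  SequenceFile.Writer.file(path),
                  SequenceFile.Writer.keyClass(IntWritable.class),
                  SequenceFile.Writer.valueClass(Text.class),
                  SequenceFile.Writer.compression(CompressionType.BLOCK))) {
                for (int i = 0; i < 200000; i++) {
                  writer.append(new IntWritable(i), new Text("value-" + i));
                }
              }

              // Read partway into the file, then sync back to a position
              // that is before the header and read the "first" record.
              IntWritable key = new IntWritable();
              Text value = new Text();
              try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                  SequenceFile.Reader.file(path))) {
                for (int i = 0; i < 100000; i++) {
                  reader.next(key, value);
                }
                reader.sync(0);           // position < headerEnd: only in.seek() runs
                reader.next(key, value);  // expected key 0; with the bug a stale
                                          // key from the earlier block can appear
                System.out.println("first key after sync(0): " + key.get());
              }
            }
          }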

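      As the inline question above suggests, one possible direction for a fix,
      sketched here under the assumption that the Reader's own seek() resets the
      block-read state (noBufferedKeys, valuesDecompressed) for block-compressed
      files, is to route the early-return branch of sync() through seek() rather
      than calling in.seek() directly:

          // Sketch only, not a committed patch: the branch that handles
          // positions before the header goes through this.seek() so that
          // buffered keys/values from the previous position are discarded.
          if (position < headerEnd) {
            // seek directly to first record, resetting noBufferedKeys and
            // valuesDecompressed so the next call to next() reads a fresh block
            seek(headerEnd);
            // note the sync marker "seen" in the header
            syncSeen = true;
            return;
          }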

          People

          • Assignee: Unassigned
          • Reporter: Colin Patrick McCabe
          • Votes: 0
          • Watchers: 7
