Hadoop Common / HADOOP-9667

SequenceFile: Reset keys and values when syncing to a place before the header

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      There seems to be a bug in the SequenceFile#sync function. Thanks to Christopher Ng for this report:

          /** Seek to the next sync mark past a given position.*/
          public synchronized void sync(long position) throws IOException {
            if (position+SYNC_SIZE >= end) {
              seek(end);
              return;
            }

            if (position < headerEnd) {
              // seek directly to first record
              in.seek(headerEnd);    // <==== should this not call seek (i.e. this.seek) instead?
              // note the sync marker "seen" in the header
              syncSeen = true;
              return;
            }
      

      The problem is that when you sync to the start of a compressed file,
      noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
      triggered. When you subsequently call next(), you may get keys from the
      buffer, which still contains keys from the previous position in the file.
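      To make the failure mode concrete, here is a minimal, self-contained
      sketch (hypothetical names; this is not Hadoop's actual SequenceFile
      code) of a reader that buffers decoded records. Seeking only the
      underlying stream, as in.seek(headerEnd) does, leaves the record buffer
      populated, so the next read hands back stale records; a seek that also
      resets the buffer state, analogous to calling this.seek(headerEnd), does
      not.

      ```java
      import java.util.ArrayDeque;
      import java.util.Deque;

      class BufferedRecordReader {
          // Simulated "file" of records, read in blocks of two.
          private final String[] blocks = {"recA", "recB", "recC", "recD"};
          private int streamPos = 0;                     // position of the underlying stream
          private final Deque<String> buffered = new ArrayDeque<>();

          /** Like in.seek(pos): moves the stream but keeps buffered records. */
          void rawSeek(int pos) {
              streamPos = pos;
          }

          /** Like this.seek(pos): moves the stream AND resets buffer state. */
          void seek(int pos) {
              streamPos = pos;
              buffered.clear();                          // analogue of resetting noBufferedKeys
          }

          /** Returns the next record, refilling the buffer from the stream when empty. */
          String next() {
              if (buffered.isEmpty()) {
                  // "decompress" the next block of two records into the buffer
                  for (int i = 0; i < 2 && streamPos < blocks.length; i++) {
                      buffered.add(blocks[streamPos++]);
                  }
              }
              return buffered.poll();
          }
      }

      public class StaleBufferDemo {
          public static void main(String[] args) {
              BufferedRecordReader r = new BufferedRecordReader();
              r.next(); r.next(); r.next();              // recA, recB, recC; recD left in buffer
              r.rawSeek(0);
              System.out.println(r.next());              // prints "recD" — stale record
              r.seek(0);
              System.out.println(r.next());              // prints "recA" — buffer was reset
          }
      }
      ```

      The stale "recD" after rawSeek(0) is exactly the symptom described above:
      the stream was repositioned before the header's end, but the buffered
      records from the old position were never discarded.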

      People

      • Assignee: Unassigned
      • Reporter: Colin P. McCabe (cmccabe)
      • Votes: 0
      • Watchers: 8
