Hadoop Common / HADOOP-9667

SequenceFile: Reset keys and values when syncing to a place before the header

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      There appears to be a bug in the SequenceFile.Reader#sync method. Thanks to Christopher Ng for this report:

          /** Seek to the next sync mark past a given position.*/
          public synchronized void sync(long position) throws IOException {
            if (position+SYNC_SIZE >= end) {
              seek(end);
              return;
            }

            if (position < headerEnd) {
              // seek directly to first record
              in.seek(headerEnd);   // <==== should this not call seek() (i.e. this.seek) instead?
              // note the sync marker "seen" in the header
              syncSeen = true;
              return;
            }

      The problem is that when you sync to the start of a block-compressed file,
      noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
      triggered. When you subsequently call next() you can get keys from the
      buffer, which still contains keys from the previous position in the file.
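
      Below is a minimal, hypothetical sketch of how the symptom can surface from
      user code, assuming a block-compressed SequenceFile with Text keys and values
      at a made-up path; the distance read before syncing is arbitrary, and whether
      a stale key actually appears depends on the file's block layout.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.SequenceFile;
          import org.apache.hadoop.io.Text;

          public class SyncBeforeHeaderRepro {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Path path = new Path("/tmp/block-compressed.seq");   // hypothetical input file

              SequenceFile.Reader reader =
                  new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
              Text key = new Text();
              Text value = new Text();

              // Read far enough into the file that the reader has buffered records
              // from a later compressed block.
              while (reader.next(key, value) && reader.getPosition() < (1 << 20)) {
                // keep reading
              }
              Text lastKeyBeforeSync = new Text(key);

              // Sync back to a position before the end of the header. Because sync()
              // calls in.seek(headerEnd) directly, noBufferedKeys/valuesDecompressed
              // are not reset, so this next() can serve a key left over in the buffer
              // instead of the first key of the file.
              reader.sync(0);
              reader.next(key, value);

              System.out.println("last key read before sync(0): " + lastKeyBeforeSync);
              System.out.println("first key after sync(0)     : " + key);
              reader.close();
            }
          }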

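      For reference, here is a sketch of what the pre-header branch of sync() might
      look like with the suggestion from the annotation above applied, i.e. going
      through the Reader's own seek() so the buffered-record state is discarded.
      This only spells out the reporter's suggestion; it is not a committed patch.

            if (position < headerEnd) {
              // Go through this.seek() rather than in.seek(): for block-compressed
              // files, seek() resets noBufferedKeys and valuesDecompressed, so the
              // next call to next() reads a fresh block instead of serving stale
              // buffered keys.
              seek(headerEnd);
              // note the sync marker "seen" in the header
              syncSeen = true;
              return;
            }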

          People

            Assignee: Unassigned
            Reporter: Colin McCabe (cmccabe)
            Votes: 0
            Watchers: 8
