Details

      Description

      SeekerState is the staging site for a Cell while it is being assembled by the BufferedEncodedSeeker. When the parent StoreFileScanner calls getCurrentCell(), the Cell is guaranteed to be fully assembled, so we can return the SeekerState directly as a Cell rather than copying it into a KeyValue. A benchmark at the StoreFileScanner level shows ~50% more cells/sec than when copying to KeyValues (ignoring garbage collection).
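
To illustrate the idea in the description, here is a minimal sketch of a scanner that hands out its own mutable staging object instead of copying each cell. Every name in it (SimpleCell, FlyweightScannerSketch, State, copyValue) is hypothetical and invented for this example; it is not the BufferedEncodedSeeker code, and it models fixed-width values only.

```java
import java.util.Arrays;

/** Hypothetical, trimmed-down stand-in for a Cell: value accessors only. */
interface SimpleCell {
  byte[] getValueArray();
  int getValueOffset();
  int getValueLength();
}

/** Hypothetical scanner over fixed-width values stored back to back in one block. */
final class FlyweightScannerSketch {

  /** Mutable staging object; the same instance is returned for every cell. */
  private static final class State implements SimpleCell {
    byte[] array;
    int offset;
    int length;

    @Override public byte[] getValueArray() { return array; }
    @Override public int getValueOffset() { return offset; }
    @Override public int getValueLength() { return length; }
  }

  private final byte[] block;
  private final int width;
  private final State state = new State();
  private int nextOffset = 0;

  FlyweightScannerSketch(byte[] block, int width) {
    this.block = block;
    this.width = width;
  }

  /** Points the staging object at the next value; returns false at end of block. */
  boolean advance() {
    if (nextOffset + width > block.length) {
      return false;
    }
    state.array = block;
    state.offset = nextOffset;
    state.length = width;
    nextOffset += width;
    return true;
  }

  /** Returns the staging object itself: no allocation and no byte copy. */
  SimpleCell current() {
    return state;
  }

  /** What a caller has to do if it wants to keep a value past the next advance(). */
  static byte[] copyValue(SimpleCell cell) {
    int from = cell.getValueOffset();
    return Arrays.copyOfRange(cell.getValueArray(), from, from + cell.getValueLength());
  }
}
```

In this sketch the object returned by current() is overwritten by the next advance(), so a caller that needs to retain a cell must copy it explicitly. That is the trade-off of skipping the per-cell copy.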

          Activity

          Ted Yu added a comment -

          Where would the following methods be used ?

          +//    @Override
          +    public Cell getCurrentCell(){
          +      return current;
          +    }
          +
          +//    @Override
          +    public boolean nextCell(){
          +      return next();
          +    }
          

          For getCurrentCell(), I only found it mentioned in the javadoc of CellScannerPosition.java

          Can you tell us more about your benchmark ?

          Matt Corgan added a comment -

          getCurrentCell() or getCurrent() would replace KeyValueScanner.peek(). Or if we like the name peek(), then peek() would be modified to return a Cell.

          nextCell() is the same as KeyValueScanner.next().

          The names are separate because I have both on my branch for the benchmark. I will post more about the benchmark once it is working respectably, but it basically runs through a range of combinations of compression, encoding, block size, and RedundantKVGenerator params to compare scan and seek performance and memory savings.

          ramkrishna.s.vasudevan added a comment -

          @Matt
          Does currentBuffer() hold only the value, or is it again the entire buffer array, from which we extract the value?

          Matt Corgan added a comment -

          Stepping back for a second - after the v1 patch was posted, we changed the CellScanner method names: nextCell() -> advance() and getCurrentCell() -> current(). The v1 patch is slightly out of date.

          Then - I think currentBuffer refers to the entire HeapByteBuffer/byte[] for the current data block (~64KB by default). The advance() method will update variables like valueOffset and valueLength, and then you can see how these methods make sense:

          +    @Override
          +    public byte[] getValueArray() {
          +      return currentBuffer.array();
          +    }
          +
          +    @Override
          +    public int getValueOffset() {
          +      return currentBuffer.arrayOffset() + valueOffset;
          +    }
          +
          +    @Override
          +    public int getValueLength() {
          +      return valueLength;
          +    }
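
As an illustration of why getValueOffset() adds currentBuffer.arrayOffset(), the sketch below wraps the three accessors in a small class and points it at a buffer that is a slice of a larger array. The class name ValueViewSketch, the position() and valueAsString() helpers, and the sample bytes are all hypothetical; only the three accessor bodies follow the snippet above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the seeker state; not the HBase class. */
final class ValueViewSketch {
  private ByteBuffer currentBuffer; // the whole data block, array-backed
  private int valueOffset;          // value start, relative to the start of the buffer
  private int valueLength;

  /** Stand-in for what advance() would do: record where the current value lives. */
  void position(ByteBuffer block, int valueOffset, int valueLength) {
    this.currentBuffer = block;
    this.valueOffset = valueOffset;
    this.valueLength = valueLength;
  }

  byte[] getValueArray() {
    return currentBuffer.array();
  }

  int getValueOffset() {
    // array() returns the full backing array, so the buffer's own start must be added
    return currentBuffer.arrayOffset() + valueOffset;
  }

  int getValueLength() {
    return valueLength;
  }

  /** Reads the value straight out of the backing array, without an intermediate copy. */
  String valueAsString() {
    return new String(getValueArray(), getValueOffset(), getValueLength(),
        StandardCharsets.UTF_8);
  }

  /** Builds a block that starts 6 bytes into its backing array. */
  static ValueViewSketch demo() {
    byte[] backing = "HEADERrow1value1".getBytes(StandardCharsets.UTF_8);
    // slice() yields a buffer whose arrayOffset() is 6 (the length of "HEADER")
    ByteBuffer block = ByteBuffer.wrap(backing, 6, backing.length - 6).slice();
    ValueViewSketch view = new ValueViewSketch();
    view.position(block, 4, 6); // skip the 4-byte "row1", then a 6-byte value
    return view;
  }
}
```

With this sample data, getValueOffset() is 6 + 4 = 10 and the bytes at that position in the backing array spell "value1". Returning valueOffset alone would point into the row bytes instead.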
          
          ramkrishna.s.vasudevan added a comment -

          See HBASE-10801.

            People

            • Assignee:
              Unassigned
              Reporter:
              Matt Corgan
            • Votes:
              0
              Watchers:
              7
