Lucene - Core
  1. Lucene - Core
  2. LUCENE-431

RAMInputStream and RAMOutputStream without further buffering

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      From java-dev, Doug's reply of 12 Sep 2005
      on Delaying buffer allocation in BufferedIndexInput:

      Paul Elschot wrote:
      ...
      > I noticed that RAMIndexInput extends BufferedIndexInput.
      > It has all data in buffers already, so why is there another
      > layer of buffering?

      No good reason: it's historical.

      To avoid this either: (a) the BufferedIndexInput API would need to be
      modified to permit subclasses to supply the buffer; or (b)
      RAMInputStream could subclass IndexInput directly, using its own
      buffers. The latter would probably be simpler.

      End of quote.

      I made version (b) of RAMInputStream.
      Using this RAMInputStream, TestTermVectorsReader failed as the only
      failing test.

      1. ASF.LICENSE.NOT.GRANTED--RAMInputStream.java
        2 kB
        Paul Elschot
      2. lucene-431.patch
        10 kB
        Michael Busch
      3. lucene-431.zip
        4 kB
        Michael Busch

        Activity

        Paul Elschot created issue -
        Hide
        Paul Elschot added a comment -

        Created an attachment (id=16372)
        RAMInputStream subclassing IndexInput directly

        Show
        Paul Elschot added a comment - Created an attachment (id=16372) RAMInputStream subclassing IndexInput directly
        Hide
        cutting@apache.org added a comment -

        This readByte() implementation will probably be slower than
        BufferedIndexInput's. In particular, the call to buffers.elementAt() is not
        cheap. It would be faster to cache the current buffer in a field. Fastest
        would be to also cache the position in the buffer, something like the following
        (untested):

        private static final EMPTY_BUFFER = new byte[0];
        private byte[] buffer = EMPTY_BUFFER;
        private int bufferPosition;

        public byte readByte() {
        if (bufferPosition == buffer.length)

        { pointer += buffer.length; updateBuffer(); }

        return buffer[bufferPosition++];
        }

        public byte readBytes(byte[] dest, int destOffset, int len)

        { int start = pointer + bufferPosition; ... your code ... updateBuffer(); }

        public void seek(long pos)

        { pointer = (int)pos; updateBuffer(); }

        public long getFilePointer()

        { return pointer + bufferPosition; }

        private void updateBuffer() {
        if (pointer < length)

        { buffer = file.buffers.elementAt(pointer/BUFFER_SIZE)); bufferPosition = pointer%BUFFER_SIZE; }

        else

        { buffer = EMPTY_BUFFER; bufferPosition = 0; }

        }

        Show
        cutting@apache.org added a comment - This readByte() implementation will probably be slower than BufferedIndexInput's. In particular, the call to buffers.elementAt() is not cheap. It would be faster to cache the current buffer in a field. Fastest would be to also cache the position in the buffer, something like the following (untested): private static final EMPTY_BUFFER = new byte [0] ; private byte[] buffer = EMPTY_BUFFER; private int bufferPosition; public byte readByte() { if (bufferPosition == buffer.length) { pointer += buffer.length; updateBuffer(); } return buffer [bufferPosition++] ; } public byte readBytes(byte[] dest, int destOffset, int len) { int start = pointer + bufferPosition; ... your code ... updateBuffer(); } public void seek(long pos) { pointer = (int)pos; updateBuffer(); } public long getFilePointer() { return pointer + bufferPosition; } private void updateBuffer() { if (pointer < length) { buffer = file.buffers.elementAt(pointer/BUFFER_SIZE)); bufferPosition = pointer%BUFFER_SIZE; } else { buffer = EMPTY_BUFFER; bufferPosition = 0; } }
        Jeff Turner made changes -
        Field Original Value New Value
        issue.field.bugzillaimportkey 36623 12315272
        Michael Busch made changes -
        Assignee Lucene Developers [ java-dev@lucene.apache.org ] Michael Busch [ michaelbusch ]
        Hide
        Michael Busch added a comment -

        We should fix both, RAMInputStream and RAMOutputStream to subclass IndexInput and IndexOutput directly. That saves a lot of unnecessary array copies.

        I'm attaching a new patch that changes both classes. Unlike Paul's patch this one keeps the current buffer in a local variable (as Doug suggested).

        All unit tests pass including TestTermVectorsReader. The reason why this test failes in Paul's patch is that RAMInputStream does not throw an IOException in case EOF is reached.

        I did some quick tests in which I used a RAMDirectory to build an index. With this patch the test runs 170 secs, the old version takes 236 secs, which is an improvement of about 28%.

        Show
        Michael Busch added a comment - We should fix both, RAMInputStream and RAMOutputStream to subclass IndexInput and IndexOutput directly. That saves a lot of unnecessary array copies. I'm attaching a new patch that changes both classes. Unlike Paul's patch this one keeps the current buffer in a local variable (as Doug suggested). All unit tests pass including TestTermVectorsReader. The reason why this test failes in Paul's patch is that RAMInputStream does not throw an IOException in case EOF is reached. I did some quick tests in which I used a RAMDirectory to build an index. With this patch the test runs 170 secs, the old version takes 236 secs, which is an improvement of about 28%.
        Michael Busch made changes -
        Attachment lucene-431.patch [ 12354272 ]
        Michael Busch made changes -
        Lucene Fields [Patch Available]
        Summary RAMInputStream without further buffering RAMInputStream and RAMOutputStream without further buffering
        Description From java-dev, Doug's reply of 12 Sep 2005
        on Delaying buffer allocation in BufferedIndexInput:
         
        Paul Elschot wrote:
        ...
        > I noticed that RAMIndexInput extends BufferedIndexInput.
        > It has all data in buffers already, so why is there another
        > layer of buffering?
         
        No good reason: it's historical.
         
        To avoid this either: (a) the BufferedIndexInput API would need to be
        modified to permit subclasses to supply the buffer; or (b)
        RAMInputStream could subclass IndexInput directly, using its own
        buffers.  The latter would probably be simpler.
         
        End of quote.
         
        I made version (b) of RAMInputStream.
        Using this RAMInputStream, TestTermVectorsReader failed as the only
        failing test.
        From java-dev, Doug's reply of 12 Sep 2005
        on Delaying buffer allocation in BufferedIndexInput:
         
        Paul Elschot wrote:
        ...
        > I noticed that RAMIndexInput extends BufferedIndexInput.
        > It has all data in buffers already, so why is there another
        > layer of buffering?
         
        No good reason: it's historical.
         
        To avoid this either: (a) the BufferedIndexInput API would need to be
        modified to permit subclasses to supply the buffer; or (b)
        RAMInputStream could subclass IndexInput directly, using its own
        buffers. The latter would probably be simpler.
         
        End of quote.
         
        I made version (b) of RAMInputStream.
        Using this RAMInputStream, TestTermVectorsReader failed as the only
        failing test.
        Michael Busch made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        Hide
        Michael McCandless added a comment -

        Michael, I wasn't able to cleanly apply this patch on the current trunk. I get this:

        patch -p0 < lucene-431.patch
        patching file src/java/org/apache/lucene/store/RAMInputStream.java
        Hunk #2 FAILED at 21.
        1 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMInputStream.java.rej
        patching file src/java/org/apache/lucene/store/RAMOutputStream.java
        Hunk #1 FAILED at 21.
        1 out of 3 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMOutputStream.java.rej
        patching file src/test/org/apache/lucene/store/MockRAMOutputStream.java

        I'd like to test this net performance gain with LUCENE-843. I think fixing this plus doing LUCENE-843 should make indexing into a RAMDirectory quite a bit faster.

        Show
        Michael McCandless added a comment - Michael, I wasn't able to cleanly apply this patch on the current trunk. I get this: patch -p0 < lucene-431.patch patching file src/java/org/apache/lucene/store/RAMInputStream.java Hunk #2 FAILED at 21. 1 out of 2 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMInputStream.java.rej patching file src/java/org/apache/lucene/store/RAMOutputStream.java Hunk #1 FAILED at 21. 1 out of 3 hunks FAILED – saving rejects to file src/java/org/apache/lucene/store/RAMOutputStream.java.rej patching file src/test/org/apache/lucene/store/MockRAMOutputStream.java I'd like to test this net performance gain with LUCENE-843 . I think fixing this plus doing LUCENE-843 should make indexing into a RAMDirectory quite a bit faster.
        Hide
        Michael Busch added a comment -

        Mike,

        that's strange.... for me the patch applies cleanly on the current trunk. I just tried it again.

        Anyways, I'm attaching a zip containing the patched files. Now you should be able to test 843 with this one. Let me know if it doesn't work...

        • Michael
        Show
        Michael Busch added a comment - Mike, that's strange.... for me the patch applies cleanly on the current trunk. I just tried it again. Anyways, I'm attaching a zip containing the patched files. Now you should be able to test 843 with this one. Let me know if it doesn't work... Michael
        Michael Busch made changes -
        Attachment lucene-431.zip [ 12354336 ]
        Hide
        Doug Cutting added a comment -

        > I'd like to test this net performance gain with LUCENE-843.

        Yes, it would be great to see how much each improves things individually as well as combined.

        Show
        Doug Cutting added a comment - > I'd like to test this net performance gain with LUCENE-843 . Yes, it would be great to see how much each improves things individually as well as combined.
        Hide
        Michael McCandless added a comment -

        >> I'd like to test this net performance gain with LUCENE-843.
        >
        > Yes, it would be great to see how much each improves things individually as well as combined.

        Will do!

        Show
        Michael McCandless added a comment - >> I'd like to test this net performance gain with LUCENE-843 . > > Yes, it would be great to see how much each improves things individually as well as combined. Will do!
        Hide
        Michael McCandless added a comment -

        Michael, the patch problem seems to be something on my end, which I can't yet explain.

        When I take your zip (thanks!), unzip into a fresh trunk checkout, run 'svn diff', take the output to another fresh trunk checkout, and try to apply that patch, I get the same error. Somehow my version of patch (2.5.4 on Debian) cannot handle the output of 'svn diff'. Spooky!

        Show
        Michael McCandless added a comment - Michael, the patch problem seems to be something on my end, which I can't yet explain. When I take your zip (thanks!), unzip into a fresh trunk checkout, run 'svn diff', take the output to another fresh trunk checkout, and try to apply that patch, I get the same error. Somehow my version of patch (2.5.4 on Debian) cannot handle the output of 'svn diff'. Spooky!
        Hide
        Joe Shaw added a comment -

        Michael: mysterious patch failures like that are usually caused by problems with line endings. Try running dos2unix on the patch and then apply it.

        Show
        Joe Shaw added a comment - Michael: mysterious patch failures like that are usually caused by problems with line endings. Try running dos2unix on the patch and then apply it.
        Hide
        Michael McCandless added a comment -

        Thanks for the advice Alas, I had already tried that on the original patch and it gives the same error. I remain baffled!

        Show
        Michael McCandless added a comment - Thanks for the advice Alas, I had already tried that on the original patch and it gives the same error. I remain baffled!
        Hide
        Michael Busch added a comment -

        Hello Mike,

        did you get a chance to try this patch out? I'm planning to commit it soon...

        Show
        Michael Busch added a comment - Hello Mike, did you get a chance to try this patch out? I'm planning to commit it soon...
        Hide
        Michael McCandless added a comment -

        Yes, I did and it looks good. I would say commit it!

        Show
        Michael McCandless added a comment - Yes, I did and it looks good. I would say commit it!
        Hide
        Michael Busch added a comment -

        Thanks for the quick (7 mins!) response, Mike .

        I just committed it.

        Show
        Michael Busch added a comment - Thanks for the quick (7 mins!) response, Mike . I just committed it.
        Michael Busch made changes -
        Status In Progress [ 3 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Mark Thomas made changes -
        Workflow jira [ 12325277 ] Default workflow, editable Closed status [ 12564533 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12564533 ] jira [ 12585778 ]
        Steve Rowe made changes -
        Affects Version/s CVS Nightly - Specify date in submission [ 12310282 ]

          People

          • Assignee:
            Michael Busch
            Reporter:
            Paul Elschot
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development