Lucene - Core
  1. Lucene - Core
  2. LUCENE-435

[PATCH] BufferedIndexOutput - optimized writeBytes() method

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None

      Description

      I have created a patch that optimize writeBytes metod:

      public void writeBytes(byte[] b, int length) throws IOException {
      if (bufferPosition > 0) // flush buffer
      flush();

      if (length < BUFFER_SIZE)

      { flushBuffer(b, length); bufferStart += length; }

      else {
      int pos = 0;
      int size;
      while (pos < length) {
      if (length - pos < BUFFER_SIZE)

      { size = length - pos; }

      else

      { size = BUFFER_SIZE; }

      System.arraycopy(b, pos, buffer, 0, size);
      pos += size;
      flushBuffer(buffer, size);
      bufferStart += size;
      }
      }
      }

      Its a much more faster now. I know that for indexing this not help much, but for copying files in the IndexStore this is so big improvement. Its about 400% faster that old implementation.

      The patch was tested with 300MB data, "ant test" sucessfuly finished with no errors.

      1. BufferedIndexOutputWriteBytes.patch
        1 kB
        Lukas Zapletal
      2. fastWrite.patch
        6 kB
        Lukas Zapletal
      3. writeBytes.patch
        4 kB
        Lukas Zapletal

        Activity

        Hide
        Lukas Zapletal added a comment -

        Hold on, I will supply unified patch, I am also working on TestStore to do some testing.

        Show
        Lukas Zapletal added a comment - Hold on, I will supply unified patch, I am also working on TestStore to do some testing.
        Hide
        Lukas Zapletal added a comment -

        The patch in unified format created by:

        1. .../trunk> svn diff
        Show
        Lukas Zapletal added a comment - The patch in unified format created by: .../trunk> svn diff
        Hide
        Lukas Zapletal added a comment -

        TestStore runs fine, I will add some testing for writeBytes (since it uses writeByte). The patch is tested.

        Show
        Lukas Zapletal added a comment - TestStore runs fine, I will add some testing for writeBytes (since it uses writeByte). The patch is tested.
        Hide
        Doug Cutting added a comment -

        I voiced some concerns about this patch in:

        http://www.mail-archive.com/java-dev@lucene.apache.org/msg02055.html

        I don't think these have yet been addressed.

        Show
        Doug Cutting added a comment - I voiced some concerns about this patch in: http://www.mail-archive.com/java-dev@lucene.apache.org/msg02055.html I don't think these have yet been addressed.
        Hide
        Lukas Zapletal added a comment -

        Will fix this.

        Show
        Lukas Zapletal added a comment - Will fix this.
        Hide
        Lukas Zapletal added a comment -

        Fixed patch, TestStore also updated...

        Show
        Lukas Zapletal added a comment - Fixed patch, TestStore also updated...
        Hide
        Lukas Zapletal added a comment -

        What else is needed? Its fixed and tested.

        Show
        Lukas Zapletal added a comment - What else is needed? Its fixed and tested.
        Hide
        Yonik Seeley added a comment -

        I just breifly looked at writeBytes(), and I think there is further room for optimization.
        Specifically, Doug's suggestion: "If the new data is larger than a buffer, then the buffer should be flushed and the new data written directly, without ever copying it into the buffer."

        Even if everything were already peftect, committers need time to review it.

        Show
        Yonik Seeley added a comment - I just breifly looked at writeBytes(), and I think there is further room for optimization. Specifically, Doug's suggestion: "If the new data is larger than a buffer, then the buffer should be flushed and the new data written directly, without ever copying it into the buffer." Even if everything were already peftect, committers need time to review it.
        Hide
        Lukas Zapletal added a comment -

        Hello,

        I have done what you requested (sorry for the late delay - too busy). I also found a bug in RAMOutputStream - the implementation of flushBuffer method was not able to write any buffers longer than 2*BUFFER_LENGTH. My fast writeBytes patch now handle all various situation and uses the fastest methods to write data.

        In my opinion its not good to make BUFFER_LENGTH constant public. Consider making it private since this can lead to nontrivial "dependency" (as I have described above). Its not good to have one buffer length for input, output and RAM* objects (which should have independant buffer length at all - it has nothing to do with the caching in the abstract methods). Making it private and maybe accessible on runtime could help a litte (as I said – I use the API for some index copying and I would like to have larger buffers).

        Anyway, this is my contribution, I am looking for more reviews. The patch includes StoreTest modification which helps with testing either writeByte method or writeBytes methods. Thanks for your attention.

        Show
        Lukas Zapletal added a comment - Hello, I have done what you requested (sorry for the late delay - too busy). I also found a bug in RAMOutputStream - the implementation of flushBuffer method was not able to write any buffers longer than 2*BUFFER_LENGTH. My fast writeBytes patch now handle all various situation and uses the fastest methods to write data. In my opinion its not good to make BUFFER_LENGTH constant public. Consider making it private since this can lead to nontrivial "dependency" (as I have described above). Its not good to have one buffer length for input, output and RAM* objects (which should have independant buffer length at all - it has nothing to do with the caching in the abstract methods). Making it private and maybe accessible on runtime could help a litte (as I said – I use the API for some index copying and I would like to have larger buffers). Anyway, this is my contribution, I am looking for more reviews. The patch includes StoreTest modification which helps with testing either writeByte method or writeBytes methods. Thanks for your attention.
        Hide
        Doug Cutting added a comment -

        I just committed this.

        Show
        Doug Cutting added a comment - I just committed this.

          People

          • Assignee:
            Unassigned
            Reporter:
            Lukas Zapletal
          • Votes:
            1 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development