Lucene - Core / LUCENE-2453

Make Index Output Buffer Size Configurable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.1
    • Fix Version/s: 4.2, 5.0
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Currently, the buffered index input class allows sub-classes and users thereof to specify a size for the input buffer, which by default is 1024 bytes. In practice, this option is leveraged by the simple file and compound segment index input sub-classes.

      By the same token, it would be nice if the buffered index output class could open up its buffer size for users to configure. In particular, this would allow sub-classes thereof to align the output buffer size, which by default is 16384 bytes, to that of the underlying directory's data unit. For example, a network-based directory might want to buffer data in multiples of its maximum transmission unit. To take an existing use case, the file system-based directory could potentially choose to align its output buffer size to the operating system's file block size.
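      The alignment idea can be sketched as a small rounding helper. This is only an illustration: `BufferAlignment` and `alignUp` are hypothetical names that do not exist in Lucene, and the unit sizes in the note below (a 4096-byte block, a 1500-byte transmission unit) are assumed example values.

```java
// Hypothetical helper, not part of Lucene or of the attached patch.
// Rounds a requested buffer size up to the nearest multiple of the
// underlying directory's data unit (block size, MTU, and so on).
final class BufferAlignment {
    private BufferAlignment() {}

    static int alignUp(int requestedSize, int unitSize) {
        if (requestedSize <= 0 || unitSize <= 0) {
            throw new IllegalArgumentException("sizes must be greater than 0");
        }
        // Use long arithmetic so the intermediate sum cannot overflow an int.
        long units = ((long) requestedSize + unitSize - 1) / unitSize;
        long aligned = units * unitSize;
        if (aligned > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("aligned size does not fit in an int");
        }
        return (int) aligned;
    }
}
```

      With these assumed values, `BufferAlignment.alignUp(16384, 4096)` leaves the size unchanged, because it is already a multiple of the block size, while `BufferAlignment.alignUp(16384, 1500)` rounds up to eleven transmission units.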

      The proposed change to the buffered index output class involves defining a one-arg constructor that takes a user-defined buffer size, and a default constructor that uses the currently defined buffer size.
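      A minimal sketch of that constructor pair follows. It is a stand-in written for illustration, not the code from the attached patch: the class name `SketchBufferedOutput`, the in-memory sink, and the `flushedBytes` accessor are all assumptions, and the default of 16384 bytes is assumed here as well.

```java
import java.io.ByteArrayOutputStream;

// Illustrative stand-in, NOT Lucene's BufferedIndexOutput.
class SketchBufferedOutput {
    // Assumed default, used when the caller does not pick a size.
    static final int DEFAULT_BUFFER_SIZE = 16384;

    private final byte[] buffer; // sized once in the constructor, never replaced
    private int position = 0;    // number of buffered bytes not yet flushed
    // Hypothetical in-memory sink standing in for the real file or socket.
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    // Default constructor: keeps the existing default size.
    SketchBufferedOutput() {
        this(DEFAULT_BUFFER_SIZE);
    }

    // One-arg constructor: the caller chooses the buffer size.
    SketchBufferedOutput(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException(
                "bufferSize must be greater than 0 (got " + bufferSize + ")");
        }
        this.buffer = new byte[bufferSize];
    }

    int getBufferSize() {
        return buffer.length;
    }

    void writeByte(byte b) {
        if (position == buffer.length) {
            flush(); // buffer is full, push it to the sink first
        }
        buffer[position++] = b;
    }

    void flush() {
        sink.write(buffer, 0, position);
        position = 0;
    }

    byte[] flushedBytes() {
        return sink.toByteArray();
    }
}
```

      A sub-class for a particular directory could then call `super(size)` with a size aligned to its own data unit, while existing callers of the default constructor see no change.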

      1. LUCENE-2453.patch
        3 kB
        Karthick Sankarachary
      2. LUCENE-2453.patch
        3 kB
        Simon Willnauer

        Issue Links

          Activity

          Uwe Schindler added a comment -

          Closed after release.

          Commit Tag Bot added a comment -

          [trunk commit] Simon Willnauer
          http://svn.apache.org/viewvc?view=revision&revision=1437311

          LUCENE-2453: fix javadocs

          Commit Tag Bot added a comment -

          [branch_4x commit] Simon Willnauer
          http://svn.apache.org/viewvc?view=revision&revision=1437312

          LUCENE-2453: fix javadocs

          Commit Tag Bot added a comment -

          [branch_4x commit] Simon Willnauer
          http://svn.apache.org/viewvc?view=revision&revision=1437304

          LUCENE-2453: Make Index Output Buffer Size Configurable

          Commit Tag Bot added a comment -

          [trunk commit] Simon Willnauer
          http://svn.apache.org/viewvc?view=revision&revision=1437295

          LUCENE-2453: Make Index Output Buffer Size Configurable

          Simon Willnauer added a comment -

          Bringing this up to date. I think this is pretty useful for downstream apps, though. I plan to commit this soon too.

          Uwe Schindler added a comment -

          "Hi Andrzej, The patch has already been updated to incorporate the comments above. Please let me know if you need anything else."

          Karthick, you should not delete old patches from the issue, as it makes the issue hard to follow. Just upload the new patch with the same filename and JIRA will automatically gray the old one out, but it's still visible.

          Karthick Sankarachary added a comment -

          Hi Andrzej, The patch has already been updated to incorporate the comments above. Please let me know if you need anything else.

          Andrzej Bialecki added a comment -

          Karthick, I'm interested in moving forward with this and LUCENE-2456. Could you perhaps prepare an updated patch that incorporates the comments above?

          Karthick Sankarachary added a comment -

          Hi Shai,

          To answer your comments:

          • buffer can still be final (and should) since it's only initialized in the ctor

          [K] Agreed. It's not like we want to allow the size of the buffer to be changed once it has been instantiated.

          • I'd inline checkBufferSize in the ctor

          [K] Done. Again, we only need to check the buffer size one time in the ctor.

          • I think that adding the same level of control to BufferedIndexInput would be useful too?

          [K] Actually, the BufferedIndexInput already allows this level of control, and then some. In fact, I plagiarized the #checkBufferSize method from that class, where it is used twice, once in the ctor, and then again in the #setBufferSize method. In theory, we could allow the size of the BufferedIndexOutput's buffer to be reset as well, but in case the buffer is made smaller, we'll have to take care to flush some of the "older" bytes that no longer fit in the buffer. IMO, that was not worth the risk and hassle.
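          To illustrate the hassle mentioned above, here is a rough sketch of what a resizable output buffer would need to do. It is not part of the patch and not Lucene code: `ResizableSketchOutput`, its in-memory sink, and `flushedBytes` are hypothetical, and flushing everything before the swap is just one possible way to handle bytes that would no longer fit.

```java
import java.io.ByteArrayOutputStream;

// Illustrative stand-in, NOT Lucene's BufferedIndexOutput.
class ResizableSketchOutput {
    private byte[] buffer;    // not final here, because a resize replaces it
    private int position = 0; // number of pending (unflushed) bytes
    // Hypothetical in-memory sink standing in for the real destination.
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    ResizableSketchOutput(int bufferSize) {
        checkBufferSize(bufferSize);
        buffer = new byte[bufferSize];
    }

    private static void checkBufferSize(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException(
                "bufferSize must be greater than 0 (got " + bufferSize + ")");
        }
    }

    void setBufferSize(int newSize) {
        checkBufferSize(newSize);
        if (newSize == buffer.length) {
            return;
        }
        // Pending bytes may not fit in a smaller array, so write them out
        // before the old buffer is discarded.
        flush();
        buffer = new byte[newSize];
    }

    int getBufferSize() {
        return buffer.length;
    }

    void writeByte(byte b) {
        if (position == buffer.length) {
            flush();
        }
        buffer[position++] = b;
    }

    void flush() {
        sink.write(buffer, 0, position);
        position = 0;
    }

    byte[] flushedBytes() {
        return sink.toByteArray();
    }
}
```

          Keeping the buffer final and sized only in the ctor, as agreed above, avoids this extra flush path entirely.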

          I will update the patch momentarily based on the comments above, and keep you posted on the benchmark results.

          Regards,
          Karthick

          Shai Erera added a comment -

          Patch looks good! A few comments:

          • buffer can still be final (and should) since it's only initialized in the ctor
          • I'd inline checkBufferSize in the ctor
          • I think that adding the same level of control to BufferedIndexInput would be useful too?

          In general, I think the size of the buffer (1024) is set like that because larger buffer sizes did not improve performance. Can you perhaps run the benchmark indexing algorithms w/ the buffer size set to larger values and report the results? It'd be interesting to see whether there are any improvements before we open up the API like that.


            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Karthick Sankarachary
            • Votes:
              0
              Watchers:
              1