Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None
    • Environment:

      Operating System: other
      Platform: Other

      Description

      From java-dev:

      On Friday 09 September 2005 00:34, Doug Cutting wrote:
      > Paul Elschot wrote:
      > > I suppose one of these cases are when many terms are used in a query.
      > > Would it be easily possible to make the buffer size for a term iterator
      > > depend on the numbers of documents to be iterated?
      > > Many terms only occur in a few documents, so this could be a
      > > nice win on total buffer size for the many terms case.
      >
      > This would not be too difficult.
      >
      > Look in SegmentTermDocs.java.  The buffer may be allocated when the
      > parent's stream is first cloned, but clone() won't allocate a buffer if
      > the source hasn't had a buffer allocated yet, and nothing should perform
      > i/o directly on the parent's freqStream, so in practice a buffer should
      > not be allocated until the first read is performed on the clone.

      I tried delaying the buffer allocation in BufferedIndexInput by
      using this clone() method:

        public Object clone()

      {     BufferedIndexInput clone = (BufferedIndexInput)super.clone();     clone.buffer = null;     clone.bufferLength = 0;     clone.bufferPosition = 0;     clone.bufferStart = getFilePointer();     return clone;   }

      With this all term document iterators seem to be empty, no
      query in the test cases gives any results, for example TestDemo
      and TestBoolean2.
      As far as I can see, this delaying should work, but it doesn't and
      I have no idea why.

      End of quote from java-dev.

      Doug replied that at a glance this clone method looks good.
      Without this delayed buffer allocation, a reduced buffer size
      for TermDocs cannot be implemented easily.

      1. lucene-430.patch
        1 kB
        Michael Busch

        Activity

        Hide
        Michael Busch added a comment -

        I'm attaching a patch that is slightly different from the patch Paul submitted. In refill() it calls seekInternal(bufferStart) in case the buffer is null. The reason is that after a clone the value of bufferStart might be different from the actual file pointer. This causes some test cases to fail with the original patch because refill() reads the data to buffer from the wrong position.

        With this version all test cases pass.

        Show
        Michael Busch added a comment - I'm attaching a patch that is slightly different from the patch Paul submitted. In refill() it calls seekInternal(bufferStart) in case the buffer is null. The reason is that after a clone the value of bufferStart might be different from the actual file pointer. This causes some test cases to fail with the original patch because refill() reads the data to buffer from the wrong position. With this version all test cases pass.
        Hide
        Paul Elschot added a comment -

        Are you also going to try and make the buffer size dependent on the number of docs that contain the term?

        The current patch still uses BUFFERSIZE only.

        Show
        Paul Elschot added a comment - Are you also going to try and make the buffer size dependent on the number of docs that contain the term? The current patch still uses BUFFERSIZE only.
        Hide
        Michael Busch added a comment -

        Paul,

        you are right, the patch I submitted just avoids copying the buffer. I was thinking about adding a method setBufferSize() to BufferedIndexInput. Please see my comment about this in LUCENE-888.

        Show
        Michael Busch added a comment - Paul, you are right, the patch I submitted just avoids copying the buffer. I was thinking about adding a method setBufferSize() to BufferedIndexInput. Please see my comment about this in LUCENE-888 .
        Hide
        Michael Busch added a comment -

        OK, I committed this patch that delays the allocation of the buffer after a clone. I'll leave this issue open. After LUCENE-888 is committed I will attach a patch here that adjusts the buffer size of the freq stream in SegmentTermDocs according to the document frequency.

        Show
        Michael Busch added a comment - OK, I committed this patch that delays the allocation of the buffer after a clone. I'll leave this issue open. After LUCENE-888 is committed I will attach a patch here that adjusts the buffer size of the freq stream in SegmentTermDocs according to the document frequency.
        Hide
        Michael Busch added a comment -

        Paul, have you experimented with changing the buffer sizes and how it affects search
        performance?

        Show
        Michael Busch added a comment - Paul, have you experimented with changing the buffer sizes and how it affects search performance?
        Hide
        Paul Elschot added a comment -

        I wish I could have done that, sorry. I left this one after I failed to change to buffer size and reported it. Time flies, fortunately.

        Show
        Paul Elschot added a comment - I wish I could have done that, sorry. I left this one after I failed to change to buffer size and reported it. Time flies, fortunately.
        Hide
        Michael Busch added a comment -

        I left this one after I failed to change to buffer size and reported it.

        This change was committed a while ago, so if you have some time maybe
        you could run some performance experiments? I somehow doubt that this
        will improve performance significantly, but maybe I'm wrong.

        I'm going to unassign this for now, because I probably won't have time to
        run the performance experiments anytime soon. Maybe after I finished
        working on the other issues assigned to me I'll get back to this one.

        Show
        Michael Busch added a comment - I left this one after I failed to change to buffer size and reported it. This change was committed a while ago, so if you have some time maybe you could run some performance experiments? I somehow doubt that this will improve performance significantly, but maybe I'm wrong. I'm going to unassign this for now, because I probably won't have time to run the performance experiments anytime soon. Maybe after I finished working on the other issues assigned to me I'll get back to this one.
        Hide
        Shai Erera added a comment -

        Michael B. committed the patch long time ago and the issue remained open for benchmark results. I doubt if they'd matter now.

        Show
        Shai Erera added a comment - Michael B. committed the patch long time ago and the issue remained open for benchmark results. I doubt if they'd matter now.

          People

          • Assignee:
            Unassigned
            Reporter:
            Paul Elschot
          • Votes:
            2 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development