Lucene - Core
LUCENE-761

Clone proxStream lazily in SegmentTermPositions

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      In SegmentTermPositions the proxStream should be cloned lazily, i.e. the first time nextPosition() is called. The initialization cost of TermPositions is then no higher than that of TermDocs, so there is no longer any reason for Scorers to use TermDocs instead of TermPositions. In fact, all Scorers should use TermPositions, because custom subclasses of existing scorers might want to access payloads, which is only possible via TermPositions. We could go further and merge SegmentTermDocs and SegmentTermPositions into one class and deprecate the TermDocs interface.
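
The lazy cloning described above can be sketched roughly as follows. This is not the attached patch and does not use Lucene classes: CountingStream and LazyPositions are hypothetical stand-ins for the prox stream and for SegmentTermPositions, and the clone counter exists only to make the deferred cost visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyCloneSketch {

    /** Hypothetical stand-in for a clonable input stream; counts how often it is cloned. */
    static class CountingStream implements Cloneable {
        final AtomicInteger cloneCount = new AtomicInteger(); // shared by original and clones
        int pointer = 0;                                      // each clone gets its own copy

        @Override
        public CountingStream clone() {
            try {
                CountingStream copy = (CountingStream) super.clone();
                cloneCount.incrementAndGet();
                return copy;
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class is Cloneable
            }
        }

        int readInt() {
            return pointer++;
        }
    }

    /** Hypothetical positions enumerator that defers cloning until positions are requested. */
    static class LazyPositions {
        private final CountingStream shared; // the parent's stream, never read directly
        private CountingStream prox;         // private clone, created on demand

        LazyPositions(CountingStream shared) {
            this.shared = shared;
        }

        /** Doc-level iteration does not touch the positions stream at all. */
        boolean next() {
            return true;
        }

        int nextPosition() {
            if (prox == null) {
                prox = shared.clone(); // first use: pay the cloning cost only now
            }
            return prox.readInt();
        }
    }

    public static void main(String[] args) {
        CountingStream parent = new CountingStream();
        LazyPositions tp = new LazyPositions(parent);
        tp.next();
        System.out.println("clones after next(): " + parent.cloneCount.get());
        tp.nextPosition();
        tp.nextPosition();
        System.out.println("clones after nextPosition(): " + parent.cloneCount.get());
    }
}
```

A caller that only iterates documents never triggers the clone, which is the property that would make a positions enumerator as cheap to set up as a docs-only one.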

      I'm going to attach a patch once the payloads feature (LUCENE-755) is committed.

      1. lucene-761.patch
        1 kB
        Michael Busch

        Activity

        Grant Ingersoll added a comment -

        Hi Michael,

        I am not sure I understand why 755 blocks this one. I would think it would be the other way around, that way we could integrate this into scoring and people could access it seamlessly w/o having to change their query code (except maybe the similarity, as I suggested, or by adding some other interface).

        -Grant

        Michael Busch added a comment -

        Grant,

        you are absolutely right, 755 does not block this issue. The reason I wanted to wait before submitting a patch here was that 755 and this issue change the same files, so committing this one would have prevented 755 from applying cleanly to the trunk. But since there have been a couple of commits in the last days/weeks and the Payloads API is still under discussion, I may as well submit a patch here now, because I have to change 755 to apply cleanly to the trunk anyway.

        Grant Ingersoll added a comment -

        If I understand correctly, all we need on this one is to move line 37 of SegmentTermPositions to line 55, right?

        Michael Busch added a comment -

        Grant,

        you're right, it is a simple change to clone the stream lazily, and I think I will do that for now. The benefit is that using a SegmentTermPositions object instead of a SegmentTermDocs in scorers will no longer be more expensive.

        However, there might be one drawback. SegmentTermDocs implements the method
        int read(final int[] docs, final int[] freqs)
        which is used by TermScorer for better performance. SegmentTermPositions overrides this method and just throws an UnsupportedOperationException. This only becomes a problem if we want to make TermScorer extensible, so that subclasses can make use of payloads. But actually I don't see much benefit in extending TermScorer over just extending Scorer for such a use case. What do you think?
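
To illustrate the drawback mentioned in this comment, here is a minimal sketch with hypothetical classes (DocsEnum, PositionsEnum and sumFreqs are invented for illustration and are not Lucene code). It only mirrors the shape of the problem: a scorer-like loop that relies on the bulk read(int[], int[]) works with the docs-only enumerator but fails once it is handed a subclass that overrides the method to throw.

```java
public class BulkReadSketch {

    /** Hypothetical doc/freq enumerator offering a bulk read. */
    static class DocsEnum {
        private final int[] allDocs;
        private final int[] allFreqs;
        private int cursor = 0;

        DocsEnum(int[] allDocs, int[] allFreqs) {
            this.allDocs = allDocs;
            this.allFreqs = allFreqs;
        }

        /** Fills the given buffers and returns the number of entries written. */
        int read(final int[] docs, final int[] freqs) {
            int room = Math.min(docs.length, freqs.length);
            int n = Math.min(room, allDocs.length - cursor);
            System.arraycopy(allDocs, cursor, docs, 0, n);
            System.arraycopy(allFreqs, cursor, freqs, 0, n);
            cursor += n;
            return n;
        }
    }

    /** Hypothetical positions-aware subclass that refuses the bulk read. */
    static class PositionsEnum extends DocsEnum {
        PositionsEnum(int[] allDocs, int[] allFreqs) {
            super(allDocs, allFreqs);
        }

        @Override
        int read(final int[] docs, final int[] freqs) {
            throw new UnsupportedOperationException("bulk read is not supported here");
        }
    }

    /** Scorer-like loop that depends on the bulk read for speed. */
    static int sumFreqs(DocsEnum e) {
        int[] docs = new int[2];
        int[] freqs = new int[2];
        int sum = 0;
        int n;
        while ((n = e.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                sum += freqs[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 9};
        int[] freqs = {2, 1, 3};
        System.out.println("docs-only sum: " + sumFreqs(new DocsEnum(docs, freqs)));
        try {
            sumFreqs(new PositionsEnum(docs, freqs));
        } catch (UnsupportedOperationException ex) {
            System.out.println("positions enum: " + ex.getMessage());
        }
    }
}
```

A scorer written against the bulk read therefore cannot simply be switched to the positions-aware type; it would have to fall back to per-document iteration, which is the trade-off raised in the comment.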

        Michael Busch added a comment -

        Here is the simple patch. All unit tests pass. I'll commit this soon if nobody objects...

        Michael Busch added a comment -

        I just committed this.


          People

          • Assignee:
            Michael Busch
            Reporter:
            Michael Busch
          • Votes:
            1
            Watchers:
            0
