Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.2, 2.7
    • Component/s: None
    • Labels:
      None

      Description

      I've came accross the well known behaviour of query results returning -1 when asked for getSize().

      While this is ok for optimization reasons (lazy results fetching), I just discovered that the default "resultFetchSize" value in lucene queries is Integer.MAX_VALUE, so in all queries I've ever executed, all results were actually fetched before asking for getSize, so IMHO nothing prevents getSize() to return the real value instead -1

        Activity

        Hide
        Jukka Zitting added a comment -

        Merged to the 2.6 branch in revision 1484685.

        Show
        Jukka Zitting added a comment - Merged to the 2.6 branch in revision 1484685.
        Hide
        Cédric Damioli added a comment -

        Committed in trunk in revision 1465974

        Show
        Cédric Damioli added a comment - Committed in trunk in revision 1465974
        Hide
        Jukka Zitting added a comment -

        Looks good to me, +1 to committing. Let's keep it in trunk for now and only consider backporting to the maintenance branches if no problems come up.

        Show
        Jukka Zitting added a comment - Looks good to me, +1 to committing. Let's keep it in trunk for now and only consider backporting to the maintenance branches if no problems come up.
        Hide
        Cédric Damioli added a comment -

        I would be very excited to train my new commit karma with this patch
        But I'd like to get review/feedback from others, to ensure I've not broken anything !

        Or should I commit as is (all tests pass) and wait for later feedback ?

        Show
        Cédric Damioli added a comment - I would be very excited to train my new commit karma with this patch But I'd like to get review/feedback from others, to ensure I've not broken anything ! Or should I commit as is (all tests pass) and wait for later feedback ?
        Hide
        Cédric Damioli added a comment -

        I've refactored the patch. It now handle more cases.

        Actual size and total size are only set when accurate.
        Lucene results aren't considered anymore when computing size, so that size does not shrink over time as before.

        The only drawback is that when limit is set and reached, the total size cannot be estimated anymore. But IMHO it's a good thing.
        I've modified a test case for not failing in that particular case.

        Marcel, Jukka, could you have a look ?

        Show
        Cédric Damioli added a comment - I've refactored the patch. It now handle more cases. Actual size and total size are only set when accurate. Lucene results aren't considered anymore when computing size, so that size does not shrink over time as before. The only drawback is that when limit is set and reached, the total size cannot be estimated anymore. But IMHO it's a good thing. I've modified a test case for not failing in that particular case. Marcel, Jukka, could you have a look ?
        Hide
        Cédric Damioli added a comment -

        I won't have enough time today, but I'll do this later this week.

        Show
        Cédric Damioli added a comment - I won't have enough time today, but I'll do this later this week.
        Hide
        Marcel Reutegger added a comment -

        Can you please update the patch with all the changes you discussed?

        Show
        Marcel Reutegger added a comment - Can you please update the patch with all the changes you discussed?
        Hide
        Jukka Zitting added a comment -

        Nice! Looks good to me.

        Marcel, Alex, others; any comments on this?

        Show
        Jukka Zitting added a comment - Nice! Looks good to me. Marcel, Alex, others; any comments on this?
        Hide
        Cédric Damioli added a comment -

        you're obviously right with my current patch

        But if we go ahead with your proposal, what if we simply write (ignoring offset):

        numResults = -1;
        if (resultNodes.size() < maxResultSize)

        { numResults = resultNodes.size(); }

        the getTotalSize() become :

        public int getTotalSize()

        { return numResults; }

        In this case, the numResults value is either -1 or the good value. It cannot shrink over time as before.

        Show
        Cédric Damioli added a comment - you're obviously right with my current patch But if we go ahead with your proposal, what if we simply write (ignoring offset): numResults = -1; if (resultNodes.size() < maxResultSize) { numResults = resultNodes.size(); } the getTotalSize() become : public int getTotalSize() { return numResults; } In this case, the numResults value is either -1 or the good value. It cannot shrink over time as before.
        Hide
        Jukka Zitting added a comment -

        The problem is the "+ invalid" part. By definition (ignoring offset/limit and assuming all results have been fetched):

        invalid == result.getSize() - resultNodes.size()

        Thus:

        resultNodes.size() + invalid == resultNodes.size() + (result.getSize() - resultNodes.size()) == result.getSize()

        !!!

        Show
        Jukka Zitting added a comment - The problem is the "+ invalid" part. By definition (ignoring offset/limit and assuming all results have been fetched): invalid == result.getSize() - resultNodes.size() Thus: resultNodes.size() + invalid == resultNodes.size() + (result.getSize() - resultNodes.size()) == result.getSize() !!!
        Hide
        Cédric Damioli added a comment -

        Your solution is precisely what I tried to implement, setting numResults to non -1 value if there are less than N results.

        Il you simply drop the "numResults = result.getSize();" line, the underlying Lucene size is not even considered and there's no security problem anymore. WDYT ?

        Show
        Cédric Damioli added a comment - Your solution is precisely what I tried to implement, setting numResults to non -1 value if there are less than N results. Il you simply drop the "numResults = result.getSize();" line, the underlying Lucene size is not even considered and there's no security problem anymore. WDYT ?
        Hide
        Jukka Zitting added a comment -

        I'm hesitant to apply the patch as is for the following two reasons:

        • AFAICT the numResults calculation in the patch should not include the + (int) offset term. The offset is added higher up in the code when skipping first few results, but it probably shouldn't be included in the numResults count.
        • More seriously, I'm not very happy with the way the getSize() method could currently be used to circumvent read access controls. Making the method return fewer -1 results will make that issue more pressing. See the FIXME comment in getTotalSize(). Thus, I'd rather see us abandon the underlying Lucene result size entirely as a component in calculating the getSize() return value. Instead we could simply prefetch up to some N result nodes, have getSize() return the count if there are fewer than N results and -1 if more.
        Show
        Jukka Zitting added a comment - I'm hesitant to apply the patch as is for the following two reasons: AFAICT the numResults calculation in the patch should not include the + (int) offset term. The offset is added higher up in the code when skipping first few results, but it probably shouldn't be included in the numResults count. More seriously, I'm not very happy with the way the getSize() method could currently be used to circumvent read access controls. Making the method return fewer -1 results will make that issue more pressing. See the FIXME comment in getTotalSize(). Thus, I'd rather see us abandon the underlying Lucene result size entirely as a component in calculating the getSize() return value. Instead we could simply prefetch up to some N result nodes, have getSize() return the count if there are fewer than N results and -1 if more.
        Hide
        Cédric Damioli added a comment -

        As asked on the dev ML, I set this as fixed for 2.6 for discussing potential integration

        I use this patch in production since one year without problems, but I certainly don't cover all JCR areas (especially, I don't use ACL which may impact query results).

        Unit tests pass well with this patch included

        Show
        Cédric Damioli added a comment - As asked on the dev ML, I set this as fixed for 2.6 for discussing potential integration I use this patch in production since one year without problems, but I certainly don't cover all JCR areas (especially, I don't use ACL which may impact query results). Unit tests pass well with this patch included
        Hide
        Cédric Damioli added a comment -

        Attached is a proposal that aims to return the real size instead of -1 in certain cases.

        All tests pass with this patch, but unfortunately there's not many tests testing for getSize()

        It would be great if a query expert could review this small patch to see if I haven't broken anything
        Thanks

        Show
        Cédric Damioli added a comment - Attached is a proposal that aims to return the real size instead of -1 in certain cases. All tests pass with this patch, but unfortunately there's not many tests testing for getSize() It would be great if a query expert could review this small patch to see if I haven't broken anything Thanks

          People

          • Assignee:
            Cédric Damioli
            Reporter:
            Cédric Damioli
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development