Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      New test case that helps stress test the APIs to support sharding....

      1. LUCENE-3639.patch
        43 kB
        Michael McCandless
      2. LUCENE-3639.patch
        22 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Initial patch; I think it's close.

        It only tests random TermQuery, verifying that the hits match between an IS(MR) (an IndexSearcher over a MultiReader) and searching as shards... we can improve over time.

        I also found a minor bug in TopDocs.merge, where it sets maxScore to Float.MIN_VALUE instead of Float.NaN when there are 0 hits.
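
        For context on why the sentinel matters: Float.MIN_VALUE is the smallest positive float, not the most negative one, so a zero-hit merge would report a tiny positive maxScore. The sketch below is not the actual patch; it only illustrates merging per-shard max scores so that zero hits yields Float.NaN. The class and method names are hypothetical, and it assumes a shard with no hits reports NaN.

```java
final class MaxScoreMerge {
    /**
     * Merges per-shard max scores. Assumption: a shard with zero hits
     * reports Float.NaN as its max score.
     */
    static float mergeMaxScore(float[] shardMaxScores) {
        float max = Float.NaN; // NaN means "no hits seen so far"
        for (float score : shardMaxScores) {
            if (Float.isNaN(score)) {
                continue; // this shard had zero hits
            }
            if (Float.isNaN(max) || score > max) {
                max = score;
            }
        }
        return max; // stays NaN when every shard was empty
    }

    public static void main(String[] args) {
        System.out.println(mergeMaxScore(new float[] {}));
        System.out.println(mergeMaxScore(new float[] {Float.NaN, 1.5f, 0.7f}));
    }
}
```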

        Michael McCandless added a comment -

        New patch, beefing up the test some more. I think it's ready.

        I improved SLM's age calculation to use double precision
        and to compute age by how long ago the searcher was replaced with a
        new searcher (not how long ago the searcher was first enrolled).
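
        As a rough illustration of that age rule (not the actual SLM code), the sketch below measures a searcher's age from the moment it was replaced and returns it as a double. SearcherAgeTracker, markReplaced, and ageSeconds are hypothetical names, and the caller is assumed to pass nanosecond timestamps such as System.nanoTime().

```java
import java.util.HashMap;
import java.util.Map;

final class SearcherAgeTracker {
    // searcher version -> time (nanos) at which a newer searcher replaced it
    private final Map<Long, Long> replacedAtNanos = new HashMap<>();

    /** Records the moment a searcher was superseded; later calls are ignored. */
    void markReplaced(long version, long nowNanos) {
        replacedAtNanos.putIfAbsent(version, nowNanos);
    }

    /** Age in seconds since replacement; 0.0 for a searcher that is still current. */
    double ageSeconds(long version, long nowNanos) {
        Long replacedAt = replacedAtNanos.get(version);
        if (replacedAt == null) {
            return 0.0; // never replaced, so it has not started aging
        }
        return (nowNanos - replacedAt) / 1_000_000_000.0; // double precision
    }
}
```

        With this rule, the current searcher never ages no matter how long ago it was first enrolled.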

        I also fixed a bug in TopDocs.merge when you use searchAfter with shards:
        it was incorrectly assuming that topDocs.scoreDocs.length == 0 meant
        topDocs.totalHits == 0, which is not necessarily true if you use
        searchAfter.
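
        A minimal sketch of the corrected assumption, not the real TopDocs.merge: a shard's page can be empty under searchAfter while the shard still matched documents, so totalHits is always summed. ShardPage and mergedTotalHits are hypothetical stand-ins.

```java
import java.util.List;

final class ShardTotals {
    /** Hypothetical stand-in for one shard's result page. */
    static final class ShardPage {
        final int totalHits;      // all matches on the shard
        final float[] pageScores; // hits on this page only; may be empty after searchAfter

        ShardPage(int totalHits, float[] pageScores) {
            this.totalHits = totalHits;
            this.pageScores = pageScores;
        }
    }

    /** Sums totalHits across shards without looking at the page length. */
    static int mergedTotalHits(List<ShardPage> shards) {
        int total = 0;
        for (ShardPage shard : shards) {
            // Do not skip shards whose page is empty: an empty page does
            // not imply zero total hits once searchAfter is in play.
            total += shard.totalHits;
        }
        return total;
    }
}
```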

        Robert Muir added a comment -

        Starting to look good; I like the testing of the distributed stats!

        I think, looking at this test, that unfortunately distributed scoring is still too difficult with Lucene.
        It's nice how you don't do any extra rewrites or anything like that, but I don't like this:

        +    // TODO: nothing evicts from here!!!  Somehow, on searcher
        +    // expiration on remote nodes we must evict from our
        +    // local cache...?
        

        There are two problems here we should separate:
        1. the searcher should be able to get the stats, and ensure that they are available for scorers. this is separate from:
        2. the searcher doing some caching of stats to prevent network traffic.

        Currently your cache handles both 1 and 2, but I think a cache should be a cache.

        Maybe we can improve the scoring API; here is where the problem is:

          public Weight createNormalizedWeight(Query query) throws IOException {
            query = rewrite(query);
            // right here is where you want to extractTerms and get stats (from cache or remotely)
            Weight weight = query.createWeight(this); // right here, in the Weight's ctor, is where it's going to call back to your IS to ask for those stats
        

        I don't like the fact that you need to handle this crazy state here... there has to be something we can do to simplify this.
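
        One way to picture the separation of concerns 1 and 2 above is sketched below. It is not a proposal for the actual Lucene API: TermStatsSource and CachingTermStatsSource are hypothetical names, and a single docFreq lookup stands in for whatever stats the scorers need. The source answers "what are the stats?", and the cache is an optional wrapper that can be evicted without affecting correctness.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Concern 1: something that can always produce the stats (e.g. by asking remote nodes). */
interface TermStatsSource {
    long docFreq(String term);
}

/** Concern 2: a pure cache in front of any source, used only to save network traffic. */
final class CachingTermStatsSource implements TermStatsSource {
    private final TermStatsSource delegate;
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    CachingTermStatsSource(TermStatsSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public long docFreq(String term) {
        // On a miss, fall through to the delegate and remember the answer.
        return cache.computeIfAbsent(term, delegate::docFreq);
    }

    /** Safe to call at any time, e.g. when a remote searcher expires. */
    void evictAll() {
        cache.clear();
    }
}
```

        Because the wrapper only memoizes, dropping its entries costs an extra lookup but never leaves a scorer without stats.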

        Robert Muir added a comment -

        By the way, the test is great... I actually think we should just commit it as-is.

        Otherwise, how can we improve this stuff? We need this test!


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless