Lucene - Core / LUCENE-1483

Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      This issue changes how an IndexSearcher searches over multiple segments. The current approach is to use a MultiSegmentReader and treat all of the segments as one. This causes Filters and FieldCaches to be keyed to the MultiReader and makes reopen expensive: if only a few segments change, the FieldCache must still be reloaded for all of them.

      This patch changes things by searching each individual segment one at a time while sharing the HitCollector across segments. This allows FieldCaches and Filters to be keyed on individual SegmentReaders, making reopen much cheaper. FieldCache loading over multiple segments can be much faster as well: with the old method, all unique terms across every segment are enumerated against each segment - because the number of terms per segment likely grows logarithmically, this is very wasteful. Searching individual segments avoids this cost. The term/document statistics from the MultiReader are used to score results for each segment.

      When sorting, it's more difficult to use a single HitCollector across sub-searchers, because ordinals are not comparable across segments. To account for this, a new field-sort-enabled HitCollector is introduced that can collect and sort across segments (because it is able to compare ordinals across segments). This TopFieldCollector class collects the values/ordinals for a given segment and, upon moving to the next segment, translates any ordinals/values so that they can be compared against the values for the new segment. This is done lazily.
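
      To make the cross-segment comparison concrete, here is a minimal sketch (not the committed code) of a comparator in the style of the FieldComparator introduced below, for a primitive int field. Values are copied into queue "slots", so nothing segment-local (such as an ordinal) ever has to survive a segment transition. The "price" field name and the exact method signatures are illustrative assumptions.

          import java.io.IOException;
          import org.apache.lucene.index.IndexReader;
          import org.apache.lucene.search.FieldCache;

          // Illustrative comparator: per-segment values are copied into slots as
          // plain ints, so compare() never touches segment-local data.
          public class SketchIntComparator {
            private final int[] slots;   // values copied out of each segment
            private int[] current;       // FieldCache array for the current segment
            private int bottom;          // value of the weakest entry in the queue

            public SketchIntComparator(int numHits) {
              slots = new int[numHits];
            }

            // Called when the searcher advances to the next segment.
            public void setNextReader(IndexReader reader, int docBase) throws IOException {
              current = FieldCache.DEFAULT.getInts(reader, "price"); // "price" is hypothetical
            }

            public void copy(int slot, int doc) {      // doc is segment-relative
              slots[slot] = current[doc];
            }

            public int compare(int slot1, int slot2) { // slot vs. slot: segment-independent
              int a = slots[slot1], b = slots[slot2];
              return a < b ? -1 : (a > b ? 1 : 0);
            }

            public void setBottom(int slot) {
              bottom = slots[slot];
            }

            public int compareBottom(int doc) {        // new hit vs. weakest queued hit
              int v = current[doc];
              return bottom < v ? -1 : (bottom > v ? 1 : 0);
            }
          }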

      All in all, the switch seems to provide numerous performance benefits, in both sorted and non-sorted search. We were seeing a noticeable loss on indices with lots of segments (1000?) and certain queue sizes/queries, but the latest results suggest that's been mostly taken care of (you shouldn't be using such a large queue on such a heavily segmented index anyway).

      • Introduces
        • MultiReaderHitCollector - a HitCollector that can collect across multiple IndexReaders; old HitCollectors are wrapped to support multiple IndexReaders (see the sketch after this list).
        • TopFieldCollector - a HitCollector that can compare values/ordinals across IndexReaders and sort on fields.
        • FieldValueHitQueue - a Priority queue that is part of the TopFieldCollector implementation.
        • FieldComparator - a new Comparator class that works across IndexReaders. Part of the TopFieldCollector implementation.
        • FieldComparatorSource - new class to allow for custom Comparators.
      • Alters
        • IndexSearcher uses a single HitCollector to collect hits against each individual SegmentReader. All of the other changes stem from this.
      • Deprecates
        • TopFieldDocCollector
        • FieldSortedHitQueue
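
      As a rough illustration of the new collection contract (a sketch, not the committed code): the searcher walks the sub-readers one by one, telling the collector where each segment starts so that segment-relative doc IDs can be rebased into the composite index. The setNextReader(reader, docBase) callback and the class name below are assumptions drawn from the description above.

          import org.apache.lucene.index.IndexReader;
          import org.apache.lucene.search.HitCollector;

          // Sketch of a collector in the MultiReaderHitCollector style: one instance
          // is reused across all segments, rebasing doc IDs with the segment's start.
          public class SketchPerSegmentCollector extends HitCollector {
            private int docBase; // start of the current segment in the composite index

            // Invoked by the searcher before it scores each segment.
            public void setNextReader(IndexReader reader, int base) {
              this.docBase = base;
            }

            public void collect(int doc, float score) {
              int globalDoc = docBase + doc; // doc is segment-relative
              // ... push (globalDoc, score) into a priority queue, bitset, etc.
            }
          }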
      Attachments

      1. LUCENE-1483.patch
        6 kB
        Michael McCandless
      2. LUCENE-1483.patch
        198 kB
        Michael McCandless
      3. LUCENE-1483.patch
        199 kB
        Michael McCandless
      4. LUCENE-1483.patch
        199 kB
        Michael McCandless
      5. LUCENE-1483.patch
        199 kB
        Michael McCandless
      6. LUCENE-1483.patch
        192 kB
        Michael McCandless
      7. LUCENE-1483.patch
        206 kB
        Michael McCandless
      8. LUCENE-1483.patch
        207 kB
        Michael McCandless
      9. LUCENE-1483.patch
        218 kB
        Michael McCandless
      10. LUCENE-1483.patch
        207 kB
        Michael McCandless
      11. LUCENE-1483.patch
        202 kB
        Michael McCandless
      12. LUCENE-1483.patch
        187 kB
        Mark Miller
      13. LUCENE-1483.patch
        184 kB
        Mark Miller
      14. LUCENE-1483.patch
        183 kB
        Mark Miller
      15. LUCENE-1483.patch
        183 kB
        Mark Miller
      16. LUCENE-1483.patch
        171 kB
        Michael McCandless
      17. LUCENE-1483.patch
        171 kB
        Michael McCandless
      18. LUCENE-1483.patch
        167 kB
        Michael McCandless
      19. LUCENE-1483.patch
        167 kB
        Michael McCandless
      20. LUCENE-1483.patch
        159 kB
        Mark Miller
      21. LUCENE-1483.patch
        160 kB
        Mark Miller
      22. LUCENE-1483.patch
        168 kB
        Mark Miller
      23. LUCENE-1483.patch
        108 kB
        Mark Miller
      24. LUCENE-1483.patch
        108 kB
        Mark Miller
      25. LUCENE-1483.patch
        78 kB
        Michael McCandless
      26. LUCENE-1483.patch
        61 kB
        Mark Miller
      27. LUCENE-1483.patch
        18 kB
        Mark Miller
      28. LUCENE-1483.patch
        18 kB
        Mark Miller
      29. LUCENE-1483.patch
        18 kB
        Mark Miller
      30. LUCENE-1483.patch
        12 kB
        Mark Miller
      31. LUCENE-1483.patch
        11 kB
        Mark Miller
      32. LUCENE-1483.patch
        11 kB
        Michael McCandless
      33. LUCENE-1483.patch
        10 kB
        Mark Miller
      34. LUCENE-1483.patch
        11 kB
        Mark Miller
      35. LUCENE-1483.patch
        12 kB
        Mark Miller
      36. LUCENE-1483-backcompat.patch
        4 kB
        Michael McCandless
      37. LUCENE-1483-partial.patch
        62 kB
        Michael McCandless
      38. sortBench.py
        6 kB
        Michael McCandless
      39. sortCollate.py
        2 kB
        Michael McCandless


          Activity

          thetaphi Uwe Schindler added a comment -

          Reverted the removal of SorterTemplate; it's now replaced by a better implementation, only inspired by CGLIB: LUCENE-2719

          mikemccand Michael McCandless added a comment -

          OK I just removed SorterTemplate.java

          markrmiller@gmail.com Mark Miller added a comment -

          > we should declare that it's for internal use by Lucene only

          +1

          > Any objections?

          Not from me - the chances are slim to nil that anyone is using it, I bet - and if they are, they can copy it into their code base. We don't want to have to manage util classes we don't use.

          mikemccand Michael McCandless added a comment -

          > I originally borrowed it to sort readers and starts at the same time - since then, we stopped doing that sort. We could drop it - though technically it's a bw compat break now

          I think, going forward, whenever we adopt some utility class like this, we should declare that it's for internal use by Lucene only, and we made it public only to use within different sub-packages in Lucene. And that we reserve full future rights to break back compat – change the APIs, change semantics of the methods, remove the class entirely.

          EG I've done this with DoubleBarrelLRUCache.

          In this particular case, I'm inclined to simply make an exception to back compat, and simply remove the class, for 3.1. Any objections?

          markrmiller@gmail.com Mark Miller added a comment -

          Sure is - but IP teams are crazy. Other stuff was likely borrowed in other cases; there just wasn't a trail like this left behind. Even things that we have gotten software grants for don't always have a trail that's easily followable (or at all from the source code) - so essentially, we are being trusted in any case. But if you look at previous discussions that have come up about this type of thing, common sense won't win.

          rcmuir Robert Muir added a comment -

          but isn't this code ok to borrow? http://cglib.cvs.sourceforge.net/viewvc/cglib/cglib/src/proxy/net/sf/cglib/util/SorterTemplate.java?revision=1.2&view=markup&pathrev=MAIN

          markrmiller@gmail.com Mark Miller added a comment -

          I originally borrowed it to sort readers and starts at the same time - since then, we stopped doing that sort. We could drop it - though technically it's a bw compat break now

          yseeley@gmail.com Yonik Seeley added a comment -

          That's strange... looks like SorterTemplate isn't even used anywhere. Mike, was this an accidental commit?

          gunnar Gunnar Wagenknecht added a comment -

          Hi,

          This issue introduced SorterTemplate.java. Right in the header it says "Borrowed from Cglib.". Unfortunately, our IP team is not able to clarify provenance on CGLib. It seems that we can't get in touch with the authors. Any help is appreciated.

          Is it possible to exclude that file from this patch or replace it with an alternate implementation?

          thetaphi Uwe Schindler added a comment -

          This will be changed as part of LUCENE-1575

          shaie Shai Erera added a comment -

          Hi Jeremy,

          This will be taken care of in 1575 by removing the IndexReader[] arg from TopFieldCollector. As a matter of fact, 1575 changes the collector's API quite a bit, so you might want to take a look there. Anyway, I've run into the same issue there and realized this arg can be safely removed from TopFieldCollector as well as FieldValueHitQueue.

          jvolkman Jeremy Volkman added a comment -

          I'm trying to create a FieldValueHitQueue outside of an IndexSearcher. One part of my code collects all results in a fashion similar to http://www.gossamer-threads.com/lists/lucene/java-user/66362#66362. At the end of my collection, I used to pass the results through a FieldSortedHitQueue of the proper size to get sorted results. The problem now is that FieldValueHitQueue takes an array of subreaders instead of one IndexReader. As far as I can tell, there's no way for me to get a proper sorted array of subreaders for an IndexReader without copying and pasting the gatherSubReaders and sortSubReaders methods from IndexSearcher. This isn't desirable, so could IndexSearcher perhaps provide some sort of getSortedSubReaders() method? Either that, or extract this functionality out into a common utility method that IndexSearcher uses.

          markrmiller@gmail.com Mark Miller added a comment -

          > OK I committed the missing deprecations.

          +1. Thanks.

          mikemccand Michael McCandless added a comment -

          OK I committed the missing deprecations (Committed revision 747019).

          mikemccand Michael McCandless added a comment -

          Hmm – we didn't deprecate SortComparator/SortComparatorSource with this, but I think we should have? Does that sound right? If so I can work up a patch...

          mikemccand Michael McCandless added a comment -

          Committed revision 740021.

          mikemccand Michael McCandless added a comment -

          > javadoc: maxDoc should be numDocs in "in order of decreasing maxDoc()"

          Woops, right! I'll fix & commit. Thanks Yonik!

          yseeley@gmail.com Yonik Seeley added a comment -

          +1, thanks Mike!

          javadoc: maxDoc should be numDocs in "in order of decreasing maxDoc()"

          mikemccand Michael McCandless added a comment -

          > public IndexSearcher(IndexReader r, boolean sortSegments) (or docsInOrder?)

          Sounds good... I added an expert ctor that takes boolean docsInOrder (attached).

          yseeley@gmail.com Yonik Seeley added a comment -

          My initial thought was to just add it to the most expert level constructor (i.e. avoid adding flags to every constructor):
          public IndexSearcher(IndexReader r, boolean sortSegments) (or docsInOrder?)

          mikemccand Michael McCandless added a comment -

          > It seems cleaner to just sort or not in the constructor

          OK, I agree... I'll work out a patch.

          yseeley@gmail.com Yonik Seeley added a comment -

          >> IndexSearcher(IndexReader, boolean sortSegments) or something

          > But isn't that too coarse? (Ie, it's only very specific kinds of queries, against very specific fields, that are affected by visiting segments in reverse-size order)?

          Actually, I think a number of applications will probably be affected (those sensitive to doc order).
          Think about anyone using SortedVIntListBuilder, or anyone with their own hit collector that short-circuits too soon based on the assumption that docs often come in index order. The worst thing is that they often will come in index order... and only break later on under certain circumstances.

          We could add a flag to methods that take a hit collector, but that would require keeping two lists of segments (sorted and original) and corresponding bases. It seems cleaner to just sort or not in the constructor... people who want both behaviors can instantiate two IndexSearcher objects.

          > Actually, how would this help fix Solr's "external file" field (vs the private FieldCache approach)?

          It's unrelated, I should have made that clearer.

          > It's really like Solr needs its own "key", combined with the SegmentReader

          Yep. No worries... I think I've got this one figured out. There are a number of options (including thread-local too), but I'm liking the sound of wrapping segment readers for general flexibility.

          mikemccand Michael McCandless added a comment -

          > How about adding an API to FieldCache/ExtendedFieldCache that makes it possible to remove all cached values for a specific field name?

          I think maybe that's too coarse – there may be situations when you want to keep both the old and the new cache around (eg, while warming is taking place, but also in cases where you don't immediately retire the old reader once a new one is reopened).

          It's really like Solr needs its own "key", combined with the SegmentReader, when retrieving entries from FieldCache, or the ability to use a private FieldCache for certain queries (which I think is a viable solution today for *ValueSource).

          mikemccand Michael McCandless added a comment -

          > IndexSearcher(IndexReader, boolean sortSegments) or something

          But isn't that too coarse? (Ie, it's only very specific kinds of queries, against very specific fields, that are affected by visiting segments in reverse-size order)?

          Actually, how would this help fix Solr's "external file" field (vs the private FieldCache approach)?

          yseeley@gmail.com Yonik Seeley added a comment -

          > > How about an option to deliver docs in sorted order (basically, skip the segment sort)
          > We could do that; would you add it to Query?

          I wouldn't want to add another field on Query...I was thinking more along the lines of adding a flag to the IndexSearcher constructor: IndexSearcher(IndexReader, boolean sortSegments) or something

          thetaphi Uwe Schindler added a comment -

          Some other idea:
          How about adding an API to FieldCache/ExtendedFieldCache that makes it possible to remove all cached values for a specific field name? It would remove the stale arrays for that field from each IndexReader's cache in the WeakHashMap. So it would not make a difference whether the search uses one reader or each segment separately. This would be a move toward LUCENE-831.

          mikemccand Michael McCandless added a comment -

          > After a reopen, I could tear apart the MultiSegmentReader and create my own MultiReader, wrapping each reader. The only real difference between the two is getVersion() and isCurrent(), right?

          That's a neat idea! But there are other differences (eg you wouldn't be able to make changes, I think? Because no IndexReader has the SegmentInfos?).

          mikemccand Michael McCandless added a comment -

          > How about an option to deliver docs in sorted order (basically, skip the segment sort)

          We could do that; would you add it to Query?

          mikemccand Michael McCandless added a comment -

          You are using oal.function.ValueSource as the interface for loading values for "external file" fields, right?

          Could you maintain a separate FieldCacheImpl, newly created each time you reopen the reader, and extend *ValueSource so that they use that private FieldCache instead of the default one? Then, when a ValueSourceQuery is created, you'd have to init it with the right private FieldCache instance corresponding to your MultiSegmentReader. Would that work?

          yseeley@gmail.com Yonik Seeley added a comment -

          As far as a Solr fix... the code needs more context than the IndexReader passed in.

          After a reopen, I could tear apart the MultiSegmentReader and create my own MultiReader, wrapping each reader. The only real difference between the two is getVersion() and isCurrent(), right?

          yseeley@gmail.com Yonik Seeley added a comment -

          How about an option to deliver docs in sorted order (basically, skip the segment sort)

          mikemccand Michael McCandless added a comment -

          One immediate workaround would be to use the legacy searching (SortField.setUseLegacySearch) when sorting by an "external file" field?

          yseeley@gmail.com Yonik Seeley added a comment -

          My previous comment:

          I tracked down how this patch was causing Solr failures:

          ExternalFileField in Solr maps from a uniqueKey to a float value from a separate file.
          There is a cache that is essentially keyed by (IndexReader,field) that gives back a float[].

          Any change in the index used to cause all values to be updated (cache miss because the MultiReader was a different instance). Now, since it's called segment-at-a-time, only new segments are reloaded from the file, leaving older segments with stale values.

          It's certainly in the very gray area... but perhaps Solr won't be the only one affected by this - maybe apps that implement security filters, etc?

          > Yonik, why was the failure so intermittent?

          It failed for others but not for me due to a Solr bug that prevented IndexReader.reopen() from being used on Windows.
          As to why it reportedly worked for Mark when he built Lucene himself... <shrug>... at this point, perhaps testing error.

          [...]

          > Lucene implicitly assumes that a FieldCache's arrays do not change for
          > a given segment; this is normally safe since the arrays are derived
          > from the postings in the field (which are write once).
          >
          > But it sounds like Solr changed that assumption, and the values in the
          > (Solr-subclass of) FieldCache's arrays are now derived from something
          > external, which is no longer write once.

          Right... it used to hold in Solr because nothing really operated below the MultiReader level.
          The intention is that the entire file is read at the time a new IndexReader is opened.
          This patch changes that.

          > How do you plan to fix it with Solr? It seems like, since you are
          > maintaining a private cache, you could forcefully evict entries from
          > the cache for all SegmentReaders whenever the external file has
          > changed (or a new MultiSegmentReader had been opened)?

          It's not so easy... the same segment could be associated with two different active MultiReaders (with a different set of values for each). When the scorer is created, only the SegmentReader is passed with no other context.

          mikemccand Michael McCandless added a comment -

          Committed revision 738219. Thanks to everyone who helped out
          here...and many thanks to Mark for working through so many iterations
          as we explored the different approaches here!

          mikemccand Michael McCandless added a comment -

          > The only problem with parallelization is that the MultiReaderHitCollector must be synchronized in some way.

          I think we'd have to collect into separate collectors and then merge
          (like ParallelMultiSearcher does today)?

          I think this (separate thread for the "big" segments, and one thread
          for the "long tail") would be a good approach, except I don't like
          that the performance would depend so much on the structure of the
          index. EG after you've optimized your index you'd suddenly get no
          concurrency, and presumably worse performance than when you had a few
          big segments.

          Could we instead divide the index into chunks and have each thread
          skipTo the start of its chunk? EG if the index has N docs, and you
          want to use M threads, each thread visits N/M docs. If that can work
          it should be less dependent on the index structure.
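
          A sketch of the chunking arithmetic described here (hypothetical, just to illustrate the idea): with N = maxDoc() and M threads, each worker gets a contiguous doc-ID range and skips its scorer to the start of that range. The method and variable names are assumptions, not committed code.

              // Hypothetical worker partitioning - not committed code.
              static int[][] chunkBounds(org.apache.lucene.index.IndexReader reader, int threads) {
                int maxDoc = reader.maxDoc();
                int[][] bounds = new int[threads][2];
                for (int t = 0; t < threads; t++) {
                  bounds[t][0] = (int) ((long) t * maxDoc / threads);       // inclusive start
                  bounds[t][1] = (int) ((long) (t + 1) * maxDoc / threads); // exclusive end
                }
                // each worker would scorer.skipTo(bounds[t][0]) and stop collecting at
                // bounds[t][1]; per-thread results are then merged, as
                // ParallelMultiSearcher does today
                return bounds;
              }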

          mikemccand Michael McCandless added a comment -

          > since last Friday, we had no problem with the new sort implementation.

          OK, excellent. I will commit shortly!

          thetaphi Uwe Schindler added a comment -

          Jason: We should open a new issue for that after this one is solved. Maybe we can create a good parallelized implementation after solving the problems with MultiReaderHitCollector (if more than one thread calls setNextReader with collect calls in between, it would not work).

          thetaphi Uwe Schindler added a comment -

          Hi Mike,
          since last Friday, we have had no problems with the new sort implementation. No exceptions from Lucene or any problems with Lucene. The sorting of results was (as far as I have seen) always correct (tested were SortField.INT and SortField.STRING). The index was updated every half hour and reopened - really great performance. There were also no errors after an optimize() and reopen on Sunday (only that warming up the sorting took longer).

          jasonrutherglen Jason Rutherglen added a comment -

          Uwe: "An improvement of the current ParallelMultiSearcher
          could be, as mentioned before, to use the new sorting by doc count of
          all subreaders implemented by gatherSubReaders"

          +1 Parallelization is important as many applications are looking for
          reduced latency of queries, something parallelizing with multiple
          threads guarantees (provided there is sufficient hardware).

          Uwe: "The only problem with parallelization is that the MultiReaderHitCollector must be synchronized in some way."

          True, thread locking (synchronization) definitely won't work for this.

          thetaphi Uwe Schindler added a comment -

          Good comment, you are right! I forgot about the remote application. Nevertheless, for local-only applications, the new IndexSearcher is preferable over a MultiSearcher with separate IndexSearchers for each index.

          An improvement of the current ParallelMultiSearcher could be, as mentioned before, to use the new sorting by doc count of all subreaders implemented by gatherSubReaders in the new IndexSearcher, together with a maxThreads property. The new ParallelIndexSearcher could spawn max threads for the first (and biggest) n index readers in the list and one for the rest. The search could then be done in parallel. This solves the problem of the current ParallelMultiSearcher spawning too many threads (because there is no limitation). The new one would only parallelize the search inside bigger indexes (this is just an idea). Note: this ParallelIndexSearcher would not get separate Searchables as c'tor parameters, but one IndexReader, like IndexSearcher. The parallelization is only done based on all sub-readers. The only problem with parallelization is that the MultiReaderHitCollector must be synchronized in some way.

          cutting Doug Cutting added a comment -

          > In principle, the MultiSearcher is obsolete now.

          It's still useful with RemoteSearchable, no? That was the original motivating case for MultiSearcher. While the RMI-based RemoteSearchable might not be industrial-strength, it does serve as a placeholder for distributed search. We'd like the Lucene APIs to be able to gracefully support distributed search, and I think that requires MultiSearcher.

          thetaphi Uwe Schindler added a comment -

          Now, I cannot find any other improvements.

          Just a comment for discussion:
          In principle, the MultiSearcher is obsolete now. Maybe we can deprecate it and remove it in 3.0. The hint is: Use IndexSearcher with a MultiReader instead.

          Only ParallelMultiSearcher is of use now, but I am not sure how to keep it alive (maybe as a ParallelIndexSearcher, inherited from IndexSearcher, that does the searches of the bigger segments in parallel; parallel searching of the smaller segments is of no use).

          mikemccand Michael McCandless added a comment -

          Excellent – new patch with those fixes. Thanks for reviewing Uwe!

          thetaphi Uwe Schindler added a comment -

          Hi Mike,

          looks good. I did not yet test the backwards-compat one. I forgot yesterday to add one other note:
          there is an unneeded System.arraycopy() in sortSubReaders. It was needed before, when you did not have the List for collecting the subreaders. Now the array is copied by List.toArray() and then copied again by System.arraycopy():

              List subReadersList = new ArrayList();
              gatherSubReaders(subReadersList, reader);
              final IndexReader[] subReaders = (IndexReader[]) subReadersList.toArray(indexReaderZeroArray);
              final int length = subReaders.length;
              sortedSubReaders = new IndexReader[length];
              System.arraycopy(subReaders, 0, sortedSubReaders, 0, length);
              sortedStarts = new int[length];

          More efficient would be:

              List subReadersList = new ArrayList();
              gatherSubReaders(subReadersList, reader);
              sortedSubReaders = (IndexReader[]) subReadersList.toArray(indexReaderZeroArray);
              final int length = sortedSubReaders.length;
              sortedStarts = new int[length];

          I did not create a patch, because I do not want to mix my changes in with yours; I think it is simpler to change it in your checkout.

          And: the casts to MultiSegmentReader in TestIndexReopen are not needed anymore, as every IndexReader has getSequentialSubReaders().

          I think it is perfect now! +1 for commit!

          mikemccand Michael McCandless added a comment -

          New patch. I'm attaching the actual patch plus the "back compat"
          changes patch. In order to run with these patches you should 1) run
          "ant test-tag" so it checks out the back-compat branch under "tags",
          2) apply the main patch, 3) apply the back-compat patch on your
          tags/... sources, then run "ant test test-tag".

          Changes:

          • Renamed to getSequentialSubReaders, which returns null (use me
            directly), empty array (I am a null reader), or non-empty array (I
            do have sub-readers)
          • Removed the "for testing" package private
            Multi*Reader.getSubReaders methods since they are the same as
            getSequentialSubReaders.

          mikemccand Michael McCandless added a comment -

          OK I will rename to getSequentialSubReaders, with the semantics above.

          > by the way: The package private method getSubReaders in MultiSegmentReader can be removed; it is not used in the test (which uses getSequentialReader).

          Yeah, but then "ant test-tag" fails to compile. But I agree, we should fix it (and I'll commit the fix to the back-compat branch at the same time) and MultiReader's getSubReaders too. Probably we should plant tags along the back-compat branch and reference that tag in build.xml; this way, if others have a checkout and run "ant test-tag", they won't see compile failures if they haven't yet updated to this commit. I'll do that.

          thetaphi Uwe Schindler added a comment - edited

          > So null --> I cannot be split into sub-readers; empty array --> I am a null reader; array.length > 0 --> I do have sequential sub-readers?

          This is a good optimization. If a MultiReader returned null instead of an empty array, it wouldn't be a problem (the empty reader would be searched with no results), but returning an empty array is better in this case. So gatherSubReaders() should only check for null, and then either add the parent reader itself to the List or, in all other cases, do the recursion.

          thetaphi Uwe Schindler added a comment -

          by the way: The package private method getSubReaders in MultiSegmentReader can be removed; it is not used in the test (which uses getSequentialReader).

          thetaphi Uwe Schindler added a comment -

          You are right, I noticed that when I started to prepare a patch and wondered about failed tests. The name clash and backwards compatibility require this. So getSequentialReaders() or getSequentialSubReaders() is ok. With the same name, a ParallelReader used in an IndexSearcher would crash the whole thing.

          mikemccand Michael McCandless added a comment -

          And... if an empty list is returned I think that should mean this reader has no sub-readers, ie, it can be skipped? EG a MultiSegmentReader with 0 segments.

          So null --> I cannot be split into sub-readers; empty array --> I am a null reader; array.length > 0 --> I do have sequential sub-readers?
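
          A hedged sketch of the recursion implied by that three-way contract (assuming the method ends up named getSequentialSubReaders(), per the later rename; not the committed code):

              // null -> atomic reader: search it directly.
              // empty array -> empty reader: contributes nothing.
              // non-empty -> recurse into each sequential sub-reader.
              protected void gatherSubReaders(java.util.List allSubReaders,
                                              org.apache.lucene.index.IndexReader r) {
                org.apache.lucene.index.IndexReader[] subs = r.getSequentialSubReaders();
                if (subs == null) {
                  allSubReaders.add(r);
                } else {
                  for (int i = 0; i < subs.length; i++) {
                    gatherSubReaders(allSubReaders, subs[i]); // empty array adds nothing
                  }
                }
              }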

          mikemccand Michael McCandless added a comment -

          Though... I think I still like getSequentialReaders? EG, some readers might have subreaders that are not "sequential", eg ParallelReader already defines getSubReaders() but we cannot treat them sequentially.

          mikemccand Michael McCandless added a comment -

          Excellent! I like that solution. I'll make that change.

          thetaphi Uwe Schindler added a comment -

          Hi again,
          I found a small problem with getSequentialReaders() and your recursion. In my opinion, the whole problem is the way getSequentialReaders() is defined in IndexReader.

          First the problem:
          if you create a MultiReader consisting of only one IndexReader (this can be the case if you supply 3 indexes in a user interface and the user can check them for searching; if the user only checks one, the MultiReader will contain one), the recursion in IndexSearcher.gatherSubReaders() stops when the count is 1. If this one reader is an unoptimized index (a MultiSegmentReader), the sub-readers are not gathered.

          In my opinion, I would do it another way:
          a) rename getSequentialReaders() to getSubReaders(). IndexReader.getSubReaders() returns an empty array or null (because it has no sub-readers); Multi(Segment)Reader returns the sub-readers.
          b) gatherSubReaders() then looks like this:

            protected void gatherSubReaders(List allSubReaders, IndexReader r) {
              IndexReader[] subReaders = r.getSubReaders();
              if (subReaders == null || subReaders.length == 0) {
                allSubReaders.add(r);
              } else {
                for(int i=0;i<subReaders.length;i++) {
                  gatherSubReaders(allSubReaders, subReaders[i]);
                }
              }
            }
          

          In my opinion, this is the cleaner way. Returning a one-entry array containing itself from getSequentialReader() is not clean and can lead to endless loops if used wrongly.

          thetaphi Uwe Schindler added a comment -

          Un-deprecate TopDocs.get/setMaxScore

          Fine, now my code compiles without deprecation warnings.

          Added logic in IndexSearcher to recursively expand sub-readers, so eg a MultiReader containing several MultiSegmentReaders would visit the SegmentReaders one by one.

This is even better; when I looked into the code, I was thinking about that, too. I have portals consisting of many separate indexes, concatenated with MultiReader. With your new patch, the Searcher will use all readers down to the lowest single segment of each index.

By the way: no warnings or exceptions here so far.

          mikemccand Michael McCandless added a comment -

          New patch:

          • Un-deprecate TopDocs.get/setMaxScore
          • Made IndexSearcher.sortSubReaders protected so a subclass can
            change it
          • Added logic in IndexSearcher to recursively expand sub-readers, so
            eg a MultiReader containing several MultiSegmentReaders would
            visit the SegmentReaders one by one.
          • Small javadoc fixes
          thetaphi Uwe Schindler added a comment -

He's not saying after the warm up, but that the warm up should be faster based on that.

I didn't mean "looking after the initial warmup", I meant "looking for the initial warmup". The speed difference for the initial warmup was negligible for my index (not optimized). But I did not know how many segments and terms the index really had. I can find this out for a better test case here. I'll keep you informed.

          markrmiller@gmail.com Mark Miller added a comment - - edited

I was looking after the initial warmup, but noticed no difference. Maybe the string field I used was not distinct enough. What is a good number for a noticeable speed improvement (50% distinct terms?).

He's not saying after the warm up, but that the warm up should be faster based on that.

It's because of this:

The old way, if you had 5 segments with unique term counts of 50,000, 6,000, 6,000, 5, and 5, we would try to load all 62,010 unique terms against every segment: 62,010 x 5 = 310,050 seeks.

With the new way, we load 50,000 terms for the first segment, 6,000 for the next, then 6,000, then 5 and 5: a total of 62,010.

Even though most of the 62,010 won't be found in the 5-term segment, it still takes a long time to check them all. So the more unique terms and the more segments, the worse the problem got.

          edit
          little fix on those numbers
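
To make the arithmetic concrete, here is a tiny cost-model sketch using the numbers above (purely illustrative code, not from the patch):

  // seek-cost model for uninverting a field over 5 segments
  int[] uniqueTermsPerSegment = { 50000, 6000, 6000, 5, 5 };
  int totalUniqueTerms = 62010;

  // old way: every unique term in the whole index is probed against every segment
  long oldSeeks = (long) totalUniqueTerms * uniqueTermsPerSegment.length; // 310,050

  // new way: each segment only enumerates the terms it actually contains
  long newSeeks = 0;
  for (int i = 0; i < uniqueTermsPerSegment.length; i++) {
    newSeeks += uniqueTermsPerSegment[i]; // totals 62,010
  }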

          thetaphi Uwe Schindler added a comment -

          One test failed with the following exception:

          Could you try removing your contrib/benchmark/work directory and see
          if the test then passes? I've seen similar exceptions in the past where
          somehow the reuters docs weren't properly unpacked.

          Test passes now, sorry for the false alarm.

          Speedup on warming should be even better when you sort by textual fields that have a highish number of unique terms.

I was looking after the initial warmup, but noticed no difference. Maybe the string field I used was not distinct enough. What is a good number for a noticeable speed improvement (50% distinct terms?).

While reviewing my TrieRangeQuery/-Filter code, I noticed that TrieRangeFilter.getDocIdSet() is now executed separately for each segment (this is clear to me). The test case also reports the number of terms visited during the split of the trie range on stdout. As each segment was searched separately, the term statistics were incorrect (because the range statistics were reset on each getDocIdSet() call). I changed the test case to optimize the index before testing the TrieRangeQuery, so that there is only one segment and the real trie range split stats are shown.
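
A minimal sketch of that test-case change, assuming the test index is built by an IndexWriter named writer (hypothetical variable):

  // merge the index down to one segment so the trie-range split statistics
  // accumulate in a single getDocIdSet() call instead of being reset per segment
  writer.optimize();
  writer.close();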

          mikemccand Michael McCandless added a comment -

          One test failed with the following exception:

          Could you try removing your contrib/benchmark/work directory and see
          if the test then passes? I've seen similar exceptions in the past where
          somehow the reuters docs weren't properly unpacked.

In my test environment I got no errors/exceptions etc. After testing, I put it online on my production system (I will watch the webserver's error logs for exceptions).

That's good to hear! Please report back if you get exceptions in the
production system... (and thanks for that vote of confidence!).

In the past, the warmup for filling the FieldCache took about 3 seconds - now it completes in no measurable time. So no warmup after reopen is needed anymore.

          Excellent! This was the primary (original) goal here...

          Speedup on warming should be even better when you sort by textual
          fields that have a highish number of unique terms.

OK, for relevance/score-sorted TopDocs this is no problem, as the maximum score is in ScoreDoc[0], but for docs sorted by fields (TopFieldDocs, which extends TopDocs) you need the max score. If you have generic search code that does not distinguish between TopDocs and TopFieldDocs, the generic approach is to divide by TopDocs.getMaxScore().

          OK I agree we should un-deprecate this and keep it; I'll do that.

          Thanks for reviewing this Uwe!

          thetaphi Uwe Schindler added a comment -

          Hi Michael & Mike,
          great work. I patched my tree and compiled, no problems. One test failed with the following exception:

          <testcase classname="org.apache.lucene.benchmark.quality.TestQualityRun" name="testTrecQuality" time="64.656">
            <error type="java.lang.NullPointerException">java.lang.NullPointerException
          	at org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker.getNextDocData(ReutersDocMaker.java:112)
          	at org.apache.lucene.benchmark.byTask.feeds.BasicDocMaker.makeDocument(BasicDocMaker.java:98)
          	at org.apache.lucene.benchmark.byTask.tasks.AddDocTask.setup(AddDocTask.java:61)
          	at org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:89)
          	at org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
          	at org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
          	at org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
          	at org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doSerialTasks(TaskSequence.java:141)
          	at org.apache.lucene.benchmark.byTask.tasks.TaskSequence.doLogic(TaskSequence.java:122)
          	at org.apache.lucene.benchmark.byTask.tasks.PerfTask.runAndMaybeStats(PerfTask.java:92)
          	at org.apache.lucene.benchmark.byTask.utils.Algorithm.execute(Algorithm.java:246)
          	at org.apache.lucene.benchmark.byTask.Benchmark.execute(Benchmark.java:73)
          	at org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.execBenchmark(TestPerfTasksLogic.java:455)
          	at org.apache.lucene.benchmark.quality.TestQualityRun.createReutersIndex(TestQualityRun.java:173)
          	at org.apache.lucene.benchmark.quality.TestQualityRun.testTrecQuality(TestQualityRun.java:56)
            </error>
          </testcase>
          

I am not sure if this failing test has anything to do with the patch.

I then tested the resulting lucene-core.jar as a drop-in replacement for my big portal (www.pangaea.de) - we have been waiting here for months for the optimized reopen with sorted search results. In my test environment I got no errors/exceptions etc. After testing, I put it online on my production system (I will watch the webserver's error logs for exceptions).

The speed increase for a sorted search after a reopen of the index (600,000 docs) that only added some documents in a new cfs segment is incredible. In the past, the warmup for filling the FieldCache took about 3 seconds - now it completes in no measurable time. So no warmup after reopen is needed anymore.

One thing I noticed when compiling my code against the new lucene-core.jar:
Top(Field)Docs deprecates the method getMaxScore(). Why is this so? To display search results with scores normalized to 1.0, you need to divide by the maximum score. The docs say you should implement your own HitCollector (why, when I want to use TopDocs?), but why does TopDocs deprecate the maximum score? OK, for relevance/score-sorted TopDocs this is no problem, as the maximum score is in ScoreDoc[0], but for docs sorted by fields (TopFieldDocs, which extends TopDocs) you need the max score. If you have generic search code that does not distinguish between TopDocs and TopFieldDocs, the generic approach is to divide by TopDocs.getMaxScore().
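
For illustration, a minimal sketch of the normalization Uwe describes, assuming a search result named td (hypothetical variable; works whether td is a TopDocs or a TopFieldDocs):

  // scale every hit's score into [0, 1] by the maximum score of the result set
  float max = td.getMaxScore();
  for (int i = 0; i < td.scoreDocs.length; i++) {
    float normalized = td.scoreDocs[i].score / max;
    System.out.println(td.scoreDocs[i].doc + " -> " + normalized);
  }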

I am happy, +1 for committing. But let getMaxScore live after Lucene 3.0!

          markrmiller@gmail.com Mark Miller added a comment -

I went over all of the diffs for a while last night - I didn't see anything that seemed worth putting up a patch for. All of the tweaks, fixes, and polish look great to me. All tests are also passing for me.

          +1 on committing in a few days barring any objections.

          mikemccand Michael McCandless added a comment -

          OK I named it "TopScoreDocCollector" and left TopDocCollector as it was. New patch attached.

          I think this is finally ready to commit! I'll wait a few days for any comments...

          markrmiller@gmail.com Mark Miller added a comment - - edited

          Nice work Mike - pretty polished. I've spent a little time looking it over, but I'm going to look more tonight. Everything looking pretty good to me.

          Not sure what to name that new class, but here are some ideas:

          TopScoreDocCollector
          TopHitCollector
          TopResultCollector
          TopMatchCollector
          TopCollector
          TopScoreCollector

Could be a low score, so that last one is odd, but I guess the low would kind of be the top...
edit
Never mind... I was thinking the lowest score could be considered the top match, but it wouldn't be the case with this HitCollector implementation, so I guess it makes as much sense as any of the others.

          mikemccand Michael McCandless added a comment -

          OK I made a bunch of small fixes.

With this patch there is one back-compat test failing
(TestScorerPerf), because it seems to assume that a searcher on an
index with 0 segments will call Scorer.score(HitCollector). We used
to do so, but no longer as of this patch, and I don't think it's a
reasonable assumption, so I plan to commit a small fix to the test on
the back-compat branch (and trunk) to add a single empty doc to the
index.
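
A minimal sketch of that test fix, assuming a Directory named dir (hypothetical; uses the pre-2.9 IndexWriter constructor):

  // give the test index at least one segment by adding a single empty doc
  IndexWriter w = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
  w.addDocument(new Document());
  w.close();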

          One thing worries me: we changed TopDocCollector to subclass
          MultiReaderHitCollector instead of HitCollector, which means it
          (TopDocCollector) now tracks docBase. But this means subclasses of
          TopDocCollector will suddenly see the not-rebased docID passed to
          their collect methods, resulting in a hard to detect bug for the app.

          Maybe we should leave TopDocCollector how it was, and make a new
          TopDocCollector (any name suggestion?) that subclasses from
          MultiReaderHitCollector?
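
For illustration, a rough sketch of the pitfall, using the setNextReader/docBase style discussed in this issue (names and signatures here are illustrative, not the final API):

  // with per-segment collection, collect() receives segment-relative doc IDs
  private int docBase;

  public void setNextReader(IndexReader reader, int base) {
    docBase = base;                 // offset of this segment within the index
  }

  public void collect(int doc, float score) {
    int globalDoc = docBase + doc;  // a subclass must now rebase itself;
    // forgetting this is exactly the hard-to-detect bug described above
  }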

          Changes:

          • Renamed StringOrdValOnDem2Comparator --> StringOrdValComparator;
            commented out the other String*Comparator except for StringVal
          • Switched over to sorted sub-readers for all searching in
            IndexSearcher
          • Moved sub-reader sorting to IndexSearcher's ctor
          • Also removed SortField.STRING* except for STRING (uses
            StringOrdValComparator) and STRING_VAL (uses StringValComparator)
          • Changed FieldComparatorSource from interface --> abstract class;
            removed Serializable
          • Moved the "set legacy sort when using legacy custom sort" logic
            into SortField out of IndexSearcher
          • Fixed TimeLimitedCollector, ParallelMultiSearcher, MultiSearcher
            to "pass down" MultiReaderHitCollector if that's what they were
            passed in, else wrap the HitCollector
          • Added test in TestSort to test FieldComparatorSource (custom
            sort)
          • Addressed/removed all nocommits, removed dead code
          • Added some javadocs; removed unused imports
          • Fixed whitespace
          • Added entries to CHANGES.txt
          markrmiller@gmail.com Mark Miller added a comment -

Okay, it's not a handbook, but this should be a much better summary. Hopefully I didn't miss anything too major.

          markrmiller@gmail.com Mark Miller added a comment -

          Here is a start at a better summary. It could be improved.

          michaelbusch Michael Busch added a comment -

          Mark and Mike,

          this issue and the patch are amazingly long and catching up here after vacation is pretty hard. Maybe you could update the description of this issue with a summary (maybe a bullet list?) that describes the main goals and changes here? That would be great...

          mikemccand Michael McCandless added a comment -

          I'm working on another iteration of this patch, cleaning things up, adding javadocs, etc., in preparation for committing...

          markrmiller@gmail.com Mark Miller added a comment -

Okay, I think I have it. I tried to count the terms per segment by getting a term enum and looping through it for each sub-reader, but something must have been off with that count. Taking a second look with CheckIndex, all of the docs/terms are piled into the first couple of segments - the rest are a long tail with few terms. So it makes sense then - for every segment with few terms in it, all of the unique terms for the whole index get checked against it. A segment with even 1 term will be hit with every unique term in the whole index. That's what was happening in this case, since it's like a logarithmic drop. I'll try playing around some more with less of a thin tail end of segments, but I guess that is enough to explain the drop in seeks in this case.

          mikemccand Michael McCandless added a comment -

In fact this probably causes the underlying buffer in
BufferedIndexInput to get reloaded many times whenever we cross a
boundary

          OK I was wrong about this – there is logic in TermInfosReader.get to not go backwards to the last index term. So we are in fact reading each tis file sequentially...

          markrmiller@gmail.com Mark Miller added a comment - - edited

My previous results had a few oddities going with them (I was loosely playing around). Being a little more careful, here is an example of the difference, and the hotspots. Timings are probably not completely comparable, as my computer couldn't keep up with profiling the second version very well - it's much slower without profiling as well, though:

          Index is 600000 docs, 46 segments

          Load the fieldcache on one multireader

method                        time             invocations
FieldCacheImpl.createValue    156536 (98%)     1
MultiTermDocs.next()          148499 (93.5%)   621803
MultiTermDocs(int)            140397 (88.4%)   1002938
SegmentTermDocs.seek(Term)    138332 (87.1%)   1002938

          load the fieldcache on each sub reader of the multireader, one at a time

method                           time            invocations
FieldCacheImpl.createValue       7815 (80.4%)    46
SegmentTermDocs.next()           3315 (34.1%)    642046
SegmentTermEnum.next()           1936 (19.9%)    42046
SegmentTermDocs.seek(TermEnum)   874 (9%)        42046

          edit
          wrong values

          yseeley@gmail.com Yonik Seeley added a comment -

I think the massive slowness of iterating through all terms & docs from a MultiTermEnum/Docs may come from asking the N-1 SegmentReaders to seek to a non-existent (for them) term.

          I've seen cases where the MultiTermEnum was the bottleneck (compared to the MultiTermDocs) when iterating over all docs for all terms in a field. But quickly looking at the code, MultiTermEnum.next() looks pretty efficient.

          mikemccand Michael McCandless added a comment -

Even still, you are seeing something like a 40% diff, but with times small enough not to matter.

          Right, good point.

          I think the massive slowness of iterating through all terms & docs
          from a MultiTermEnum/Docs may come from asking the N-1 SegmentReaders
          to seek to a non-existent (for them) term.

          Ie when we ask MultiTermDocs to seek to a unique title X, only the
          particular segment that title X comes from actually has it, whereas
          the others do a costly seek to the index term just before it then scan
          to look for the non-existent term, and then repeat that for the next
          title, etc.

In fact this probably causes the underlying buffer in
BufferedIndexInput to get reloaded many times whenever we cross a
boundary (ie, we keep flipping between buffer N and N+1, then back to
N then N+1 again, etc.) – maybe that's the source of the massive slowness?

BTW I think this change may speed up Range/PrefixQuery as well.

          markrmiller@gmail.com Mark Miller added a comment -

I think it's pretty costly even for non-id-type fields. In your enum case, there are what, 50 unique values? Even still, you are seeing something like a 40% diff, but with times small enough not to matter.

          My test example has 20,000 unique terms for 600,000 documents (lots of overlap, 2-8 char strings, 1-9, I think), so quite a bit short of a primary key - but it still was WAY faster with the new method.

          Old method non optimized, 79 segments - 1.5 million seeks, WAY slow.
          Old method, optimized, 1 segment - 20,000 seeks, pretty darn fast.
          New method, non optimized, 79 segments - 40,000 seeks, pretty darn fast.

While there is a big difference between searching a single segment vs multisegments for these things, we already knew about that - that's why you optimize.

          Right, but for realtime search you don't have the luxury of
          optimizing. This patch makes warming time after reopen much faster
          for a many-segment index for apps that use FieldCache with mostly unique String
          fields.

Right, I got you - I know we can't optimize. I was just realizing that explaining why 100 segments was so slow was not explaining why the new method on 100 segments was so fast. I still don't think I fully understand why that is. I don't think getting to use only the unique terms at each segment saves enough seeks to account for what I am seeing. Especially in this test case, the terms should be pretty evenly distributed across segments...

          mikemccand Michael McCandless added a comment -

          As we call next on MultiTermDocs it will get a TermDocs for each Reader and call seek to get to the Term. The seek appears pretty slow, and we do it for the number of Readers x the number of Terms to be loaded.

          Right – the uninverting we do to populate the FieldCache is very
          costly through MultiReader for fields that are mostly unique String
          (eg a title field, or a "primary key" id field, etc.).

          Enum type fields (like country) don't have this problem (1.0 sec vs
          0.6 sec to populate FieldCache through MultiReader for the 100 segment
          index).

          But, with this change, we sidestep this problem for Lucene's core, but
          for apps that directly load FieldCache for the MultiReader the problem
          is still there.

          Once we have column stride fields (LUCENE-1231) it should then be far
          faster to load the FieldCache for unique String fields.

While there is a big difference between searching a single segment vs multisegments for these things, we already knew about that - that's why you optimize.

          Right, but for realtime search you don't have the luxury of
          optimizing. This patch makes warming time after reopen much faster
          for a many-segment index for apps that use FieldCache with mostly unique String
          fields.

          markrmiller@gmail.com Mark Miller added a comment -

          Whoops. As usual, getting ahead of myself. Perhaps there won't be big gains with those other queries.

While there is a big difference between searching a single segment vs multisegments for these things, we already knew about that - that's why you optimize.

Even when we search each individual IndexSearcher with a single hit queue (this patch), we have to load the FieldCache for each segment and do a seek, the same as the old method with the MultiReader. One seek for each reader for each term.

          However, it still appears that we get to do WAY fewer seeks this way - for my last example, maybe like 40000 seeks. Quite a bit better than 1.5 million. But why?

Perhaps it's because we can use only the terms from each segment. Then, rather than (num readers x total unique terms) seeks, you have (sum of unique terms per segment) seeks. That could account for a lot of saved seeks, but I am not sure that it accounts for all of them (1.5 million to 40,000 is quite the drop). Beyond all the saved seeks though, I imagine it's more efficient to hit the same reader n times than to hit each reader round-robin n times. Something makes me think there is something else that allows us to avoid seeks, but I'm not sure what yet...

          markrmiller@gmail.com Mark Miller added a comment -

Cool - this problem affects other things too, like SpanQueries. This will be a good boost for them as well if you have more than a few segments. Same with the multi-term queries, I think (enumeration should be faster). This patch is going to be dope for performance all around.

          markrmiller@gmail.com Mark Miller added a comment -

Or put another way:

The single-segment version, with about 20,000 terms for a field, will seek on one segment about 20,000 times, once for each term.

A multi-segment version over 79 segments will seek about 1,580,000 times. The top-level MultiSegmentReader seek for each term is pretty much free, so I don't count that, but you seek on each term for each individual segment as you call next on the TermDocs. If your terms are spread across all of the segments, it's not even much cheaper to do the seek on any given segment.

          markrmiller@gmail.com Mark Miller added a comment -

          Re: The FieldCache loading problem with MultiSegmentReader

          I think this is just the overhead cost associated with how we currently handle this.

To load a FieldCache for a single segment, we get a SegmentTermDocs and just keep calling next, which basically just calls readByte and readInt over and over on the index. Pretty efficient. Do it for each term to be loaded.

To load a FieldCache for multiple segments, we get a MultiSegmentReader to get MultiTermDocs. As we call next on MultiTermDocs, it will get a TermDocs for each Reader and call seek to get to the Term. The seek appears pretty slow, and we do it for the number of Readers x the number of Terms to be loaded.

Under seek, TermInfosReader.get looks slow, with SegmentTermEnum.scanTo looking to be the worst offender under it.

All in all though, there is just a big difference in what's going on, and it compounds with more segments.

- Mark
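
For illustration, a minimal sketch of the two loading paths profiled above, assuming FieldCache.DEFAULT.getStrings() and a getSubReaders()-style accessor as proposed earlier in this thread (the accessor name is not final; multiReader is a hypothetical open reader):

  // old path: one cache entry keyed on the MultiReader; every term lookup
  // fans out through MultiTermDocs to all N segments
  String[] all = FieldCache.DEFAULT.getStrings(multiReader, "title");

  // new path: one cache entry per SegmentReader; each segment is read
  // sequentially with its own SegmentTermDocs/SegmentTermEnum
  IndexReader[] subs = multiReader.getSubReaders(); // proposed accessor, see above
  for (int i = 0; i < subs.length; i++) {
    String[] perSegment = FieldCache.DEFAULT.getStrings(subs[i], "title");
  }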
          mikemccand Michael McCandless added a comment -

Whoops, last one had compilation errors in contrib/benchmark... this one should work.

          mikemccand Michael McCandless added a comment -

          New patch w/ ComparatorPolicy removed.

          mikemccand Michael McCandless added a comment -

          Do we want to keep the policy idea if we don't end up using any that switch comparators?

          Good question. I think we should not keep it – all our attempts to use it to improve performance have not succeeded. And then we can stick with FieldComparatorSource for custom sorting. I'll make this change.

          I'd also like to see if sorting readers by size really does any good.

          I think this likely does help, because if you visit the largest segment first, your queue will tend to quickly get most of its inserts out of the way. ORDDEM2 is fastest on the first segment because all ORDs are comparable. I think this'll matter most for indexes that have many deletions, such that the old large segments contain many deletes (in which case they are sorted later).
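
For illustration, a rough sketch of a custom sort via FieldComparatorSource as discussed here (method names approximate the API that eventually shipped; this patch iteration may differ, and the int-subtraction comparisons assume values small enough not to overflow):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.FieldCache;
  import org.apache.lucene.search.FieldComparator;
  import org.apache.lucene.search.FieldComparatorSource;

  public class IntComparatorSource extends FieldComparatorSource {
    public FieldComparator newComparator(final String field, final int numHits,
                                         int sortPos, boolean reversed) throws IOException {
      return new FieldComparator() {
        private final int[] values = new int[numHits]; // one slot per queue entry
        private int[] currentReaderValues;             // FieldCache of the current segment
        private int bottom;

        public void setNextReader(IndexReader reader, int docBase) throws IOException {
          // keyed per segment, so a reopen only loads caches for new segments
          currentReaderValues = FieldCache.DEFAULT.getInts(reader, field);
        }
        public void copy(int slot, int doc) { values[slot] = currentReaderValues[doc]; }
        public int compare(int slot1, int slot2) { return values[slot1] - values[slot2]; }
        public void setBottom(int slot) { bottom = values[slot]; }
        public int compareBottom(int doc) { return bottom - currentReaderValues[doc]; }
        public Comparable value(int slot) { return new Integer(values[slot]); }
      };
    }
  }

Usage would be something like new Sort(new SortField("price", new IntComparatorSource())), where the field name is hypothetical.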

          markrmiller@gmail.com Mark Miller added a comment -

          Nice Mike! Those results look great. Do we want to keep the policy idea if we don't end up using any that switch comparators? I'd also like to see if sorting readers by size really does any good.

          >>I think for a custom sort we should allow user to pass in a ComparatorPolicy?

          Yeah def. If we keep the policies, we should rip out the custom comparator and allow for a custom policy.

          mikemccand Michael McCandless added a comment -

          I decided to change the wiki indexes I use for perf. testing. Previously
the 10 and 100 segment indexes all had fixed-size segments, which is
          not natural – normally segments have a logarithmic size distribution.
          So I created new 10 and 36 segment indexes from Wikipedia, with 2M
          docs that are more "natural", and re-ran tests on them.

          I think net/net ORDDEM2 makes a good default policy!

          queue=1000

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 1000 2000000 0.9 10.0 0.7 12.0 20.0%
          1 simple country ordsub text 1000 2000000 0.9 10.0 0.6 13.6 36.0%
          1 simple country ordval text 1000 2000000 0.9 10.0 0.6 13.9 39.0%
          1 simple country orddem text 1000 2000000 0.9 10.0 0.7 13.8 38.0%
          1 simple country orddem2 text 1000 2000000 0.9 10.0 0.6 13.9 39.0%
          1 wiki title val 147 1000 4984 2.1 739.1 2.3 379.7 -48.6%
          1 wiki title ordsub 147 1000 4984 2.1 739.1 2.1 954.7 29.2%
          1 wiki title ordval 147 1000 4984 2.1 739.1 2.1 951.3 28.7%
          1 wiki title orddem 147 1000 4984 2.1 739.1 2.1 858.8 16.2%
          1 wiki title orddem2 147 1000 4984 2.1 739.1 2.1 860.7 16.5%
          1 wiki title val text 1000 97191 2.1 109.5 2.4 33.1 -69.8%
          1 wiki title ordsub text 1000 97191 2.1 109.5 2.1 125.8 14.9%
          1 wiki title ordval text 1000 97191 2.1 109.5 2.1 126.1 15.2%
          1 wiki title orddem text 1000 97191 2.1 109.5 2.1 123.4 12.7%
          1 wiki title orddem2 text 1000 97191 2.1 109.5 2.1 124.5 13.7%
          1 wiki title val 1 1000 386435 2.1 46.5 2.4 12.7 -72.7%
          1 wiki title ordsub 1 1000 386435 2.1 46.5 2.1 60.7 30.5%
          1 wiki title ordval 1 1000 386435 2.1 46.5 2.1 61.3 31.8%
          1 wiki title orddem 1 1000 386435 2.1 46.5 2.1 61.0 31.2%
          1 wiki title orddem2 1 1000 386435 2.1 46.5 2.1 60.8 30.8%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 1000 2000000 0.8 9.7 0.6 11.7 20.6%
          10 simple country ordsub text 1000 2000000 0.8 9.7 0.6 13.4 38.1%
          10 simple country ordval text 1000 2000000 0.8 9.7 0.6 13.6 40.2%
          10 simple country orddem text 1000 2000000 0.8 9.7 0.6 13.6 40.2%
          10 simple country orddem2 text 1000 2000000 0.8 9.7 0.6 13.5 39.2%
          10 wiki title val 147 1000 4984 12.6 659.2 2.3 406.4 -38.3%
          10 wiki title ordsub 147 1000 4984 12.6 659.2 2.7 68.6 -89.6%
          10 wiki title ordval 147 1000 4984 12.6 659.2 2.1 86.4 -86.9%
          10 wiki title orddem 147 1000 4984 12.6 659.2 2.1 117.8 -82.1%
          10 wiki title orddem2 147 1000 4984 12.6 659.2 2.1 288.0 -56.3%
          10 wiki title val text 1000 97191 12.7 101.2 2.4 33.9 -66.5%
          10 wiki title ordsub text 1000 97191 12.7 101.2 2.7 68.8 -32.0%
          10 wiki title ordval text 1000 97191 12.7 101.2 2.2 83.1 -17.9%
          10 wiki title orddem text 1000 97191 12.7 101.2 2.2 87.4 -13.6%
          10 wiki title orddem2 text 1000 97191 12.7 101.2 2.2 100.8 -0.4%
          10 wiki title val 1 1000 386435 12.7 42.3 2.5 14.5 -65.7%
          10 wiki title ordsub 1 1000 386435 12.7 42.3 2.7 46.4 9.7%
          10 wiki title ordval 1 1000 386435 12.7 42.3 2.2 53.0 25.3%
          10 wiki title orddem 1 1000 386435 12.7 42.3 2.2 54.7 29.3%
          10 wiki title orddem2 1 1000 386435 12.7 42.3 2.2 57.4 35.7%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          36 wiki title val 147 1000 4984 42.1 609.6 2.3 358.5 -41.2%
          36 wiki title ordsub 147 1000 4984 42.1 609.6 2.7 37.6 -93.8%
          36 wiki title ordval 147 1000 4984 42.1 609.6 2.2 58.2 -90.5%
          36 wiki title orddem 147 1000 4984 42.1 609.6 2.1 113.6 -81.4%
          36 wiki title orddem2 147 1000 4984 42.1 609.6 2.1 258.2 -57.6%
          36 wiki title val text 1000 97191 42.4 98.5 2.4 34.1 -65.4%
          36 wiki title ordsub text 1000 97191 42.4 98.5 2.7 39.1 -60.3%
          36 wiki title ordval text 1000 97191 42.4 98.5 2.2 65.1 -33.9%
          36 wiki title orddem text 1000 97191 42.4 98.5 2.2 85.9 -12.8%
          36 wiki title orddem2 text 1000 97191 42.4 98.5 2.2 94.9 -3.7%
          36 wiki title val 1 1000 386435 42.0 44.2 2.4 14.5 -67.2%
          36 wiki title ordsub 1 1000 386435 42.0 44.2 2.7 30.7 -30.5%
          36 wiki title ordval 1 1000 386435 42.0 44.2 2.2 47.6 7.7%
          36 wiki title orddem 1 1000 386435 42.0 44.2 2.3 54.9 24.2%
          36 wiki title orddem2 1 1000 386435 42.0 44.2 2.2 56.8 28.5%

          queue=10

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 10 2000000 0.6 10.6 0.7 12.4 17.0%
          1 simple country ordsub text 10 2000000 0.6 10.6 0.6 14.5 36.8%
          1 simple country ordval text 10 2000000 0.6 10.6 0.6 14.6 37.7%
          1 simple country orddem text 10 2000000 0.6 10.6 0.6 14.6 37.7%
          1 simple country orddem2 text 10 2000000 0.6 10.6 0.6 14.6 37.7%
          1 wiki title val 147 10 4984 2.1 3727.4 2.3 2546.8 -31.7%
          1 wiki title ordsub 147 10 4984 2.1 3727.4 2.0 4537.5 21.7%
          1 wiki title ordval 147 10 4984 2.1 3727.4 2.0 4568.1 22.6%
          1 wiki title orddem 147 10 4984 2.1 3727.4 2.1 4552.5 22.1%
          1 wiki title orddem2 147 10 4984 2.1 3727.4 2.0 4537.5 21.7%
          1 wiki title val text 10 97191 2.1 143.4 2.3 39.2 -72.7%
          1 wiki title ordsub text 10 97191 2.1 143.4 2.1 165.5 15.4%
          1 wiki title ordval text 10 97191 2.1 143.4 2.1 167.7 16.9%
          1 wiki title orddem text 10 97191 2.1 143.4 2.1 167.7 16.9%
          1 wiki title orddem2 text 10 97191 2.1 143.4 2.1 168.2 17.3%
          1 wiki title val 1 10 386435 2.1 51.8 2.4 13.7 -73.6%
          1 wiki title ordsub 1 10 386435 2.1 51.8 2.1 66.3 28.0%
          1 wiki title ordval 1 10 386435 2.1 51.8 2.1 69.5 34.2%
          1 wiki title orddem 1 10 386435 2.1 51.8 2.1 69.5 34.2%
          1 wiki title orddem2 1 10 386435 2.1 51.8 2.1 69.5 34.2%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 10 2000000 0.7 10.2 0.5 12.2 19.6%
          10 simple country ordsub text 10 2000000 0.7 10.2 0.5 14.2 39.2%
          10 simple country ordval text 10 2000000 0.7 10.2 0.5 14.5 42.2%
          10 simple country orddem text 10 2000000 0.7 10.2 0.5 14.4 41.2%
          10 simple country orddem2 text 10 2000000 0.7 10.2 0.5 14.4 41.2%
          10 wiki title val 147 10 4984 12.5 2943.0 2.3 1761.1 -40.2%
          10 wiki title ordsub 147 10 4984 12.5 2943.0 2.1 3035.3 3.1%
          10 wiki title ordval 147 10 4984 12.5 2943.0 2.1 3325.2 13.0%
          10 wiki title orddem 147 10 4984 12.5 2943.0 2.1 3481.8 18.3%
          10 wiki title orddem2 147 10 4984 12.5 2943.0 2.1 3574.1 21.4%
          10 wiki title val text 10 97191 12.6 139.2 2.4 39.3 -71.8%
          10 wiki title ordsub text 10 97191 12.6 139.2 2.1 161.7 16.2%
          10 wiki title ordval text 10 97191 12.6 139.2 2.2 165.2 18.7%
          10 wiki title orddem text 10 97191 12.6 139.2 2.1 165.4 18.8%
          10 wiki title orddem2 text 10 97191 12.6 139.2 2.1 166.3 19.5%
          10 wiki title val 1 10 386435 12.6 50.4 2.4 15.5 -69.2%
          10 wiki title ordsub 1 10 386435 12.6 50.4 2.1 65.8 30.6%
          10 wiki title ordval 1 10 386435 12.6 50.4 2.1 68.1 35.1%
          10 wiki title orddem 1 10 386435 12.6 50.4 2.1 67.9 34.7%
          10 wiki title orddem2 1 10 386435 12.6 50.4 2.2 67.8 34.5%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          36 wiki title val 147 10 4984 42.0 2137.0 2.3 1143.0 -46.5%
          36 wiki title ordsub 147 10 4984 42.0 2137.0 2.1 1641.5 -23.2%
          36 wiki title ordval 147 10 4984 42.0 2137.0 2.1 2046.0 -4.3%
          36 wiki title orddem 147 10 4984 42.0 2137.0 2.1 2203.9 3.1%
          36 wiki title orddem2 147 10 4984 42.0 2137.0 2.1 2215.3 3.7%
          36 wiki title val text 10 97191 42.2 135.2 2.4 39.8 -70.6%
          36 wiki title ordsub text 10 97191 42.2 135.2 2.1 149.9 10.9%
          36 wiki title ordval text 10 97191 42.2 135.2 2.1 154.9 14.6%
          36 wiki title orddem text 10 97191 42.2 135.2 2.1 154.9 14.6%
          36 wiki title orddem2 text 10 97191 42.2 135.2 2.2 155.0 14.6%
          36 wiki title val 1 10 386435 41.9 49.7 2.4 15.7 -68.4%
          36 wiki title ordsub 1 10 386435 41.9 49.7 2.2 65.9 32.6%
          36 wiki title ordval 1 10 386435 41.9 49.7 2.2 67.4 35.6%
          36 wiki title orddem 1 10 386435 41.9 49.7 2.1 67.9 36.6%
          36 wiki title orddem2 1 10 386435 41.9 49.7 2.2 67.7 36.2%
          mikemccand Michael McCandless added a comment -

          OK new patch finally:

          • Fixed a few small things YourKit uncovered, eg the private
            binarySearch() method was inserting an extra run-time access-check
            method, so I changed it to package-protected.
          • I changed ComparatorPolicy.nextComparator -> nextReader, making it
            that method's job to possibly switch comparators as well as call
            setNextReader internally. This allows us to pass in the
            sub-reader when we init a new comparator.
          • Improved TestStressSort some more.
          • I made a new ORDDEM2 that does not convert slots when comparing,
            but does convert the bottom slot. So it uses ORD in
            compareBottom, and ORD comparison between slots if the slots have
            the same gen, else compare-by-value. Also, when converting, it
            bounds the binarySearch by using bottomOrd (see the sketch after
            this list). This seems to give better performance in the
            high-segment-count, large-queue-size, few-hits cases, and
            comparable performance in other cases.
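
          For illustration, the ORDDEM2 rules just described amount to roughly
          the following sketch. This is invented code, not the patch's: the
          class, the gens/lookup fields, and the conversion details are all
          placeholders.

              // Slots keep the ord, the value, and the reader generation the
              // ord came from; only the bottom slot is converted per reader.
              class OrdDem2Sketch {
                final int[] ords;       // per-slot ord, valid only for gens[slot]
                final String[] values;  // per-slot value, always comparable
                final int[] gens;       // reader generation each ord belongs to

                int currentGen;         // generation of the reader being collected
                String[] lookup;        // sorted value-by-ord table for current reader
                int bottomOrd;

                OrdDem2Sketch(int numHits) {
                  ords = new int[numHits];
                  values = new String[numHits];
                  gens = new int[numHits];
                }

                // Slot vs slot: cheap ord compare when both ords share a
                // generation, else fall back to comparing values.
                int compare(int slot1, int slot2) {
                  if (gens[slot1] == gens[slot2]) {
                    return ords[slot1] - ords[slot2];
                  }
                  return values[slot1].compareTo(values[slot2]);
                }

                // Incoming doc vs queue bottom: always a pure ord compare,
                // because the bottom slot is converted eagerly in setBottom.
                int compareBottom(int docOrd) {
                  return bottomOrd - docOrd;
                }

                void setBottom(int slot) {
                  if (gens[slot] != currentGen) {
                    // Convert just the bottom slot into the current reader's
                    // ord space; per the comment above, the real code bounds
                    // this search using bottomOrd, and it must also handle
                    // values absent from this reader -- both elided here.
                    ords[slot] = java.util.Arrays.binarySearch(lookup, values[slot]);
                    gens[slot] = currentGen;
                  }
                  bottomOrd = ords[slot];
                }
              }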

          I think for a custom sort we should allow user to pass in a ComparatorPolicy?

          mikemccand Michael McCandless added a comment -

          I was wondering the same thing. ORD (policy/policy2 on a 1 seg index) ought to be faster than ORDSUB, always, but it's not. Something is amiss. I'll try running under YourKit to see if something stands out.

          markrmiller@gmail.com Mark Miller added a comment -

          Any ideas on why ORDSUBORD would beat the policies with 1 segment and 1000 queue size? That technically boils down to ORD vs. ORDSUBORD, and ORD should edge out or be about the same - but ORDSUBORD clearly wins by a few percent each time. Does this point to some funky policy problem or something?

          mikemccand Michael McCandless added a comment -

          I agree, ORDDEM looks overall best. I'm going to try to tune it a bit...

          markrmiller@gmail.com Mark Miller added a comment - edited

          Disregarding any missing gains with those simple policies, the rest of those numbers actually look pretty good! Still some problems here and there (large queue size still sticky), but overall some solid gains as well.

          orddem seems to be best in most cases currently - maybe we can tweak that a little more somehow. Where it's not better, or only slightly worse, is with a single segment. That result is interesting, because both policies beat it nicely, and that's because they simply use straight ord on the first segment. But ordsubord seems to outperform the policies. That doesn't make sense. It's largely the same, but should be a tad slower if anything. Other results match up so nicely that it seems like it might not be noise, in which case, weird.

          markrmiller@gmail.com Mark Miller added a comment -

          Ha, how disappointing. I suppose not entirely unexpected though. I wouldn't expect straight ord on one segment to be much of a gain, but I am surprised it doesn't at least edge out the transition cost. Policy 2 is even worse - likely the arbitrary switch to on-demand is not very useful - on the large queues it appears better to just orddem the whole thing. Perhaps it should switch sooner, or use a better heuristic. You gave some good ideas for that above, but I don't think that's a quick get.

          I think I need to re-run some micro-benchmark experiments to get an idea of what to do...

          I hope something smarter can eke out more gains than that.

          mikemccand Michael McCandless added a comment -

          OK here are the current results:

          Queue size 10:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 10 2000000 0.6 10.7 0.7 12.4 15.9%
          1 simple country ordsub text 10 2000000 0.6 10.7 0.6 14.7 37.4%
          1 simple country ordval text 10 2000000 0.6 10.7 0.6 14.5 35.5%
          1 simple country orddem text 10 2000000 0.6 10.7 0.6 14.6 36.4%
          1 simple country policy text 10 2000000 0.6 10.7 0.6 14.5 35.5%
          1 simple country policy2 text 10 2000000 0.6 10.7 0.6 14.6 36.4%
          1 wiki title val 147 10 4984 2.1 3743.8 2.3 2557.4 -31.7%
          1 wiki title ordsub 147 10 4984 2.1 3743.8 2.0 4561.8 21.8%
          1 wiki title ordval 147 10 4984 2.1 3743.8 2.0 4551.4 21.6%
          1 wiki title orddem 147 10 4984 2.1 3743.8 2.0 4569.4 22.1%
          1 wiki title policy 147 10 4984 2.1 3743.8 2.0 4551.8 21.6%
          1 wiki title policy2 147 10 4984 2.1 3743.8 2.1 4526.6 20.9%
          1 wiki title val text 10 97191 2.1 144.2 2.4 38.9 -73.0%
          1 wiki title ordsub text 10 97191 2.1 144.2 2.1 166.3 15.3%
          1 wiki title ordval text 10 97191 2.1 144.2 2.1 166.1 15.2%
          1 wiki title orddem text 10 97191 2.1 144.2 2.1 168.3 16.7%
          1 wiki title policy text 10 97191 2.1 144.2 2.1 165.3 14.6%
          1 wiki title policy2 text 10 97191 2.1 144.2 2.1 165.1 14.5%
          1 wiki title val 1 10 386435 2.1 51.2 2.4 13.6 -73.4%
          1 wiki title ordsub 1 10 386435 2.1 51.2 2.1 68.8 34.4%
          1 wiki title ordval 1 10 386435 2.1 51.2 2.1 68.9 34.6%
          1 wiki title orddem 1 10 386435 2.1 51.2 2.1 69.3 35.4%
          1 wiki title policy 1 10 386435 2.1 51.2 2.1 66.5 29.9%
          1 wiki title policy2 1 10 386435 2.1 51.2 2.1 66.3 29.5%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 10 2000000 0.7 10.4 0.5 12.2 17.3%
          10 simple country ordsub text 10 2000000 0.7 10.4 0.5 14.6 40.4%
          10 simple country ordval text 10 2000000 0.7 10.4 0.5 14.7 41.3%
          10 simple country orddem text 10 2000000 0.7 10.4 0.5 14.7 41.3%
          10 simple country policy text 10 2000000 0.7 10.4 0.5 14.0 34.6%
          10 simple country policy2 text 10 2000000 0.7 10.4 0.5 14.0 34.6%
          10 wiki title val 147 10 4984 12.5 3004.5 2.6 1735.7 -42.2%
          10 wiki title ordsub 147 10 4984 12.5 3004.5 2.1 3077.2 2.4%
          10 wiki title ordval 147 10 4984 12.5 3004.5 2.1 3415.4 13.7%
          10 wiki title orddem 147 10 4984 12.5 3004.5 2.1 3537.0 17.7%
          10 wiki title policy 147 10 4984 12.5 3004.5 2.1 3463.8 15.3%
          10 wiki title policy2 147 10 4984 12.5 3004.5 2.1 3356.7 11.7%
          10 wiki title val text 10 97191 12.7 139.4 2.5 38.5 -72.4%
          10 wiki title ordsub text 10 97191 12.7 139.4 2.1 164.0 17.6%
          10 wiki title ordval text 10 97191 12.7 139.4 2.1 164.8 18.2%
          10 wiki title orddem text 10 97191 12.7 139.4 2.1 164.7 18.1%
          10 wiki title policy text 10 97191 12.7 139.4 2.1 163.1 17.0%
          10 wiki title policy2 text 10 97191 12.7 139.4 2.1 162.9 16.9%
          10 wiki title val 1 10 386435 12.7 50.3 2.5 15.2 -69.8%
          10 wiki title ordsub 1 10 386435 12.7 50.3 2.1 67.4 34.0%
          10 wiki title ordval 1 10 386435 12.7 50.3 2.1 68.4 36.0%
          10 wiki title orddem 1 10 386435 12.7 50.3 2.1 68.4 36.0%
          10 wiki title policy 1 10 386435 12.7 50.3 2.1 65.7 30.6%
          10 wiki title policy2 1 10 386435 12.7 50.3 2.1 66.2 31.6%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 10 2000000 1.0 8.8 4.0 9.5 8.0%
          100 simple country ordsub text 10 2000000 1.0 8.8 0.6 11.5 30.7%
          100 simple country ordval text 10 2000000 1.0 8.8 0.6 11.7 33.0%
          100 simple country orddem text 10 2000000 1.0 8.8 0.6 11.8 34.1%
          100 simple country policy text 10 2000000 1.0 8.8 0.6 11.4 29.5%
          100 simple country policy2 text 10 2000000 1.0 8.8 0.6 11.3 28.4%
          100 wiki title val 147 10 4984 94.6 1066.9 4.6 450.6 -57.8%
          100 wiki title ordsub 147 10 4984 94.6 1066.9 2.1 480.8 -54.9%
          100 wiki title ordval 147 10 4984 94.6 1066.9 2.1 624.2 -41.5%
          100 wiki title orddem 147 10 4984 94.6 1066.9 2.1 703.3 -34.1%
          100 wiki title policy 147 10 4984 94.6 1066.9 2.1 724.1 -32.1%
          100 wiki title policy2 147 10 4984 94.6 1066.9 2.1 638.5 -40.2%
          100 wiki title val text 10 97191 94.9 110.2 3.2 37.9 -65.6%
          100 wiki title ordsub text 10 97191 94.9 110.2 2.1 118.1 7.2%
          100 wiki title ordval text 10 97191 94.9 110.2 2.2 121.6 10.3%
          100 wiki title orddem text 10 97191 94.9 110.2 2.1 123.4 12.0%
          100 wiki title policy text 10 97191 94.9 110.2 2.1 123.1 11.7%
          100 wiki title policy2 text 10 97191 94.9 110.2 2.1 121.9 10.6%
          100 wiki title val 1 10 386435 94.3 47.9 3.2 14.7 -69.3%
          100 wiki title ordsub 1 10 386435 94.3 47.9 2.1 61.8 29.0%
          100 wiki title ordval 1 10 386435 94.3 47.9 2.1 62.5 30.5%
          100 wiki title orddem 1 10 386435 94.3 47.9 2.1 63.3 32.2%
          100 wiki title policy 1 10 386435 94.3 47.9 2.1 61.8 29.0%
          100 wiki title policy2 1 10 386435 94.3 47.9 2.1 61.2 27.8%

          Queue size 1000:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 1000 2000000 0.9 10.0 0.7 12.0 20.0%
          1 simple country ordsub text 1000 2000000 0.9 10.0 0.6 14.1 41.0%
          1 simple country ordval text 1000 2000000 0.9 10.0 0.6 14.0 40.0%
          1 simple country orddem text 1000 2000000 0.9 10.0 0.6 13.8 38.0%
          1 simple country policy text 1000 2000000 0.9 10.0 0.7 13.8 38.0%
          1 simple country policy2 text 1000 2000000 0.9 10.0 0.6 13.7 37.0%
          1 wiki title val 147 1000 4984 2.1 740.6 2.3 385.0 -48.0%
          1 wiki title ordsub 147 1000 4984 2.1 740.6 2.1 970.0 31.0%
          1 wiki title ordval 147 1000 4984 2.1 740.6 2.1 938.0 26.7%
          1 wiki title orddem 147 1000 4984 2.1 740.6 2.1 839.3 13.3%
          1 wiki title policy 147 1000 4984 2.1 740.6 2.1 938.4 26.7%
          1 wiki title policy2 147 1000 4984 2.1 740.6 2.1 924.4 24.8%
          1 wiki title val text 1000 97191 2.1 108.7 2.4 33.1 -69.5%
          1 wiki title ordsub text 1000 97191 2.1 108.7 2.1 127.8 17.6%
          1 wiki title ordval text 1000 97191 2.1 108.7 2.1 125.9 15.8%
          1 wiki title orddem text 1000 97191 2.1 108.7 2.1 122.8 13.0%
          1 wiki title policy text 1000 97191 2.1 108.7 2.1 125.7 15.6%
          1 wiki title policy2 text 1000 97191 2.1 108.7 2.1 125.4 15.4%
          1 wiki title val 1 1000 386435 2.1 46.2 2.4 12.8 -72.3%
          1 wiki title ordsub 1 1000 386435 2.1 46.2 2.1 62.0 34.2%
          1 wiki title ordval 1 1000 386435 2.1 46.2 2.1 61.3 32.7%
          1 wiki title orddem 1 1000 386435 2.1 46.2 2.1 60.6 31.2%
          1 wiki title policy 1 1000 386435 2.1 46.2 2.1 60.0 29.9%
          1 wiki title policy2 1 1000 386435 2.1 46.2 2.1 60.4 30.7%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 1000 2000000 0.8 9.7 0.5 11.9 22.7%
          10 simple country ordsub text 1000 2000000 0.8 9.7 0.5 13.8 42.3%
          10 simple country ordval text 1000 2000000 0.8 9.7 0.6 13.7 41.2%
          10 simple country orddem text 1000 2000000 0.8 9.7 0.6 13.7 41.2%
          10 simple country policy text 1000 2000000 0.8 9.7 0.6 13.5 39.2%
          10 simple country policy2 text 1000 2000000 0.8 9.7 0.6 13.5 39.2%
          10 wiki title val 147 1000 4984 12.7 664.2 2.3 411.8 -38.0%
          10 wiki title ordsub 147 1000 4984 12.7 664.2 2.2 58.3 -91.2%
          10 wiki title ordval 147 1000 4984 12.7 664.2 2.1 71.1 -89.3%
          10 wiki title orddem 147 1000 4984 12.7 664.2 2.1 89.9 -86.5%
          10 wiki title policy 147 1000 4984 12.7 664.2 2.1 89.8 -86.5%
          10 wiki title policy2 147 1000 4984 12.7 664.2 2.1 71.2 -89.3%
          10 wiki title val text 1000 97191 12.6 100.4 2.4 33.4 -66.7%
          10 wiki title ordsub text 1000 97191 12.6 100.4 2.3 68.4 -31.9%
          10 wiki title ordval text 1000 97191 12.6 100.4 2.2 80.6 -19.7%
          10 wiki title orddem text 1000 97191 12.6 100.4 2.2 84.6 -15.7%
          10 wiki title policy text 1000 97191 12.6 100.4 2.2 84.6 -15.7%
          10 wiki title policy2 text 1000 97191 12.6 100.4 2.2 79.7 -20.6%
          10 wiki title val 1 1000 386435 12.7 42.4 2.4 14.3 -66.3%
          10 wiki title ordsub 1 1000 386435 12.7 42.4 2.7 47.8 12.7%
          10 wiki title ordval 1 1000 386435 12.7 42.4 2.2 52.9 24.8%
          10 wiki title orddem 1 1000 386435 12.7 42.4 2.2 54.3 28.1%
          10 wiki title policy 1 1000 386435 12.7 42.4 2.2 53.0 25.0%
          10 wiki title policy2 1 1000 386435 12.7 42.4 2.2 51.9 22.4%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 1000 2000000 1.0 8.5 0.6 9.4 10.6%
          100 simple country ordsub text 1000 2000000 1.0 8.5 0.6 10.5 23.5%
          100 simple country ordval text 1000 2000000 1.0 8.5 0.6 10.3 21.2%
          100 simple country orddem text 1000 2000000 1.0 8.5 0.6 10.6 24.7%
          100 simple country policy text 1000 2000000 1.0 8.5 0.7 10.3 21.2%
          100 simple country policy2 text 1000 2000000 1.0 8.5 0.7 10.0 17.6%
          100 wiki title val 147 1000 4984 93.8 442.8 2.2 243.4 -45.0%
          100 wiki title ordsub 147 1000 4984 93.8 442.8 2.8 12.5 -97.2%
          100 wiki title ordval 147 1000 4984 93.8 442.8 2.2 19.0 -95.7%
          100 wiki title orddem 147 1000 4984 93.8 442.8 2.2 58.4 -86.8%
          100 wiki title policy 147 1000 4984 93.8 442.8 2.1 58.6 -86.8%
          100 wiki title policy2 147 1000 4984 93.8 442.8 2.2 18.7 -95.8%
          100 wiki title val text 1000 97191 93.4 88.0 2.3 32.1 -63.5%
          100 wiki title ordsub text 1000 97191 93.4 88.0 2.4 15.9 -81.9%
          100 wiki title ordval text 1000 97191 93.4 88.0 2.2 27.4 -68.9%
          100 wiki title orddem text 1000 97191 93.4 88.0 2.2 46.6 -47.0%
          100 wiki title policy text 1000 97191 93.4 88.0 2.2 46.8 -46.8%
          100 wiki title policy2 text 1000 97191 93.4 88.0 2.2 27.1 -69.2%
          100 wiki title val 1 1000 386435 92.8 41.0 2.3 13.2 -67.8%
          100 wiki title ordsub 1 1000 386435 92.8 41.0 2.8 15.9 -61.2%
          100 wiki title ordval 1 1000 386435 92.8 41.0 2.2 26.6 -35.1%
          100 wiki title orddem 1 1000 386435 92.8 41.0 2.2 37.3 -9.0%
          100 wiki title policy 1 1000 386435 92.8 41.0 2.2 37.3 -9.0%
          100 wiki title policy2 1 1000 386435 92.8 41.0 2.2 25.9 -36.8%
          mikemccand Michael McCandless added a comment -

          > What was causing the two policies to miss, even though the individual comparators passed?

          I think (not sure!) it was not correctly setting bottom* in one of the transition cases. There were also a couple bugs in some of the setBottoms.

          markrmiller@gmail.com Mark Miller added a comment -

          What was causing the two policies to miss, even though the individual comparators passed? Was it more setBottom stuff messing up the transition between comparators? I couldn't see a transition problem last week. I see some of your other tweaks, but in terms of comparator transition, I only see the readerGeneration rename there offhand (nice change). I actually think the generation can start with 0 rather than 1, as it normally starts at -1.

          I see that bottom optimization made the ordsubord bottomCompare more complicated as well - I'm surprised I saw everything passing with just a negative 1 rather than 1.

          Very much looking forward to the benchmark results! Wish I had more time to work on this, thanks for letting me in on so much of it.

          • Mark
          mikemccand Michael McCandless added a comment -

          OK new patch attached. TestStressSort now passes – there were a
          number of silly small bugs I had to fix. Next up I'll run some perf
          tests...

          markrmiller@gmail.com Mark Miller added a comment -

          Still coming. Heavily sidetracked for a bit with other stuff unfortunately, but I did fix the ORD_SUBORD issue last night (had to return -1 for the subord comparison rather than 1, something you had caught and left an 'is this wrong?' comment about).

          The two policy problems are trickier, and must have something to do with the switch from one comparator to another. I see you straightened out some of the setBottom stuff that had to go on, but I can't find what else is off yet. Like I said, it's a busy time for me for another week or two, but I hope to find some time this weekend.

          I still find it odd that it's always the last hit that's off (at least from a couple of quick tests) - somehow there's a clue there...

          • Mark
          markrmiller@gmail.com Mark Miller added a comment - edited

          It's the ORDSUBORD again (which I don't think we will use) and the two Policies. Odd because it's the last hit of 10 that fails for all 3. I'll ferret it out tonight.

          • Mark

          EDIT

          yup... always the last entry that's wrong no matter the queue size - for all 3, which is odd because ORD_SUBORD doesn't have too much of a relationship to the two policies. Will be a fun one.

          markrmiller@gmail.com Mark Miller added a comment -

          > It runs legacy vs new sort and asserts that they are the same.

          Clever. Very good idea.

          I'll fix it up. Also, if you have any ideas about what Policies you want to start with, I'd be happy to push those around a bit too.

          mikemccand Michael McCandless added a comment -

          Attached full patch (though you'll get failed hunks because of the
          annoying $Id$ expansion problem).

          I fixed various small issues, and added a new TestStressSort test. It
          runs legacy vs new sort and asserts that they are the same.
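
          As a rough sketch of that cross-check (names and wiring here are
          assumed; the real test sets this up differently):

              // Run the same query sorted via the legacy path and via the new
              // path, then assert identical doc orderings. How the two paths
              // are selected is elided.
              static void assertSameOrder(Searcher searcher, Query query,
                  Sort legacySort, Sort newSort, int n) throws IOException {
                ScoreDoc[] a = searcher.search(query, null, n, legacySort).scoreDocs;
                ScoreDoc[] b = searcher.search(query, null, n, newSort).scoreDocs;
                if (a.length != b.length) throw new AssertionError("hit counts differ");
                for (int i = 0; i < a.length; i++) {
                  if (a[i].doc != b[i].doc) {
                    throw new AssertionError("order differs at hit " + i);
                  }
                }
              }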

          It is currently failing... but I haven't spent any time digging into
          why.

          Mark could you dig and try to figure out why it's failing? I think we
          should resolve it before running (or, trusting) perf tests.

          Also: I wonder if we can remove the null checking in the compare
          methods for String*Comparator? EG maybe we need new
          FieldCache.getString{s,Index} methods that optionally take a
          "fillNulls" param, and if true nulls are replaced with empty string?
          However... that would unfortunately cause a difference whereby ""
          would be equal to null (whereas now null sorts ahead of ""), which is
          not back compatible. I guess we could make a "non-null" comparator
          and use it whenever it's known there are no nulls in the FieldCache
          array. It may not be worth the hassle. If the value is never null,
          the CPU will guess the right branch path every time, so the penalty
          should be small (yet non-zero!).
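
          For illustration, the branch cost being discussed looks something
          like this hand-written sketch (not the patch's code):

              // With possible nulls, every compare pays two guards; null sorts
              // ahead of "", matching the current behavior.
              static int compareMaybeNull(String a, String b) {
                if (a == null) {
                  return b == null ? 0 : -1;
                } else if (b == null) {
                  return 1;
                }
                return a.compareTo(b);
              }

              // If the FieldCache array is known to hold no nulls (or nulls
              // were filled with ""), the guards vanish:
              static int compareNonNull(String a, String b) {
                return a.compareTo(b);
              }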

          mikemccand Michael McCandless added a comment -

          > Can't seem to use the partial patch

          Woops, sorry – did I miss a file I had changed? I just quickly pulled together the files I thought I had changed. I really want the "svn patch" command....

          > In thinking about this, we are going to drop those other sort types though right?

          True, I think. Or, we may keep some of them around for "expert" usage. I still think it'd be very helpful if one could alter a static array or something in TestSort to list the STRING sort methods to test out, and have TestSort run entirely with each method; then if one is making a private method, one could stick it into the test to make sure it works. Making a functionally correct STRING sort method is proving to be quite a challenge!

          OK I'll test the new patch... thanks Mark!

          markrmiller@gmail.com Mark Miller added a comment -

          Merged everything and put Sort.ORD back the way it was (using ORD_SUBORD).

          Added two new types, STRING_POLICY and STRING_POLICY2. Nothing good, but something to get started with I suppose.

          The first will just use straight ord and then switch to ordval on demand after the first reader.

          The second will start with straight ord, go to ordval if the next readers have over 1000 docs (or is it 5000?), and then finally move to ordvaldem, as sketched below.
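
          One possible reading of that second policy, as a ComparatorPolicy
          (the interface shown in the example policy further down this thread):
          the threshold, the ordval comparator name, and the convert() helper
          are all placeholders, and the queue-state conversion is elided.

              new ComparatorPolicy() {
                private int readerCount = 0;
                public FieldComparator nextComparator(FieldComparator old,
                    IndexReader reader, int numHits, int numSlotsFull)
                    throws IOException {
                  readerCount++;
                  if (readerCount == 1) {
                    // first segment: straight ord is cheapest
                    return new FieldComparator.StringOrdComparator(numHits, field);
                  } else if (readerCount == 2 && reader.maxDoc() > 1000) {
                    // a big enough second segment: ord with value fallback
                    return convert(old, newStringOrdValComparator(numHits, field));
                  }
                  // finally, convert ords lazily, on demand
                  return convert(old,
                      new FieldComparator.StringOrdValOnDemComparator(numHits, field));
                }
              };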

          What's left:

          • Polish (mostly javadoc at this point, I think)
          • A good default String policy
          • Custom FieldComparator test
          • Tests for each of the new String comparators (not sure how yet, if we remove the new SortField types - perhaps we do leave them after all?)
          • Remove the new SortField types that are being used for testing

          markrmiller@gmail.com Mark Miller added a comment -

          > I think we should fix TestSort so that it runs N times, each time using a different STRING sort method, to make sure we are covering all these methods?

          Yeah, this makes sense in any case. I just keep switching them by hand as I work on them.

          In thinking about this, we are going to drop those other sort types though right? I figured we would still just have String, and the comparator policy for String would pick the right comparators rather than the sort type?

          markrmiller@gmail.com Mark Miller added a comment -

          Can't seem to use the partial patch, but I'll try to put it in by hand. Just gotta remember to make sure I don't miss anything.

          markrmiller@gmail.com Mark Miller added a comment -

          > Could we just make ctors on each comparator that take the other comparator and copy over what they need? This way we can make attrs "private final" again, in case that helps the JRE optimize.

          Right, good idea.

          I'll get everything together and put up a patch.

          mikemccand Michael McCandless added a comment -

          > I'm trying to get local lucene and solr to play nice (SOLR-773). The hoops you have to jump through to avoid memory leaks make the final code too strange and not reusable.

          With this patch we are changing how custom sorting works.

          Previously, Lucene would iterate the terms for you, asking you to
          produce a Comparable for each one. With this patch, we are asking you
          to implement FieldComparator, which compares docs/slots directly and
          must be aware of switching sub-readers during searching.
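
          As a sketch of that new contract (the method set was still in flux
          on this issue, so the signatures here are approximate, and
          loadDistances is an invented helper):

              // Hypothetical example: sort by an externally computed float per
              // document, e.g. a geo distance as local lucene would want.
              class DistanceComparator extends FieldComparator {
                private final float[] slotDistances; // one value per queue slot
                private float[] readerDistances;     // per-doc values, current sub-reader
                private float bottom;

                DistanceComparator(int numHits) {
                  slotDistances = new float[numHits];
                }

                public int compare(int slot1, int slot2) {
                  return Float.compare(slotDistances[slot1], slotDistances[slot2]);
                }

                public int compareBottom(int doc) {
                  return Float.compare(bottom, readerDistances[doc]);
                }

                public void copy(int slot, int doc) {
                  slotDistances[slot] = readerDistances[doc];
                }

                public void setBottom(int slot) {
                  bottom = slotDistances[slot];
                }

                public void setNextReader(IndexReader reader) throws IOException {
                  // reload per-doc values as the search moves across sub-readers
                  readerDistances = loadDistances(reader);
                }
              }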

          Ryan, can you have a look at FieldComparator to see if it "works" for
          local lucene (and any other feedback on it)?

          I think the best outcome here would be to get this issue done, and
          then get local lucene switched over to this new API (so local lucene
          sees the benefits of the new API, and sidesteps the memory leak in
          LUCENE-1304).

          We may still need to do LUCENE-1304 in case others hit the memory leak
          of the old custom sort API.

          mikemccand Michael McCandless added a comment -

          > Not sure about new constructors or package private for that part of the switch...

          Could we just make ctors on each comparator that take the other comparator and copy over what they need? This way we can make attrs "private final" again, in case that helps the JRE optimize.
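
          A tiny sketch of that idea (placeholder names; assumes both classes
          are nested in FieldComparator, so private members are reachable):

              static class StringOrdValOnDemComparator {
                private final String[] values;
                private final int[] ords;

                // copy ctor: adopt the previous comparator's collected state,
                // so the fields can stay private final
                StringOrdValOnDemComparator(StringOrdComparator old) {
                  this.values = old.values;
                  this.ords = old.ords;
                }
              }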

          mikemccand Michael McCandless added a comment -

          Attached prototype changes to switch to "setBottom" and "compareBottom" API for FieldComparator, but, I only included the few files I modified over the last patch, and it does not pass TestSort when I switch to it (fails the same tests ORD fails on).

          Mark can you switch the comparators to this new API (and remove the compare(int, int, float) method) and fix the test failures? Once that passes tests, I'll re-run perf test and we can tune the default policy. I think we are close!

          markrmiller@gmail.com Mark Miller added a comment -

          I think we are wrapping up, but it may make sense to do 1304 anyway. That code will be deprecated, but if you use a custom comparator, it will use the deprecated code. The custom comparator will be removed in 3.0 I think, and you'd have to make a new comparator or comparator policy.

          So it's probably best to do 1304 if we want it, just for the 2.9 release.

          • Mark
          ryantxu Ryan McKinley added a comment -

          Any estimates on how far along this is?

          Is it close enough that the reasonably simple patch in LUCENE-1304 should wait? Or do you think it is worth waiting for this?

          I'm trying to get local lucene and solr to play nice (SOLR-773). The hoops you have to jump through to avoid memory leaks make the final code too strange and not reusable.

          markrmiller@gmail.com Mark Miller added a comment -

          Here is what that example policy has to be essentially. We just have to create a good way to do the right conversion I guess. I'll work on whatever you don't put up when you share your latest optimizations.

    case SortField.STRING_ORD:
      return new ComparatorPolicy() {
        // Start with the straight-ord comparator for the first (largest) segment.
        private FieldComparator comparator =
            new FieldComparator.StringOrdComparator(numHits, field);
        private boolean first = true;
        private boolean second = true;

        public FieldComparator nextComparator(FieldComparator oldComparator,
            IndexReader reader, int numHits, int numSlotsFull)
            throws IOException {
          if (first) {
            first = false;
            return comparator;
          } else if (second) {
            // On the second segment, switch to the on-demand ord/val
            // comparator, carrying over the values/ords collected so far.
            StringOrdValOnDemComparator comp =
                new FieldComparator.StringOrdValOnDemComparator(numHits, field);
            comp.values = ((FieldComparator.StringOrdComparator) comparator).values;
            comp.ords = ((FieldComparator.StringOrdComparator) comparator).ords;
            comp.currentReaderIndex = 1;
            comparator = comp;
            second = false;
            return comp;
          }
          return comparator;
        }
      };
          
          markrmiller@gmail.com Mark Miller added a comment -

There are other little conversion steps that have to be considered as well, I think. For example, when you switch to ORD_DEM, you won't have the readerIndex array filled in properly, etc. (probably an issue with that example policy above, beyond the ords copy).

          Depending on what you come from and what you go to, a couple little hoops have to be jumped.

          markrmiller@gmail.com Mark Miller added a comment - - edited

          Mark, I see 3 testcase failures in TestSort if I "pretend" that SortField.STRING means STRING_ORD - do you see that?

Yeah, sorry. That STRING_ORD custom comparator policy is just a joke really, so I only really tested it on the StringSort test. It's just not initializing the ords along with the values on switching. Making ords package-private so that it can be changed (and changing it) fixes things. Not sure about new constructors or package-private for that part of the switch...

          I think we should fix TestSort so that it runs N times, each time using a different STRING sort method, to make sure we are covering all these methods?

          Yeah, this makes sense in any case. I just keep switching them by hand as I work on them.
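
For instance, such a loop might look roughly like this - the STRING_* constants beyond SortField.STRING and the runChecks helper are placeholders for whatever the patch ends up exposing, not actual API:

    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;

    public class TestStringSortMethods extends junit.framework.TestCase {
      public void testStringSortAllMethods() throws Exception {
        // SortField.STRING exists; the other STRING_* constants from this
        // discussion are placeholders, hence commented out.
        int[] methods = { SortField.STRING /*, STRING_ORD, STRING_VAL */ };
        for (int i = 0; i < methods.length; i++) {
          Sort sort = new Sort(new SortField("title", methods[i]));
          runChecks(sort);
        }
      }

      // Hypothetical helper: would run the existing TestSort assertions
      // against the given Sort.
      private void runChecks(Sort sort) { /* ... */ }
    }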

          mikemccand Michael McCandless added a comment -

          On what ComparatorPolicy to use by default... I think we should start
          with ORD, but gather counters of number of compares vs number of
          copies, and based on those counters (and comparing to numDocs())
          decide "how aggressively" to switch comparators? That determination
          should also take into account the queue size.

          An optimized index would always use ORD (w/o gathering counters),
          which is fastest.

          In the future... we could imagine allowing the query to dictate the
          order that segments are visited. EG if the query can roughly estimate
          how many hits it'll get on a given segment, we could order by that
          instead of simply numDocs().

          The query could also choose an appropriate ComparatorPolicy, eg, if it
          estimates it'll get very few hits, VAL is best right from the start,
          else start with ORD.

          Another future fix would be to implement ORDSUB with a single pass
          through the queue, using a reused secondary pqueue to do the full sort
          of the queue. This would let us assign subords much faster, I think.

But I don't think we should pursue these optimizations as part of this
issue... we need to bring closure here; we already have some solid
gains to capture. I think we should wrap up now...
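
For concreteness, the counter-gathering default policy described above might look something like this - the class, counters, and thresholds are all illustrative assumptions:

    // Hypothetical counter-driven policy: start with ORD, then decide per
    // segment whether ORD's transition cost is still worth paying.
    class CountingPolicy {
      private long compares; // comparisons performed on the current segment
      private long copies;   // queue insertions on the current segment

      void onCompare() { compares++; }
      void onCopy()    { copies++; }

      // Called between segments; both thresholds are made up for illustration.
      boolean keepOrd(int nextSegNumDocs, int queueSize) {
        // Few insertions relative to compares suggests the queue has mostly
        // converged, so ORD's per-segment remapping cost starts to dominate.
        boolean converged = copies * 100 < compares;
        // A large queue makes the ORD transition (remapping every slot) costly.
        boolean bigQueue = queueSize > nextSegNumDocs / 10;
        compares = copies = 0; // reset for the next segment
        return !converged && !bigQueue;
      }
    }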

          mikemccand Michael McCandless added a comment -

          I prototyped a rough change to the FieldComparator API, whereby
          TopFieldCollector calls setBottom to notify the comparator which slot
          is the bottom of the queue (whenever it changes), and then calls
          compareBottom (which replaces compare(int slot, int doc, float
          score)). This seems to offer decent perf. gains so I think we should
          make this change for real?

          I think it gives good gains because 1) compare to bottom is very
          frequent for a search that has many hits, and where the queue fairly
          quickly converges to the top N, 2) it allows the on-demand comparator
          to pre-cache the bottom's ord, and 3) it saves one array deref.

          TopFieldCollector would guarantee that compareBottom is not called
          unless setBottom was called; during the startup transient, setBottom
          is not called until the queue becomes full.
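
Concretely, a collect() under this contract might look roughly like the following - the SlotQueue interface and all field names are invented for the sketch, and tie-breaking by docID is omitted:

    import java.io.IOException;

    // Sketch of a collector driving the FieldComparator shape sketched
    // earlier; everything here is an assumption, not the actual patch.
    class BottomAwareCollector {
      interface SlotQueue {
        void adjustTop(); // re-heapify after the weakest slot's value changed
        int topSlot();    // slot index of the current weakest entry
      }

      private final FieldComparator comparator;
      private final SlotQueue queue;
      private final int maxSize;
      private int numHits;
      private boolean queueFull;
      private int bottomSlot = -1;

      BottomAwareCollector(FieldComparator comparator, SlotQueue queue, int maxSize) {
        this.comparator = comparator;
        this.queue = queue;
        this.maxSize = maxSize;
      }

      public void collect(int doc, float score) throws IOException {
        if (queueFull) {
          // Guaranteed: setBottom was called when the queue first filled.
          if (comparator.compareBottom(doc, score) > 0) {
            comparator.copy(bottomSlot, doc, score); // overwrite weakest entry
            queue.adjustTop();
            bottomSlot = queue.topSlot();
            comparator.setBottom(bottomSlot); // bottom changed: re-notify
          }
        } else {
          // Startup transient: fill slots one at a time; no setBottom yet.
          comparator.copy(numHits++, doc, score);
          if (numHits == maxSize) {
            queueFull = true;
            bottomSlot = queue.topSlot();
            comparator.setBottom(bottomSlot);
          }
        }
      }
    }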

          mikemccand Michael McCandless added a comment -

          Mark, I see 3 testcase failures in TestSort if I "pretend" that SortField.STRING means STRING_ORD – do you see that?

          I think we should fix TestSort so that it runs N times, each time using a different STRING sort method, to make sure we are covering all these methods?

          markrmiller@gmail.com Mark Miller added a comment -

          No real work, but some more necessary cleanup.

          markrmiller@gmail.com Mark Miller added a comment -

          Less ugly, but still some finishing work to do, including some intelligent ComparatorPolicies.

          markrmiller@gmail.com Mark Miller added a comment -

I'll put up another patch soon with those changes. All look correct. I wasn't sure about maxDoc or numDocs (not that I spent much time thinking about it) because the FieldCache loads up all the deleted docs - for things like the terms array to be mapped to, that depends on maxDoc. But how many hits are likely to end up in the queue is more related to numDocs, so I switched it to numDocs.

Still have to wrap up the custom comparator, but a ComparatorPolicy changes things there. If you can set a custom ComparatorPolicy (move it to Sort?), then you can easily plug in any custom comparators.

          • Mark
          mikemccand Michael McCandless added a comment -

          Patch looks great! I will run some perf tests once I'm back at home
          base (still on vacation now, far away from home!). Some comments:

          • The comment "use multiple queues" in IndexSearcher.java isn't
            right – it should be "use single reader"?
          • I think FieldValueHitQueue.getComparatorPolicy should receive
            subreaders? EG when there's only one sub-reader, straight ORD is
            best. And possibly its order in the sort? We may want to move
            getComparatorPolicy into SortField so that one may override for
            "expert" cases? But we could do this later.
          • We should fix sorting of sub-readers by maxDoc() to be done once,
            not per query?
• I think we should sort sub-readers by numDocs(), not maxDoc()? (See the sketch after this list.)
          • We should fix javadocs of MultiReaderHitCollector to explain that
            sub-readers are visited not-in-order. (And other javadoc fixes,
            but this can wait for the "polishing" phase...).
          • I like the "don't fill fields" option
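
On the sub-reader ordering points above, a minimal sketch - sortSubReaders is a hypothetical helper, and the real change would presumably cache the sorted array in IndexSearcher rather than re-sorting per query:

    import java.util.Arrays;
    import java.util.Comparator;
    import org.apache.lucene.index.IndexReader;

    // Hypothetical helper: order sub-readers once, largest numDocs() first.
    class SubReaderOrder {
      static IndexReader[] sortSubReaders(IndexReader[] subReaders) {
        IndexReader[] sorted = (IndexReader[]) subReaders.clone();
        Arrays.sort(sorted, new Comparator() {
          public int compare(Object a, Object b) {
            // Descending by live (non-deleted) doc count.
            return ((IndexReader) b).numDocs() - ((IndexReader) a).numDocs();
          }
        });
        return sorted;
      }
    }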
          markrmiller@gmail.com Mark Miller added a comment - - edited

          So what looks like a promising strategy?

Off the top of my head, I am thinking something as simple as:

• Start with ORD, with no fallback, on the largest segment.
• If the next segments are fairly large, use ORD_VAL.
• If the segments get somewhat smaller, move to ORD_DEM.

          Oddly, I've seen VAL perform well in certain situations, so maybe it has its place, but I don't know where yet.

Edit:

Oh, yeah, queue size should also play a role in the switching.
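
Roughly, such a size-based pick might look like this - the constants and the halving threshold below are illustrative assumptions only:

    // Hypothetical size-threshold strategy for picking the next comparator.
    class SizeBasedStrategy {
      static final int ORD = 0, ORD_VAL = 1, ORD_DEM = 2;

      static int pick(int segmentIndex, int numDocs, int largestNumDocs) {
        if (segmentIndex == 0) {
          return ORD;     // largest segment first: pure ords, no transition yet
        } else if (numDocs > largestNumDocs / 2) {
          return ORD_VAL; // still fairly large: ords with value fallback
        } else {
          return ORD_DEM; // small tail segments: on-demand ord conversion
        }
      }
    }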

          markrmiller@gmail.com Mark Miller added a comment -

Still pretty ugly, but this patch has a working ord, I think; it allows readers in any order (going largest to smallest now), and has a hacky ComparatorPolicy that can alter the comparators being used. Right now there is an ugly little one for ORD that switches to ORD_DEM after the first segment; the rest just use the same comparator throughout.

Also added something we may take out - an option to not fill fields (which we can't use for back-compat, but a user could). If you are not using a MultiSearcher, it's kind of a waste of time to fill fields.

          Much to do, but fully functional I think.

          markrmiller@gmail.com Mark Miller added a comment -

Cool, got the other one. When copying the doc into a slot (FieldComparator.copy), the subord had to be set to 0 so that it didn't retain a stale value. Straight ord looks good now, as do the rest, so I'll try and get a custom comparator policy in there so we can benchmark a couple strategies.

Rather than a new comparator for a pure ord that doesn't fall back to subord, perhaps the policy could return an overridden straight ord if it's first and on the first index reader...

          markrmiller@gmail.com Mark Miller added a comment -

Alright, my New Year's gift to myself is going to be to work on this a bit.

I've got part of the straight ord problem - if multiple queue entries clashed on conversion, and they were indeed not just clashes but clashes with the same original value, they were still getting an incremented subord when they should get an identical subord. So I've taken care of that, but something still fails. Looking.

On a side note, though, we probably want another straight-ords comparator that does not fall back to subords. Then we can use that as the first comparator on a large segment, and equal values won't have to needlessly fall back to subords for a worthless 0-on-0 check.

          markrmiller@gmail.com Mark Miller added a comment -

I was playing with this late last night and I got some of the work moving. I've got it so you can work the readers in any order (sorting by size at the moment, largest to smallest), but in doing so, straight ord no longer works. This is likely because the testing has gotten a little better, but I am not sure what's up. The other comparators work fine. Got a little of the return-a-new-comparator stuff in too, but nothing finished.

Kind of got caught up on ord - when that's fixed I'll put up a patch though.

          markrmiller@gmail.com Mark Miller added a comment -

Nice, Mike! Definitely what needs to be done, and very cool. Pluggable policy sounds great. You should be able to ride the right turns often enough to really reduce some of those losses. Ord with no transition cost on the largest segment should be a nice win on its own. I think that's what I was missing - if you come from smallest, you have to map on the ord switch.

          mikemccand Michael McCandless added a comment -

          > Only other option I see off hand is a comparator that can do both, but not as clean and probably adds a check in tightly looped code.

Right, I wanted to avoid an inner-loop check by swapping out the comparator in between segments. Though, modern CPUs are quite good when an if-statement consistently goes one way, so it could be that a single comparator doing internal switching would perform fine. Still, if we fix the API to return a new comparator, we can then allow both options.

          I think in some cases we'd even fall back to VAL comparison.

          > Is largest to smallest best though?

          Good question; it's not obvious. We should try both, and perhaps allow for the collector to optionally specify the order.

My thinking was that the first large segment using ORD is "free" (because ORD is only costly on switching segments). If there are many hits, likely the queue has done most of the work it'll do (i.e., the majority of the total # of insertions will have been done), unless the search is "degenerate". Perhaps the second segment, if large, warrants ORD, but then sometime soonish you'd switch to ORDDEM or VAL.

          The "long tail" of tiny segments would then normally be zipped through w/ hardly any insertions, so a higher insertion cost (with zero segment transition cost) is OK.

          But you're right: if we do the tiny segments first, then the queue would be small so transition cost is lower.

          We should make it simple to override a method to implement your own "search plan", and then provide a default heuristic that decides when to switch comparators. Probably that default heuristic should be based on how often compare was actually invoked for the segment. EG if the String sort is secondary to a numeric sort then even if there are many hits, if the numeric sort mostly wins (doesn't have many compare(...) == 0's) then the String sort should probably immediately switch to VAL after the first segment.

          markrmiller@gmail.com Mark Miller added a comment -

          Given how different the results are, depending on how many segments
          index has, queue size, how many hits search gets, etc., I think we
          need a dynamic solution, meaning in certain situations (many hits,
          small queue depth, small number of large segments) you use ORD but
          other times you use ORDDEM.

          Sounds interesting...

          So I'm thinking setNextReader should return a new comparator? Often
          it would simply return itself, but if it deems it worthwhile to switch
          eg from ORD to ORDDEM it would switch to ORDDEM and return that.

I like that, I think. The only other option I see offhand is a comparator that can do both, but that's not as clean and probably adds a check in tightly-looped code.

          Then I also thought of a wild possible change: when doing searching,
          it'd be best to visit the segments from largest to smallest, doing ORD
          in the beginning and switching to ORDDEM at some point. So, could we
          do this? I think we only "require" in-order docs within a segment, so
          could we switch up segment order. We'd need to fix setNextReader API
          to take in reader & docBase. Would that work?

I think this could work well. Since you are likely to have a few large segments, ord would be fastest; then as you moved through the many small segments, orddem would likely work best. Is largest to smallest best though? You do get to map onto smaller term[] arrays as you go, but that causes more fallback. You are also likely to be carrying more hits in the queue into the next reader, right? From smallest to largest you likely have fewer hits to map as you hit the big segments, and more room to fit in for less fallback. So the question is, what obvious piece am I missing?

Largest to smallest, you fill the queue faster earlier, so there's more to convert as you hit all the other segments - but I guess that will be heavily mitigated by on-demand... You will convert slot.min, and if nothing beats it, I guess that's it... so not so bad actually. And if you go smallest to largest, I guess the queue won't be full, so there will be more 'wins' into the queue, which will cause more conversions over the small segments... in which case, for a ton of them and a big queue, largest to smallest seems better. Still feel like I'm missing something, but I guess I have convinced myself largest to smallest is the way to go.

          I'm probably not the first to wish there were more hours in the day...

          I'll put up a patch with the better testing soon just in case.

          mikemccand Michael McCandless added a comment -

          Given how different the results are, depending on how many segments
          index has, queue size, how many hits search gets, etc., I think we
          need a dynamic solution, meaning in certain situations (many hits,
          small queue depth, small number of large segments) you use ORD but
          other times you use ORDDEM.

          So I'm thinking setNextReader should return a new comparator? Often
          it would simply return itself, but if it deems it worthwhile to switch
          eg from ORD to ORDDEM it would switch to ORDDEM and return that.

          EG for real-time search we may have a tail of a zillion small
          segments...

          Then I also thought of a wild possible change: when doing searching,
          it'd be best to visit the segments from largest to smallest, doing ORD
          in the beginning and switching to ORDDEM at some point. So, could we
          do this? I think we only "require" in-order docs within a segment, so
          could we switch up segment order. We'd need to fix setNextReader API
          to take in reader & docBase. Would that work?
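
Sketched out, that API change might look like this (the shape is an assumption, not the patch):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    // setNextReader takes the reader plus its docBase (since segments may be
    // visited out of order) and may return a different comparator, e.g. to
    // transition from ORD to ORDDEM; usually it would just return "this".
    abstract class SwitchableFieldComparator {
      abstract SwitchableFieldComparator setNextReader(IndexReader reader, int docBase)
          throws IOException;
    }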

          mikemccand Michael McCandless added a comment -

          Current results with topN=1000:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 1000 2000000 0.9 10.0 0.8 11.4 14.0%
          1 simple country ord text 1000 2000000 0.9 10.0 0.6 13.7 37.0%
          1 simple country ordval text 1000 2000000 0.9 10.0 0.6 13.3 33.0%
          1 simple country orddem text 1000 2000000 0.9 10.0 0.7 13.1 31.0%
          1 wiki title val 147 1000 4984 2.1 740.6 2.3 381.8 -48.4%
          1 wiki title ord 147 1000 4984 2.1 740.6 2.1 905.2 22.2%
          1 wiki title ordval 147 1000 4984 2.1 740.6 2.1 906.4 22.4%
          1 wiki title orddem 147 1000 4984 2.1 740.6 2.1 834.5 12.7%
          1 wiki title val text 1000 97191 2.1 108.7 2.4 32.9 -69.7%
          1 wiki title ord text 1000 97191 2.1 108.7 2.1 124.6 14.6%
          1 wiki title ordval text 1000 97191 2.1 108.7 2.1 123.7 13.8%
          1 wiki title orddem text 1000 97191 2.1 108.7 2.1 119.9 10.3%
          1 wiki title val 1 1000 386435 2.1 46.2 2.4 12.6 -72.7%
          1 wiki title ord 1 1000 386435 2.1 46.2 2.1 60.3 30.5%
          1 wiki title ordval 1 1000 386435 2.1 46.2 2.1 59.3 28.4%
          1 wiki title orddem 1 1000 386435 2.1 46.2 2.1 57.9 25.3%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 1000 2000000 0.8 9.7 0.7 11.2 15.5%
          10 simple country ord text 1000 2000000 0.8 9.7 0.6 13.5 39.2%
          10 simple country ordval text 1000 2000000 0.8 9.7 0.6 13.0 34.0%
          10 simple country orddem text 1000 2000000 0.8 9.7 0.6 12.8 32.0%
          10 wiki title val 147 1000 4984 12.7 664.2 2.4 417.3 -37.2%
          10 wiki title ord 147 1000 4984 12.7 664.2 2.2 58.3 -91.2%
          10 wiki title ordval 147 1000 4984 12.7 664.2 2.1 72.2 -89.1%
          10 wiki title orddem 147 1000 4984 12.7 664.2 2.1 92.5 -86.1%
          10 wiki title val text 1000 97191 12.6 100.4 2.4 33.5 -66.6%
          10 wiki title ord text 1000 97191 12.6 100.4 2.3 65.3 -35.0%
          10 wiki title ordval text 1000 97191 12.6 100.4 2.2 78.7 -21.6%
          10 wiki title orddem text 1000 97191 12.6 100.4 2.2 79.8 -20.5%
          10 wiki title val 1 1000 386435 12.7 42.4 2.5 14.6 -65.6%
          10 wiki title ord 1 1000 386435 12.7 42.4 2.7 46.2 9.0%
          10 wiki title ordval 1 1000 386435 12.7 42.4 2.1 51.1 20.5%
          10 wiki title orddem 1 1000 386435 12.7 42.4 2.1 51.5 21.5%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 1000 2000000 1.0 8.5 1.3 9.2 8.2%
          100 simple country ord text 1000 2000000 1.0 8.5 0.6 10.3 21.2%
          100 simple country ordval text 1000 2000000 1.0 8.5 0.6 10.0 17.6%
          100 simple country orddem text 1000 2000000 1.0 8.5 0.6 10.6 24.7%
          100 wiki title val 147 1000 4984 93.8 442.8 2.9 245.2 -44.6%
          100 wiki title ord 147 1000 4984 93.8 442.8 2.3 12.0 -97.3%
          100 wiki title ordval 147 1000 4984 93.8 442.8 2.2 18.0 -95.9%
          100 wiki title orddem 147 1000 4984 93.8 442.8 2.1 58.2 -86.9%
          100 wiki title val text 1000 97191 93.4 88.0 2.5 33.5 -61.9%
          100 wiki title ord text 1000 97191 93.4 88.0 2.3 17.6 -80.0%
          100 wiki title ordval text 1000 97191 93.4 88.0 2.2 29.8 -66.1%
          100 wiki title orddem text 1000 97191 93.4 88.0 2.2 56.6 -35.7%
          100 wiki title val 1 1000 386435 92.8 41.0 2.6 14.9 -63.7%
          100 wiki title ord 1 1000 386435 92.8 41.0 2.7 16.5 -59.8%
          100 wiki title ordval 1 1000 386435 92.8 41.0 2.2 28.6 -30.2%
          100 wiki title orddem 1 1000 386435 92.8 41.0 2.2 44.1 7.6%
          mikemccand Michael McCandless added a comment -

> If we changed it to track the maxSlot it should work... but I'm not
          > sure this is worthwhile, since it only speeds up already super-fast
          > searches and slightly hurts slow searches.

          OK I realized we could do this with very little added cost, by passing in "numFilledSlots" to FieldComparator.setNextReader. I'm attaching the patch with this.
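
For illustration, the numFilledSlots idea might look roughly like this - the field names and the segmentTerms parameter are assumptions, not the patch's code:

    import java.util.Arrays;
    import org.apache.lucene.index.IndexReader;

    // Hypothetical: remap only the slots actually filled so far when moving
    // to a new segment, rather than all values.length slots.
    class OnDemandOrdSlots {
      final String[] values; // per-slot sort values collected so far
      final int[] ords;      // per-slot ords, relative to the current segment

      OnDemandOrdSlots(int numHits) {
        values = new String[numHits];
        ords = new int[numHits];
      }

      void setNextReader(IndexReader reader, int docBase, int numFilledSlots,
                         String[] segmentTerms) {
        // After the queue fills, numFilledSlots == values.length; during the
        // startup transient it can be far smaller, saving most of this loop.
        for (int slot = 0; slot < numFilledSlots; slot++) {
          // A real implementation must handle misses (a value absent from
          // this segment), which is where subord tie-breaking comes in.
          ords[slot] = Arrays.binarySearch(segmentTerms, values[slot]);
        }
      }
    }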

          markrmiller@gmail.com Mark Miller added a comment -

          Just as a reminder to myself - we also need a custom comparator test.

          mikemccand Michael McCandless added a comment -

          New patch attached:

          • I moved maxScore tracking up into TopFieldCollector to save a
            method call per collect().
          • Deprecated TopDocs.get/setMaxScore()
          • Inlined the common "only 1 comparator" case
          • Some small optimizations
          • Fixed a bug in reverse sorting & short-circuit testing
          • Renamed a few attrs

          In addition to fixing TestSort to test multi-segment index where
          totalHits is much greater than topN, we should include reverse sorting
          (to tickle the bug I just found) as well as sorting by 2 or more
          fields. Mark do you want to take a stab at this? Given how many
          sneaky bugs we're uncovering just by staring at the code (for long
          enough) I'd like to increase test coverage....

          Here're the topN=10 results with this patch:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 10 2000000 0.6 10.7 0.7 11.9 11.2%
          1 simple country ord text 10 2000000 0.6 10.7 0.6 14.2 32.7%
          1 simple country ordval text 10 2000000 0.6 10.7 0.6 14.3 33.6%
          1 simple country orddem text 10 2000000 0.6 10.7 0.6 13.4 25.2%
          1 wiki title val 147 10 4984 2.1 3743.8 2.4 2451.8 -34.5%
          1 wiki title ord 147 10 4984 2.1 3743.8 2.1 4459.4 19.1%
          1 wiki title ordval 147 10 4984 2.1 3743.8 2.1 4478.2 19.6%
          1 wiki title orddem 147 10 4984 2.1 3743.8 2.0 4233.9 13.1%
          1 wiki title val text 10 97191 2.1 144.2 2.4 38.9 -73.0%
          1 wiki title ord text 10 97191 2.1 144.2 2.1 165.0 14.4%
          1 wiki title ordval text 10 97191 2.1 144.2 2.1 159.9 10.9%
          1 wiki title orddem text 10 97191 2.1 144.2 2.1 161.4 11.9%
          1 wiki title val 1 10 386435 2.1 51.2 2.4 13.5 -73.6%
          1 wiki title ord 1 10 386435 2.1 51.2 2.1 67.1 31.1%
          1 wiki title ordval 1 10 386435 2.1 51.2 2.1 66.6 30.1%
          1 wiki title orddem 1 10 386435 2.1 51.2 2.1 64.7 26.4%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 10 2000000 0.7 10.4 0.7 11.6 11.5%
          10 simple country ord text 10 2000000 0.7 10.4 0.5 13.9 33.7%
          10 simple country ordval text 10 2000000 0.7 10.4 0.5 14.0 34.6%
          10 simple country orddem text 10 2000000 0.7 10.4 0.5 13.2 26.9%
          10 wiki title val 147 10 4984 12.5 3004.5 2.6 1695.3 -43.6%
          10 wiki title ord 147 10 4984 12.5 3004.5 2.1 3072.8 2.3%
          10 wiki title ordval 147 10 4984 12.5 3004.5 2.1 3328.7 10.8%
          10 wiki title orddem 147 10 4984 12.5 3004.5 2.1 3295.1 9.7%
          10 wiki title val text 10 97191 12.7 139.4 2.4 38.7 -72.2%
          10 wiki title ord text 10 97191 12.7 139.4 2.1 158.9 14.0%
          10 wiki title ordval text 10 97191 12.7 139.4 2.1 161.7 16.0%
          10 wiki title orddem text 10 97191 12.7 139.4 2.1 157.7 13.1%
          10 wiki title val 1 10 386435 12.7 50.3 2.5 15.6 -69.0%
          10 wiki title ord 1 10 386435 12.7 50.3 2.1 65.4 30.0%
          10 wiki title ordval 1 10 386435 12.7 50.3 2.1 66.4 32.0%
          10 wiki title orddem 1 10 386435 12.7 50.3 2.1 63.5 26.2%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 10 2000000 1.0 8.8 3.1 9.5 8.0%
          100 simple country ord text 10 2000000 1.0 8.8 0.6 11.4 29.5%
          100 simple country ordval text 10 2000000 1.0 8.8 0.6 11.3 28.4%
          100 simple country orddem text 10 2000000 1.0 8.8 0.6 11.0 25.0%
          100 wiki title val 147 10 4984 94.6 1066.9 3.7 456.8 -57.2%
          100 wiki title ord 147 10 4984 94.6 1066.9 2.1 522.2 -51.1%
          100 wiki title ordval 147 10 4984 94.6 1066.9 2.1 667.1 -37.5%
          100 wiki title orddem 147 10 4984 94.6 1066.9 2.1 781.7 -26.7%
          100 wiki title val text 10 97191 94.9 110.2 2.8 38.4 -65.2%
          100 wiki title ord text 10 97191 94.9 110.2 2.1 123.0 11.6%
          100 wiki title ordval text 10 97191 94.9 110.2 2.2 127.3 15.5%
          100 wiki title orddem text 10 97191 94.9 110.2 2.2 126.8 15.1%
          100 wiki title val 1 10 386435 94.3 47.9 2.8 15.8 -67.0%
          100 wiki title ord 1 10 386435 94.3 47.9 2.1 59.8 24.8%
          100 wiki title ordval 1 10 386435 94.3 47.9 2.2 60.8 26.9%
          100 wiki title orddem 1 10 386435 94.3 47.9 2.2 59.0 23.2%
          mikemccand Michael McCandless added a comment -

> In the case where 3 hits come from a Reader, but you ask for 1000 back, that will run through that loop 1000 times, but you only need to convert 3, right?

          Well, it's tricky... and indeed all tests pass with the bug (which is
          spooky – I think we need to add cases to TestSort where 1) the index
          has many segments, and 2) the number of hits is much greater than the
          queue size), but I'm pretty sure it's a bug.

          You're right: it'd be nice to only visit the "used" slots in the queue
          on advancing to each reader. During the "startup transient" (when
          collector has not yet seen enough hits to fill its queue), the slot
          indeed increases one at a time, and you could at that point use it to
          efficiently visit only the used slots.

However, after the startup transient, the pqueue will then track the
          weakest entry in the queue, which can occur in any of the slots, and
          when a hit that beats that weakest entry arrives, it will call copy()
          into that slot.

          So the slot passed to copy is now a "relatively random" value. For a
          1000 sized queue whose slots are full, you might get a copy into slot
          242. In this case we were incorrectly setting "this.slot" to 242 and
          then only converting the first 242 entries.

If we changed it to track the maxSlot it should work... but I'm not
          sure this is worthwhile, since it only speeds up already super-fast
          searches and slightly hurts slow searches.

          markrmiller@gmail.com Mark Miller added a comment -

Hmm... I'm going to have to break down and think hard about the slot issue. You have switched it to values.length - here is what I'm thinking: in the case where 3 hits come from a Reader, but you ask for 1000 back, that will run through that loop 1000 times, but you only need to convert 3, right? This is what I was attempting - what test are you running to see the failure? The idea for me was that if you only have 20 hits so far on a top-1000 search, you only need to hit that loop to convert 20 values to the new Reader, rather than 1000 every time (though you spin fast hitting the == null). I'm not sure - Xmas shopping has burned up what's left of my brain cells. I'll let it stew, perhaps run some code, and come back.

          mikemccand Michael McCandless added a comment -

          Results with topN=10:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 10 2000000 0.6 10.7 0.7 11.7 9.3%
          1 simple country ord text 10 2000000 0.6 10.7 0.6 13.8 29.0%
          1 simple country ordval text 10 2000000 0.6 10.7 0.6 13.3 24.3%
          1 simple country orddem text 10 2000000 0.6 10.7 0.7 13.0 21.5%
          1 wiki title val 147 10 4984 2.1 3743.8 2.3 2441.8 -34.8%
          1 wiki title ord 147 10 4984 2.1 3743.8 2.0 4426.2 18.2%
          1 wiki title ordval 147 10 4984 2.1 3743.8 2.1 4352.7 16.3%
          1 wiki title orddem 147 10 4984 2.1 3743.8 2.0 4063.6 8.5%
          1 wiki title val text 10 97191 2.1 144.2 2.3 39.1 -72.9%
          1 wiki title ord text 10 97191 2.1 144.2 2.1 164.5 14.1%
          1 wiki title ordval text 10 97191 2.1 144.2 2.1 162.6 12.8%
          1 wiki title orddem text 10 97191 2.1 144.2 2.1 157.3 9.1%
          1 wiki title val 1 10 386435 2.1 51.2 2.4 13.6 -73.4%
          1 wiki title ord 1 10 386435 2.1 51.2 2.1 65.7 28.3%
          1 wiki title ordval 1 10 386435 2.1 51.2 2.1 64.7 26.4%
          1 wiki title orddem 1 10 386435 2.1 51.2 2.1 60.4 18.0%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 10 2000000 0.7 10.4 0.6 11.6 11.5%
          10 simple country ord text 10 2000000 0.7 10.4 0.6 13.6 30.8%
          10 simple country ordval text 10 2000000 0.7 10.4 0.6 13.1 26.0%
          10 simple country orddem text 10 2000000 0.7 10.4 0.6 13.1 26.0%
          10 wiki title val 147 10 4984 12.5 3004.5 2.5 1732.9 -42.3%
          10 wiki title ord 147 10 4984 12.5 3004.5 2.1 3067.2 2.1%
          10 wiki title ordval 147 10 4984 12.5 3004.5 2.1 3283.5 9.3%
          10 wiki title orddem 147 10 4984 12.5 3004.5 2.1 3237.9 7.8%
          10 wiki title val text 10 97191 12.7 139.4 2.4 38.6 -72.3%
          10 wiki title ord text 10 97191 12.7 139.4 2.1 159.5 14.4%
          10 wiki title ordval text 10 97191 12.7 139.4 2.1 160.5 15.1%
          10 wiki title orddem text 10 97191 12.7 139.4 2.1 154.0 10.5%
          10 wiki title val 1 10 386435 12.7 50.3 2.5 15.6 -69.0%
          10 wiki title ord 1 10 386435 12.7 50.3 2.1 64.8 28.8%
          10 wiki title ordval 1 10 386435 12.7 50.3 2.1 64.0 27.2%
          10 wiki title orddem 1 10 386435 12.7 50.3 2.1 59.4 18.1%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 10 2000000 1.0 8.8 2.8 9.4 6.8%
          100 simple country ord text 10 2000000 1.0 8.8 0.6 11.1 26.1%
          100 simple country ordval text 10 2000000 1.0 8.8 0.6 10.7 21.6%
          100 simple country orddem text 10 2000000 1.0 8.8 0.6 10.7 21.6%
          100 wiki title val 147 10 4984 94.6 1066.9 3.3 454.8 -57.4%
          100 wiki title ord 147 10 4984 94.6 1066.9 2.1 519.2 -51.3%
          100 wiki title ordval 147 10 4984 94.6 1066.9 2.1 692.5 -35.1%
          100 wiki title orddem 147 10 4984 94.6 1066.9 2.1 778.1 -27.1%
          100 wiki title val text 10 97191 94.9 110.2 2.7 38.9 -64.7%
          100 wiki title ord text 10 97191 94.9 110.2 2.1 122.8 11.4%
          100 wiki title ordval text 10 97191 94.9 110.2 2.1 126.3 14.6%
          100 wiki title orddem text 10 97191 94.9 110.2 2.1 124.7 13.2%
          100 wiki title val 1 10 386435 94.3 47.9 2.8 15.8 -67.0%
          100 wiki title ord 1 10 386435 94.3 47.9 2.1 59.1 23.4%
          100 wiki title ordval 1 10 386435 94.3 47.9 2.2 58.9 23.0%
          100 wiki title orddem 1 10 386435 94.3 47.9 2.2 54.9 14.6%
          mikemccand Michael McCandless added a comment -

          OK new patch attached:

          • Fixed the bug (I had added) in StringOrdValOnDemComparator that
            was doing too much work on first segment (especially skewed
            1-segment index results).
          • Removed "this.slot = slot" from StringOrdComparator,
            StringOrdValComparator and StringOrdValOnDemWinComparator. I
            think this was a bug that was causing setNextReader to not convert
            enough of the queue in my tests. Unfortunately, this makes
            performance worse for these classes.
          • Tweaked null checks for string values to be a tiny bit faster if
            there are nulls.

          New benchmark results for topN=1000:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 1000 2000000 0.9 10.0 0.7 11.1 11.0%
          1 simple country ord text 1000 2000000 0.9 10.0 0.7 13.3 33.0%
          1 simple country ordval text 1000 2000000 0.9 10.0 0.7 12.7 27.0%
          1 simple country orddem text 1000 2000000 0.9 10.0 0.6 12.5 25.0%
          1 wiki title val 147 1000 4984 2.1 740.6 2.3 369.3 -50.1%
          1 wiki title ord 147 1000 4984 2.1 740.6 2.1 808.1 9.1%
          1 wiki title ordval 147 1000 4984 2.1 740.6 2.0 815.7 10.1%
          1 wiki title orddem 147 1000 4984 2.1 740.6 2.1 731.7 -1.2%
          1 wiki title val text 1000 97191 2.1 108.7 2.4 32.6 -70.0%
          1 wiki title ord text 1000 97191 2.1 108.7 2.1 121.1 11.4%
          1 wiki title ordval text 1000 97191 2.1 108.7 2.1 119.0 9.5%
          1 wiki title orddem text 1000 97191 2.1 108.7 2.1 114.8 5.6%
          1 wiki title val 1 1000 386435 2.1 46.2 2.4 12.7 -72.5%
          1 wiki title ord 1 1000 386435 2.1 46.2 2.1 58.5 26.6%
          1 wiki title ordval 1 1000 386435 2.1 46.2 2.1 56.7 22.7%
          1 wiki title orddem 1 1000 386435 2.1 46.2 2.1 55.2 19.5%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 1000 2000000 0.8 9.7 0.6 11.0 13.4%
          10 simple country ord text 1000 2000000 0.8 9.7 0.6 13.1 35.1%
          10 simple country ordval text 1000 2000000 0.8 9.7 0.6 12.5 28.9%
          10 simple country orddem text 1000 2000000 0.8 9.7 0.6 12.5 28.9%
          10 wiki title val 147 1000 4984 12.7 664.2 2.4 382.9 -42.4%
          10 wiki title ord 147 1000 4984 12.7 664.2 2.2 57.9 -91.3%
          10 wiki title ordval 147 1000 4984 12.7 664.2 2.1 71.5 -89.2%
          10 wiki title orddem 147 1000 4984 12.7 664.2 2.1 91.1 -86.3%
          10 wiki title val text 1000 97191 12.6 100.4 2.4 33.4 -66.7%
          10 wiki title ord text 1000 97191 12.6 100.4 2.2 62.9 -37.4%
          10 wiki title ordval text 1000 97191 12.6 100.4 2.2 75.9 -24.4%
          10 wiki title orddem text 1000 97191 12.6 100.4 2.2 79.6 -20.7%
          10 wiki title val 1 1000 386435 12.7 42.4 2.4 14.7 -65.3%
          10 wiki title ord 1 1000 386435 12.7 42.4 2.7 45.2 6.6%
          10 wiki title ordval 1 1000 386435 12.7 42.4 2.1 48.5 14.4%
          10 wiki title orddem 1 1000 386435 12.7 42.4 2.2 50.2 18.4%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 1000 2000000 1.0 8.5 0.7 9.2 8.2%
          100 simple country ord text 1000 2000000 1.0 8.5 0.6 10.1 18.8%
          100 simple country ordval text 1000 2000000 1.0 8.5 0.6 9.7 14.1%
          100 simple country orddem text 1000 2000000 1.0 8.5 0.6 10.3 21.2%
          100 wiki title val 147 1000 4984 93.8 442.8 2.3 240.7 -45.6%
          100 wiki title ord 147 1000 4984 93.8 442.8 2.3 12.3 -97.2%
          100 wiki title ordval 147 1000 4984 93.8 442.8 2.2 18.4 -95.8%
          100 wiki title orddem 147 1000 4984 93.8 442.8 2.1 57.7 -87.0%
          100 wiki title val text 1000 97191 93.4 88.0 2.3 33.3 -62.2%
          100 wiki title ord text 1000 97191 93.4 88.0 2.3 17.7 -79.9%
          100 wiki title ordval text 1000 97191 93.4 88.0 2.2 27.8 -68.4%
          100 wiki title orddem text 1000 97191 93.4 88.0 2.2 54.4 -38.2%
          100 wiki title val 1 1000 386435 92.8 41.0 2.4 14.8 -63.9%
          100 wiki title ord 1 1000 386435 92.8 41.0 2.7 16.6 -59.5%
          100 wiki title ordval 1 1000 386435 92.8 41.0 2.2 27.9 -32.0%
          100 wiki title orddem 1 1000 386435 92.8 41.0 2.2 43.2 5.4%
          mikemccand Michael McCandless added a comment -

          Sorry, disregard those results above... I think there's a bug in the on-demand comparator.

          mikemccand Michael McCandless added a comment -

          New patch attached:

          • Made IndexWriter.getSegmentCount package protected again
          • Fixed StringOrdComparator to use no doubles
          • Added "API is experimental" warnings to the new APIs
          • Deprecated FieldValueHitQueue.getMaxScore – I think in 3.0 we
            should remove this (with Hits)?
          • Tweaked StringOrdValOnDemComparator to avoid Arrays.fill in
            setNextReader
          • Fixed FieldValueHitQueue to actually pull the right "on demand"
            comparator (it was incorrectly pulling StringOrdValComparator)

          I re-ran perf tests:

          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          1 simple country val text 1000 2000000 0.9 10.0 0.8 11.1 11.0%
          1 simple country ord text 1000 2000000 0.9 10.0 0.7 13.0 30.0%
          1 simple country ordval text 1000 2000000 0.9 10.0 0.7 12.5 25.0%
          1 simple country orddem text 1000 2000000 0.9 10.0 0.7 12.5 25.0%
          1 wiki title val 147 1000 4984 2.1 740.6 2.4 375.7 -49.3%
          1 wiki title ord 147 1000 4984 2.1 740.6 2.1 811.7 9.6%
          1 wiki title ordval 147 1000 4984 2.1 740.6 2.1 806.6 8.9%
          1 wiki title orddem 147 1000 4984 2.1 740.6 2.1 207.6 -72.0%
          1 wiki title val text 1000 97191 2.1 108.7 2.4 32.5 -70.1%
          1 wiki title ord text 1000 97191 2.1 108.7 2.1 121.4 11.7%
          1 wiki title ordval text 1000 97191 2.1 108.7 2.1 119.1 9.6%
          1 wiki title orddem text 1000 97191 2.1 108.7 2.1 90.9 -16.4%
          1 wiki title val 1 1000 386435 2.1 46.2 2.4 12.8 -72.3%
          1 wiki title ord 1 1000 386435 2.1 46.2 2.1 58.6 26.8%
          1 wiki title ordval 1 1000 386435 2.1 46.2 2.1 55.7 20.6%
          1 wiki title orddem 1 1000 386435 2.1 46.2 2.1 50.1 8.4%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          10 simple country val text 1000 2000000 0.8 9.7 0.7 11.0 13.4%
          10 simple country ord text 1000 2000000 0.8 9.7 0.6 13.0 34.0%
          10 simple country ordval text 1000 2000000 0.8 9.7 0.6 12.4 27.8%
          10 simple country orddem text 1000 2000000 0.8 9.7 0.6 12.5 28.9%
          10 wiki title val 147 1000 4984 12.7 664.2 2.5 383.8 -42.2%
          10 wiki title ord 147 1000 4984 12.7 664.2 2.1 86.2 -87.0%
          10 wiki title ordval 147 1000 4984 12.7 664.2 2.1 104.0 -84.3%
          10 wiki title orddem 147 1000 4984 12.7 664.2 2.1 77.0 -88.4%
          10 wiki title val text 1000 97191 12.6 100.4 2.4 33.3 -66.8%
          10 wiki title ord text 1000 97191 12.6 100.4 2.2 80.0 -20.3%
          10 wiki title ordval text 1000 97191 12.6 100.4 2.2 90.3 -10.1%
          10 wiki title orddem text 1000 97191 12.6 100.4 2.1 72.1 -28.2%
          10 wiki title val 1 1000 386435 12.7 42.4 2.5 14.7 -65.3%
          10 wiki title ord 1 1000 386435 12.7 42.4 2.6 50.2 18.4%
          10 wiki title ordval 1 1000 386435 12.7 42.4 2.2 51.3 21.0%
          10 wiki title orddem 1 1000 386435 12.7 42.4 2.2 47.3 11.6%
          numSeg index sortBy method query topN hits warm qps warmnew qpsnew pctg
          100 simple country val text 1000 2000000 1.0 8.5 2.1 9.2 8.2%
          100 simple country ord text 1000 2000000 1.0 8.5 0.6 10.7 25.9%
          100 simple country ordval text 1000 2000000 1.0 8.5 0.6 10.3 21.2%
          100 simple country orddem text 1000 2000000 1.0 8.5 0.6 10.2 20.0%
          100 wiki title val 147 1000 4984 93.8 442.8 3.6 238.8 -46.1%
          100 wiki title ord 147 1000 4984 93.8 442.8 2.3 19.9 -95.5%
          100 wiki title ordval 147 1000 4984 93.8 442.8 2.2 28.0 -93.7%
          100 wiki title orddem 147 1000 4984 93.8 442.8 2.2 54.1 -87.8%
          100 wiki title val text 1000 97191 93.4 88.0 3.1 33.1 -62.4%
          100 wiki title ord text 1000 97191 93.4 88.0 2.3 27.8 -68.4%
          100 wiki title ordval text 1000 97191 93.4 88.0 2.2 40.9 -53.5%
          100 wiki title orddem text 1000 97191 93.4 88.0 2.2 53.3 -39.4%
          100 wiki title val 1 1000 386435 92.8 41.0 3.2 14.7 -64.1%
          100 wiki title ord 1 1000 386435 92.8 41.0 2.7 25.3 -38.3%
          100 wiki title ordval 1 1000 386435 92.8 41.0 2.2 33.8 -17.6%
          100 wiki title orddem 1 1000 386435 92.8 41.0 2.2 42.7 4.1%

          Haven't digested these results yet...

          markrmiller@gmail.com Mark Miller added a comment -

          Was just thinking about the window convert (which I doubt is worth considering anyway) - I wouldn't bench it... I forgot to make it check for conversion as it's working on the window; it just checks the window pivot point, which creates lots of extra work.

          markrmiller@gmail.com Mark Miller added a comment -

          Again with the two $id issues fixed.

          markrmiller@gmail.com Mark Miller added a comment -

          Escaped back to the computer. Here is the latest. I went down a few dead ends trying to improve things, and I don't think I found anything significant.

          There are the following String sort types:

          SortField.STRING_VAL: by value

          SortField.STRING_ORD: by ordinal, falling back to subord

          SortField.STRING_ORD_VAL: by ordinal, falling back to value

          SortField.STRING_ORD_VAL_DEM: by ordinal, falling back to value, converting on demand

          and just for kicks:

          SortField.STRING_ORD_VAL_DEM_WIN: by ordinal, falling back to value, converting a window of 20 in each direction of n on demand

          SortField.STRING is set to STRING_ORD_VAL
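          For illustration, a minimal usage sketch, assuming the patch's experimental SortField constants, the usual org.apache.lucene.search imports, and the existing search(Query, Filter, int, Sort) method:

             // Sketch only: STRING_ORD_VAL is the patch's experimental
             // constant (and the default behind SortField.STRING, per the
             // comment above), not a released API.
             TopFieldDocs searchByTitle(Searcher searcher, Query query)
                 throws IOException {
               Sort sort = new Sort(new SortField("title", SortField.STRING_ORD_VAL));
               return searcher.search(query, null, 1000, sort);
             }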

          mikemccand Michael McCandless added a comment -

          OK wanna post a new patch w/ all these fixes? I'll re-test, and add the on-demand queue. If the perf cost of that one is still too high, we may want to offer a queue that pulls the MultiReader's FieldCache values but uses the slot-based pqueue (we would have to fix this insanely slow warm time if we do that...). I bet for very large queue sizes that'd give the best performance, at the obvious expense of reopen cost.

          markrmiller@gmail.com Mark Miller added a comment -

          Thanks. I've got a convert-on-demand version to give a try too.

          mikemccand Michael McCandless added a comment -

          Another fix to do: in StringOrdValComparator.compare (the one that takes slot, doc, score), it does not fall back to String value comparison.

          markrmiller@gmail.com Mark Miller added a comment -

          Yeah, wow. I've actually fixed that (though I didn't notice doing so). I've made a half dozen little tweaks trying to eke out some speed and maybe trigger some ideas. Nothing that's made much of a difference yet. We can subtract instead of compare and do other tiny, in-the-noise stuff - I'm hoping to trigger a bigger idea though.

          mikemccand Michael McCandless added a comment -

          One more small thing: StringOrdValComparator.compare uses double ord1 and double ord2 when comparing 2 slots, but in fact they should just be ints?
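          For example, a minimal sketch (not the patch code) of the int-only slot comparison; plain subtraction is safe here only because ords are non-negative indexes into the term table, so it can't overflow:

             class OrdCompareSketch {
               int[] ords; // per-slot ordinals, always >= 0

               int compare(int slot1, int slot2) {
                 // int math; no doubles needed, and subtraction can't
                 // overflow since both operands are non-negative.
                 return ords[slot1] - ords[slot2];
               }
             }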

          mikemccand Michael McCandless added a comment -

          I set the queue size to 1000 and reran the tests:

          numSeg index sortBy query topN hits warm qps warmnew qpsnew pctg
          1 wiki score 147 1000 4984 0.3 1356.8 0.3 1361.3 0.3%
          1 wiki score text 1000 97191 0.3 224.0 0.3 223.0 -0.4%
          1 wiki score 1 1000 386435 0.3 73.6 0.3 72.8 -1.1%
          1 wiki doc 147 1000 4984 0.3 1527.0 0.3 1475.0 -3.4%
          1 wiki doc text 1000 97191 0.3 182.5 0.3 235.4 29.0%
          1 wiki doc 1 1000 386435 0.3 50.6 0.3 67.7 33.8%
          1 wiki doc <all> 1000 2000000 0.1 22.1 0.1 37.8 71.0%
          1 simple int text 1000 2000000 0.7 10.1 0.7 12.8 26.7%
          1 simple int <all> 1000 2000000 0.6 19.0 0.6 30.5 60.5%
          1 simple country text 1000 2000000 0.9 10.1 0.7 12.5 23.8%
          1 simple country <all> 1000 2000000 0.9 19.5 0.6 29.1 49.2%
          1 wiki title 147 1000 4984 4.0 733.1 2.0 732.2 -0.1%
          1 wiki title text 1000 97191 4.1 109.1 2.1 114.7 5.1%
          1 wiki title 1 1000 386435 4.1 47.1 2.1 55.4 17.6%
          1 wiki title <all> 1000 2000000 4.1 19.4 2.1 30.5 57.2%
          numSeg index sortBy query topN hits warm qps warmnew qpsnew pctg
          10 wiki score 147 1000 4984 0.3 1259.4 0.3 1274.0 1.2%
          10 wiki score text 1000 97191 0.4 215.2 0.4 220.0 2.2%
          10 wiki score 1 1000 386435 0.4 69.6 0.4 72.0 3.4%
          10 wiki doc 147 1000 4984 0.3 1409.0 0.3 1394.7 -1.0%
          10 wiki doc text 1000 97191 0.4 192.0 0.4 232.5 21.1%
          10 wiki doc 1 1000 386435 0.4 53.0 0.4 66.3 25.1%
          10 wiki doc <all> 1000 2000000 0.1 11.9 0.1 37.5 215.1%
          10 simple int text 1000 2000000 1.2 9.8 0.6 12.8 30.6%
          10 simple int <all> 1000 2000000 1.2 11.0 0.8 30.2 174.5%
          10 simple country text 1000 2000000 1.1 9.8 0.6 12.4 26.5%
          10 simple country <all> 1000 2000000 1.1 11.0 0.5 29.1 164.5%
          10 wiki title 147 1000 4984 26.0 655.2 2.1 84.7 -87.1%
          10 wiki title text 1000 97191 26.3 100.4 2.2 77.8 -22.5%
          10 wiki title 1 1000 386435 26.0 42.3 2.6 48.4 14.4%
          10 wiki title <all> 1000 2000000 26.1 10.9 2.6 28.5 161.5%
          numSeg index sortBy query topN hits warm qps warmnew qpsnew pctg
          100 wiki score 147 1000 4984 0.4 704.1 0.5 677.5 -3.8%
          100 wiki score text 1000 97191 0.4 169.5 0.5 186.0 9.7%
          100 wiki score 1 1000 386435 0.4 56.5 0.5 67.9 20.2%
          100 wiki doc 147 1000 4984 0.4 785.0 0.4 724.0 -7.8%
          100 wiki doc text 1000 97191 0.4 159.9 0.4 204.7 28.0%
          100 wiki doc 1 1000 386435 0.4 44.9 0.4 64.8 44.3%
          100 wiki doc <all> 1000 2000000 0.2 7.8 0.1 40.4 417.9%
          100 simple int text 1000 2000000 3.3 8.4 1.4 10.3 22.6%
          100 simple int <all> 1000 2000000 3.4 7.4 1.1 32.4 337.8%
          100 simple country text 1000 2000000 1.4 8.6 0.7 10.0 16.3%
          100 simple country <all> 1000 2000000 1.5 7.3 0.6 28.6 291.8%
          100 wiki title 147 1000 4984 189.0 446.3 2.4 19.8 -95.6%
          100 wiki title text 1000 97191 188.5 87.7 2.3 27.5 -68.6%
          100 wiki title 1 1000 386435 190.4 41.1 2.7 24.6 -40.1%
          100 wiki title <all> 1000 2000000 189.2 7.4 3.0 18.4 148.6%

          Performance clearly gets worse for queries that don't hit many docs,
          with a large queue, against an index with a large number of
          segments, when sorting by a unique String field (like title).

          The slowdown for "147" at 100 segments is quite bad.

          So... I wonder how often users of Lucene set a very large queue size
          (to do some sort of post-filtering, which could be more efficiently
          done as a real Filter, but...). I think it may be a non-trivial
          number, so... what to do?

          E.g., we could offer a different collector that's better optimized
          towards collecting a large topN (probably using the top-level
          FieldCache that's used today)? Or, we could explore a hybrid
          approach whereby a slot is only switched to the current segment when
          it's first visited again (instead of updating all of them on
          switching readers)? Or... something else?
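          A minimal sketch of that hybrid/lazy idea (a guess at its shape, not the patch code): stamp each slot with the reader "generation" it was last converted for, and convert on first touch rather than walking the whole queue in setNextReader:

             class LazySlotSketch {
               String[] values;   // queued sort values
               int[] ords;        // per-slot ordinals
               int[] slotGen;     // generation each slot's ord was computed for
               int currentGen;    // bumped on every reader switch
               String[] lookup;   // current reader's sorted term table

               // O(1) reader switch: no queue walk here.
               void setNextReader(String[] newLookup) {
                 lookup = newLookup;
                 currentGen++;
               }

               // Convert a slot lazily, the first time it is touched
               // against the current reader.
               int ordFor(int slot) {
                 if (slotGen[slot] != currentGen) {
                   ords[slot] = java.util.Arrays.binarySearch(lookup, values[slot]);
                   slotGen[slot] = currentGen;
                 }
                 return ords[slot];
               }
             }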

          markrmiller@gmail.com Mark Miller added a comment -

          Awesome results Mike, thanks! You are a wizard.

          mikemccand Michael McCandless added a comment -

          OK I ran a bunch of sort perf tests, on trunk & with the patch.
          (Attached the two Python sources for doing this... though they require
          some small local mods to run properly).

          Each alg is run with "java -Xms1024M -Xmx1024M -Xbatch -server" on OS
          X 10.5.5, java 1.6.0_07-b06-153.

          I use two indexes, each with 2M docs. One is docs from Wikipedia
          (labeled "wiki"), the other is SortableSimpleDocMaker docs augmented
          to include a random country field (labeled "simple"). For each I
          created 1-segment, 10-segment and 100-segment indices. I sort by
          score, doc, string (val, ord = true ord+subord, ordval = ord +
          fallback). Queue size is 10.

          I ran various queries... query "147" hits ~5K docs, query "text"
          hits ~97K docs, query "1" hits 386K docs, and the alldocs query hits
          2M docs. qps is queries per second and warm is the time for the
          first warmup query, from trunk. qpsnew & warmnew are with the patch.
          pctg shows the % gain in qps performance:

          numSeg index sortBy query topN meth hits warm qps warmnew qpsnew pctg
          1 wiki score 147 10   4984 0.2 5717.6 0.2 5627.5 -1.6%
          1 wiki score text 10   97191 0.3 340.9 0.3 348.8 2.3%
          1 wiki score 1 10   386435 0.3 86.7 0.3 89.3 3.0%
          1 wiki doc 147 10   4984 0.3 4071.7 0.3 4649.0 14.2%
          1 wiki doc text 10   97191 0.3 225.4 0.3 253.7 12.6%
          1 wiki doc 1 10   386435 0.3 56.9 0.3 65.8 15.6%
          1 wiki doc <all> 10   2000000 0.1 23.0 0.1 38.6 67.8%
          1 simple int text 10   2000000 0.7 10.7 0.7 13.5 26.2%
          1 simple int <all> 10   2000000 0.6 21.1 0.6 34.7 64.5%
          1 simple country text 10 ord 2000000 0.6 10.7 0.6 13.2 23.4%
          1 simple country text 10 ordval 2000000 0.6 10.7 0.6 13.3 24.3%
          1 simple country <all> 10 ord 2000000 0.5 20.7 0.6 32.5 57.0%
          1 simple country <all> 10 ordval 2000000 0.5 20.7 0.6 34.6 67.1%
          1 wiki title 147 10 ord 4984 2.1 3743.8 2.0 4210.5 12.5%
          1 wiki title 147 10 ordval 4984 2.1 3743.8 2.0 4288.2 14.5%
          1 wiki title text 10 ord 97191 2.1 144.2 2.1 160.3 11.2%
          1 wiki title text 10 ordval 97191 2.1 144.2 2.1 163.5 13.4%
          1 wiki title 1 10 ord 386435 2.1 51.2 2.1 63.2 23.4%
          1 wiki title 1 10 ordval 386435 2.1 51.2 2.1 64.6 26.2%
          1 wiki title <all> 10 ord 2000000 2.1 21.1 2.1 33.2 57.3%
          1 wiki title <all> 10 ordval 2000000 2.1 21.1 2.1 35.4 67.8%
          numSeg index sortBy query topN meth hits warm qps warmnew qpsnew pctg
          10 wiki score 147 10   4984 0.3 4228.3 0.3 4510.6 6.7%
          10 wiki score text 10   97191 0.3 294.7 0.3 341.5 15.9%
          10 wiki score 1 10   386435 0.4 75.0 0.4 87.0 16.0%
          10 wiki doc 147 10   4984 0.3 3332.2 0.3 4033.9 21.1%
          10 wiki doc text 10   97191 0.4 217.0 0.4 277.0 27.6%
          10 wiki doc 1 10   386435 0.4 54.6 0.4 70.5 29.1%
          10 wiki doc <all> 10   2000000 0.1 12.7 0.1 38.6 203.9%
          10 simple int text 10   2000000 1.2 10.3 0.6 13.5 31.1%
          10 simple int <all> 10   2000000 1.1 11.8 0.8 34.6 193.2%
          10 simple country text 10 ord 2000000 0.7 10.4 0.5 13.2 26.9%
          10 simple country text 10 ordval 2000000 0.7 10.4 0.5 13.3 27.9%
          10 simple country <all> 10 ord 2000000 0.7 11.5 0.5 32.5 182.6%
          10 simple country <all> 10 ordval 2000000 0.7 11.5 0.5 34.1 196.5%
          10 wiki title 147 10 ord 4984 12.5 3004.5 2.1 3124.0 4.0%
          10 wiki title 147 10 ordval 4984 12.5 3004.5 2.1 3353.5 11.6%
          10 wiki title text 10 ord 97191 12.7 139.4 2.1 156.7 12.4%
          10 wiki title text 10 ordval 97191 12.7 139.4 2.1 160.9 15.4%
          10 wiki title 1 10 ord 386435 12.7 50.3 2.1 62.3 23.9%
          10 wiki title 1 10 ordval 386435 12.7 50.3 2.1 64.1 27.4%
          10 wiki title <all> 10 ord 2000000 12.7 11.4 2.1 33.1 190.4%
          10 wiki title <all> 10 ordval 2000000 12.7 11.4 2.1 35.3 209.6%
          numSeg index sortBy query topN meth hits warm qps warmnew qpsnew pctg
          100 wiki score 147 10   4984 0.3 1282.2 1.7 1162.3 -9.4%
          100 wiki score text 10   97191 0.4 232.4 1.3 275.6 18.6%
          100 wiki score 1 10   386435 0.4 65.1 1.4 80.4 23.5%
          100 wiki doc 147 10   4984 0.4 1170.0 0.4 1132.0 -3.2%
          100 wiki doc text 10   97191 0.4 171.7 0.4 230.1 34.0%
          100 wiki doc 1 10   386435 0.4 46.7 0.4 67.9 45.4%
          100 wiki doc <all> 10   2000000 0.2 7.8 0.1 41.6 433.3%
          100 simple int text 10   2000000 3.3 8.9 4.0 11.0 23.6%
          100 simple int <all> 10   2000000 3.4 7.7 1.1 36.5 374.0%
          100 simple country text 10 ord 2000000 1.0 8.8 0.6 10.8 22.7%
          100 simple country text 10 ordval 2000000 1.0 8.8 0.6 11.0 25.0%
          100 simple country <all> 10 ord 2000000 1.0 7.6 0.5 35.0 360.5%
          100 simple country <all> 10 ordval 2000000 1.0 7.6 0.5 36.3 377.6%
          100 wiki title 147 10 ord 4984 94.6 1066.9 2.1 583.7 -45.3%
          100 wiki title 147 10 ordval 4984 94.6 1066.9 2.1 750.1 -29.7%
          100 wiki title text 10 ord 97191 94.9 110.2 2.1 122.7 11.3%
          100 wiki title text 10 ordval 97191 94.9 110.2 2.1 128.4 16.5%
          100 wiki title 1 10 ord 386435 94.3 47.9 2.1 58.2 21.5%
          100 wiki title 1 10 ordval 386435 94.3 47.9 2.1 60.1 25.5%
          100 wiki title <all> 10 ord 2000000 94.6 7.8 2.5 35.6 356.4%
          100 wiki title <all> 10 ordval 2000000 94.6 7.8 2.4 37.0 374.4%

          It's a ridiculous amount of data to digest... but here are some
          initial thoughts:

          • These are only single-term queries; I'd expect the gain to be
            smaller for multi-term queries since, net/net, less of the time
            is spent collecting.
          • Ord + val fallback (ordval) is generally faster than pure
            ord/subord. I think for now we should run with ord + val
            fallback? (We can leave ord + subord commented out?).
          • It's great that we see decent speedups for "sort by score" which
            is presumably the most common sort used.
          • We do get slower in certain cases (negative pctg in the rightmost
            column): all not-in-the-noise slowdowns were with query "147" on
            the 100-segment index. This query hits relatively few docs (~5K),
            so this is expected: the new approach spends some time updating
            its queue for each subreader, and if the time spent searching is
            relatively tiny then this queue update time becomes relatively
            big. I think with a larger queue size the slowdown will be worse.
            However, I think this is an acceptable tradeoff.
          • The gains for field sorting on a single-segment (optimized) index
            are impressive. Generally, the more hits encountered, the better
            the gains. It's amazing that we see a ~67.8% gain sorting by
            docID, country, and title for the alldocs query. My only guess
            for this is a better cache hit rate (because we gain locality by
            copying values to local compact arrays).
          • Across the board the alldocs query shows very sizable (5X faster
            for 100 segment index; 3X faster for 10 segment index)
            improvements.
          • I didn't break out the %tg difference, but warming time with the
            patch is waaaay faster than trunk when the index has > 1 segment.
            Reopen time should also be fantastically faster (we are
            sidestepping something silly happening w/ FieldCache on a
            Multi*Reader). Warming on trunk takes ~95 seconds with the
            100-segment index!
          markrmiller@gmail.com Mark Miller added a comment -

          A slightly improved setNextReader for the ords case:

             public void setNextReader(IndexReader reader) throws IOException {

                // Re-key the queued values to the new reader's ord space.
                StringIndex currentReaderValues = ExtendedFieldCache.EXT_DEFAULT
                    .getStringIndex(reader, field);

                lookup = currentReaderValues.lookup;
                order = currentReaderValues.order;

                if (lookup.length == 0) {
                  return;
                }

                // Group the slots whose values have no exact term in the new
                // reader by the ord they map to, so subords can be assigned
                // within each group below.
                Map map = new HashMap();
                for (int i = 0; i < slot + 1; i++) {
                  String val = values[i];
                  if (val == null) {
                    continue;
                  }

                  int index = binarySearch(lookup, val);

                  if (index < 0) {
                    // No exact match: map to the ord of the nearest smaller
                    // term, and remember the slot for subord assignment.
                    int ord = -index - 2;
                    Integer intOrd = Integer.valueOf(ord);
                    List slotVals = (List) map.get(intOrd);
                    if (slotVals == null) {
                      slotVals = new ArrayList();
                      map.put(intOrd, slotVals);
                    }
                    slotVals.add(new SlotValPair(i, val));

                    ords[i] = ord;
                  } else {
                    // Exact match: this slot sorts at the matched ord with
                    // no subord needed.
                    ords[i] = index;
                    subords[i] = 0;
                  }
                }

                // Within each group sharing an ord, sort by value and assign
                // 1-based subords to break ties.
                Iterator dit = map.values().iterator();
                while (dit.hasNext()) {
                  List list = (List) dit.next();
                  if (list.size() > 1) {
                    Collections.sort(list);

                    for (int i = 0; i < list.size(); i++) {
                      SlotValPair svp = (SlotValPair) list.get(i);
                      subords[svp.i] = i + 1;
                    }
                  } else {
                    SlotValPair svp = (SlotValPair) list.get(0);
                    subords[svp.i] = 1;
                  }
                }

              }
          
          markrmiller@gmail.com Mark Miller added a comment -

          I left a doubled set in the ords comparator that's no longer needed. We can just use map.values() instead. I'm sure there are small-win tricks we can do too - if it's anywhere near competitive.

          mikemccand Michael McCandless added a comment -

          Whoops, you're right, sorry - I was looking at an old version...