Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels: None

      Description

      We need support in Solr for the new TrieRange Lucene functionality.

      Attachments

      1. SOLR-940.patch
        31 kB
        Shalin Shekhar Mangar
      2. SOLR-940.patch
        34 kB
        Shalin Shekhar Mangar
      3. SOLR-940.patch
        42 kB
        Shalin Shekhar Mangar
      4. SOLR-940.patch
        45 kB
        Shalin Shekhar Mangar
      5. SOLR-940.patch
        49 kB
        Shalin Shekhar Mangar
      6. SOLR-940.patch
        49 kB
        Shalin Shekhar Mangar
      7. SOLR-940.patch
        50 kB
        Shalin Shekhar Mangar
      8. SOLR-940.patch
        51 kB
        Shalin Shekhar Mangar
      9. SOLR-940-rangequery.patch
        6 kB
        Shalin Shekhar Mangar
      10. SOLR-940-rangequery.patch
        6 kB
        Shalin Shekhar Mangar
      11. SOLR-940-test.patch
        2 kB
        Shalin Shekhar Mangar
      12. ASF.LICENSE.NOT.GRANTED--SOLR-940-newTrieAPI.patch
        11 kB
        Uwe Schindler
      13. SOLR-940-newTrieAPI.patch
        12 kB
        Uwe Schindler
      14. SOLR-940.patch
        24 kB
        Shalin Shekhar Mangar
      15. SOLR-940.patch
        32 kB
        Shalin Shekhar Mangar
      16. SOLR-940-LUCENE-1602.patch
        3 kB
        Shalin Shekhar Mangar
      17. SOLR-940-LUCENE-1602.patch
        4 kB
        Uwe Schindler
      18. SOLR-940-LUCENE-1701.patch
        17 kB
        Uwe Schindler
      19. SOLR-940-LUCENE-1701.patch
        68 kB
        Shalin Shekhar Mangar
      20. SOLR-940-LUCENE-1701-addition.patch
        18 kB
        Uwe Schindler
      21. SOLR-940-1261-1241.patch
        83 kB
        Shalin Shekhar Mangar


          Activity

          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Shalin Shekhar Mangar added a comment -

          Committed revision 794328.

          Thanks Uwe and Mike!

          Uwe Schindler added a comment -

          Patch looks good!

          Shalin Shekhar Mangar added a comment -

          Attached a patch which combines SOLR-940, SOLR-1261 and SOLR-1241; these need to be committed together to avoid compile errors.

          I'll upgrade Lucene jars to Lucene 2.9-dev r794238.

          All tests pass. I'll commit shortly.

          Uwe Schindler added a comment -

          I think your problem is solved now (thanks Mike).

          If you update to latest trunk, you must also apply SOLR-1261 (rename of RangeQuery to TermRangeQuery).

          Shalin Shekhar Mangar added a comment -

          Thanks Uwe!

          I'm having trouble figuring out the root cause behind the failure of QueryElevationComponentTest. When elevation is enabled, it seems to be sorting by score desc even if score asc is specified. I've written a testcase which I'll post to java-user to get some info on what could be going wrong.

          Uwe Schindler added a comment -

          Same patch updated, but it uses the new TrieRange feature of allowing an arbitrarily large precStep to index only one token (the query tokenizer now uses Integer.MAX_VALUE as precStep).
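
          A minimal sketch of what this looks like, assuming the Lucene 2.9 NumericTokenStream API (value and class name are illustrative, not from the patch):

          import org.apache.lucene.analysis.NumericTokenStream;

          public class PrecStepSketch {
            public static void main(String[] args) {
              long value = 1234567890L; // illustrative value
              // Index time: a small precisionStep produces the usual trie terms.
              NumericTokenStream indexStream =
                  new NumericTokenStream(8).setLongValue(value);
              // Query time: a precisionStep of Integer.MAX_VALUE (or anything >= the
              // number of value bits) yields exactly one full-precision token.
              NumericTokenStream queryStream =
                  new NumericTokenStream(Integer.MAX_VALUE).setLongValue(value);
            }
          }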

          Uwe Schindler added a comment -

          Hi Shalin,

          here is an additional patch (but only for the trie parts) that is more intelligent and also uses NumericTokenStream for the query-time factory. Your previous patch must be applied first; then revert the changes in analysis.TrieXxxxTokenizerFactory and TrieField, and apply this patch, which removes the old factories and creates a new TrieTokenizerFactory. It should compile, but it is not really tested (it is hard to apply all your changes). If there are compile errors, they can be easily fixed.

          The idea is to use the same token stream for query-time analysis. To produce only the highest-precision token needed there, the former TrieIndexTokenizerFactory is simply used with a precisionStep of 32 for int/float and 64 for long/double/date. No magic with KeywordTokenizer is needed. NumericUtils, which is an expert Lucene class (not really public), is not needed anymore.
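
          A rough sketch of that query-time choice, assuming the same Lucene 2.9 API (helper and class names are hypothetical, not from the patch):

          import org.apache.lucene.analysis.NumericTokenStream;

          public class QueryTimeStreamSketch {
            // Hypothetical helpers: reuse NumericTokenStream at query time with a
            // precisionStep equal to the full bit width (32 for int/float, 64 for
            // long/double/date) so that only the highest-precision token is emitted.
            static NumericTokenStream intQueryStream(int value) {
              return new NumericTokenStream(32).setIntValue(value);
            }

            static NumericTokenStream longQueryStream(long value) {
              return new NumericTokenStream(64).setLongValue(value);
            }
          }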

          Shalin Shekhar Mangar added a comment -

          This patch includes Uwe's last patch, changes related to LUCENE-1614 and SOLR-1241.

          The QueryElevationComponentTest fails with Lucene trunk which I'll look into.

          Shalin Shekhar Mangar added a comment -

          Yes, it now works with RangeQuery/Filter (as before), NumericRangeQuery/Filter and FieldCacheRangeFilter.

          Super, Thanks!

          Uwe Schindler added a comment -

          Yes, it now works with RangeQuery/Filter (as before), NumericRangeQuery/Filter and FieldCacheRangeFilter.

          I will fix the strange usage of Term instance when we deprecate the old RangeQuery in favour of TermRangeQuery & Co. (LUCENE-1713).
          The current check in RangeQuery only prevents you from creating a RangeQuery using the Term instances (instead of field, string, string) where both are null (because with both terms entirely null, no field name is available).

          Uwe Schindler added a comment -

          I think you are fixing it the wrong way.

          You misunderstood, I meant:
          I will fix it so that it is clear what it really does. I will not change RangeQuery's behaviour; I will remove the whole internal Term handling in LUCENE-1713 and only use String field, lower, upper. Then it is clear how it works. The current code has this strange behaviour (in how it handles Term instances) because of the retrofitting of RangeQuery to MultiTermQuery.

          Shalin Shekhar Mangar added a comment - edited

          I think you are fixing it the wrong way.

          Why should it not be allowed? This is something which has worked for a long time. I don't think it is a bug, and it is useful at times.

          Sorry I posted too soon.

          Reading your comment again, I guess that you are indeed going to support such queries?

          Uwe Schindler added a comment -

          Fixed in Lucene trunk rev 789692. The strange null handling in RangeQuery (which caused my change) will be fixed in LUCENE-1713, when RangeQuery will be deprecated and renamed.

          Uwe Schindler added a comment -

          You are right, but normally a new Term(field, null) should not be allowed. The init method should normally prevent this, but it only checks whether the terms themselves are null. The RangeTermEnum is then positioned on the null term (it should be "").

          I will change this back (also in FieldCacheRangeFilter) and fix the wrong logic of RangeQuery to clearly support it.

          Shalin Shekhar Mangar added a comment -

          the reason was that all other range filters in Lucene core do not allow this.

          If you look at the RangeQuery constructor, it creates a new Term instance (even for null lower and upper), so an open-ended search executes fine.

          In general one should use a MatchAllDocsQuery in this case, as it is more performant

          But a MatchAllDocsQuery is not equivalent when some documents do not have a value for the field. For example, fq=*:* AND -f:[* TO *] will match all documents which do not have a value for field f.

          Uwe Schindler added a comment - edited

          Oh, this was intended.

          the reason was that all other range filters in Lucene core do not allow this. In general one should use a MatchAllDocsQuery in this case, as it is more performant.
          I could enable it again, but I have to think about the other range queries and filters then.

          How do you handle that with other range queries?

          Shalin Shekhar Mangar added a comment -

          Uwe, is there a reason to disallow fully open ranges?

          With the previous IntTrieRangeFilter, I could do a query for field:[* TO *] but this is not allowed anymore because NumericRangeQuery can take only one of the boundaries as null but not both.
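
          For reference, a hedged sketch of the constraint being discussed, using the Lucene 2.9 NumericRangeQuery factory methods (field name, bounds, and class name are illustrative):

          import org.apache.lucene.search.NumericRangeQuery;

          public class OpenRangeSketch {
            public static void main(String[] args) {
              // Half-open range (price:[* TO 100]): one null bound is accepted.
              NumericRangeQuery halfOpen =
                  NumericRangeQuery.newIntRange("price", null, Integer.valueOf(100), true, true);
              // Fully open range (price:[* TO *]) would need both bounds null, which
              // NumericRangeQuery rejected at the time of this comment:
              // NumericRangeQuery.newIntRange("price", null, null, true, true);
              System.out.println(halfOpen);
            }
          }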

          Michael McCandless added a comment -

          OK try again? Maybe 3rd time's the charm...

          Michael McCandless added a comment -

          Sigh. I'll go reopen LUCENE-1630!

          Shalin Shekhar Mangar added a comment -

          OK I just committed a fix for LUCENE-1630 that should fix that exception

          Thanks Mike. I upgraded to Lucene trunk and something is still not right. Now I see a StackOverflowError:

          java.lang.StackOverflowError
          at org.apache.solr.search.function.FunctionQuery.rewrite(FunctionQuery.java:50)
          at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:291)
          at org.apache.lucene.search.Query.queryWeight(Query.java:125)
          at org.apache.lucene.search.Query.weight(Query.java:117)
          at org.apache.lucene.search.Query.createQueryWeight(Query.java:108)
          at org.apache.lucene.search.Query.queryWeight(Query.java:126)
          at org.apache.lucene.search.Query.weight(Query.java:117)
          at org.apache.lucene.search.Query.createQueryWeight(Query.java:108)
          at org.apache.lucene.search.Query.queryWeight(Query.java:126)
          at org.apache.lucene.search.Query.weight(Query.java:117)

          Michael McCandless added a comment -

          OK I just committed a fix for LUCENE-1630 that should fix that exception.

          Michael McCandless added a comment -

          Shalin I think that exception you got is a break in back-compat. Sorry I'm reopening LUCENE-1630 to fix it...

          Uwe Schindler added a comment -

          Regarding Collector#acceptsDocsOutOfOrder, I think we need to

          1. Return true when we do not need scores, otherwise false.
          2. DocSetCollector and DocSetDelegateCollector collect in order so we return false

          It'd be great if someone who knows more about this stuff can confirm.

          My explanation, without guarantee: whether you return true or false depends on your collector, not on the type of query, the sorting, or whether you need scores. It gives the query engine a hint as to whether it is possible to deliver the doc ids out of order.

          The simple case is the example in the Collector JavaDocs: if you just mark the doc ids in an OpenBitSet, the order is irrelevant (a bit set is not faster or slower when it does not get the docs in the correct order). On the other hand, collectors like the TopDocs ones can be optimized to be faster when the docs come in order. One example: if you read stored fields of documents using the IndexReader given to setNextReader(), it may be good to have the docs in order to avoid seeking back and forward all the time.

          I'm also seeing this exception in many tests (DisMaxRequestHandlerTest, TestTrie, TestDistributedSearch) which, I guess, are related to LUCENE-1630

          I think this is because you have a custom query type which implements its own Weight. There are possibilities to fix this using a wrapper, but I am not sure.
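
          A hedged sketch of the kind of collector Uwe describes above (class name is illustrative; the Lucene 2.9 Collector API is assumed):

          import java.io.IOException;
          import org.apache.lucene.index.IndexReader;
          import org.apache.lucene.search.Collector;
          import org.apache.lucene.search.Scorer;
          import org.apache.lucene.util.OpenBitSet;

          public class BitSetCollectorSketch extends Collector {
            private final OpenBitSet bits;
            private int docBase;

            public BitSetCollectorSketch(int maxDoc) {
              this.bits = new OpenBitSet(maxDoc);
            }

            public void setScorer(Scorer scorer) {
              // scores are never read, so nothing to keep here
            }

            public void collect(int doc) {
              bits.set(docBase + doc); // setting a bit is order-independent
            }

            public void setNextReader(IndexReader reader, int docBase) {
              this.docBase = docBase;
            }

            public boolean acceptsDocsOutOfOrder() {
              return true; // a bit set does not care about delivery order
            }

            public OpenBitSet getBits() {
              return bits;
            }
          }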

          Shalin Shekhar Mangar added a comment -

          Thanks Uwe!

          Regarding Collector#acceptsDocsOutOfOrder, I think we need to

          1. Return true when we do not need scores, otherwise false.
          2. DocSetCollector and DocSetDelegateCollector collect in order so we return false

          It'd be great if someone who knows more about this stuff can confirm.

          SOLR-1241 must also be committed together with this issue to avoid compile errors.

          I'm also seeing this exception in many tests (DisMaxRequestHandlerTest, TestTrie, TestDistributedSearch) which, I guess, are related to LUCENE-1630

          SEVERE: java.lang.UnsupportedOperationException
          at org.apache.lucene.search.Query.createQueryWeight(Query.java:102)
          at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:185)
          at org.apache.lucene.search.BooleanQuery.createQueryWeight(BooleanQuery.java:401)
          at org.apache.lucene.search.Query.queryWeight(Query.java:120)
          at org.apache.lucene.search.Searcher.createQueryWeight(Searcher.java:237)
          at org.apache.lucene.search.Searcher.search(Searcher.java:173)
          at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1103)
          at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
          at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
          at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:176)
          at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:1290)

          I'll try to have another look tomorrow.

          Uwe Schindler added a comment -

          Patch with changes for the new Trie API in Lucene core; the term "trie" does not appear anymore in Lucene (it's now NumericRangeQuery, NumericTokenStream, NumericField, NumericUtils). This patch only contains changes for Trie and for the FieldCache/ExtendedFieldCache merge (which affects trie: ExtendedFieldCache was deprecated in Lucene and merged into FieldCache. LongParsers now extend FieldCache.LongParser; for backwards compatibility there is an ExtendedFieldCache.LongParser, too, but the new Trie API cannot handle it, so all occurrences of ExtendedFieldCache must be removed from Solr).

          The latest changes to Collector (the new abstract method acceptsDocsOutOfOrder()) are not handled! The patch is therefore untested, but it should work.

          Solr's FSDirectory factory is also changed to use the new FSDirectory.open() call, which does the same as your factory (chooses the directory implementation depending on the platform).
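
          A small sketch of the FSDirectory change mentioned above (the index path and class name are illustrative):

          import java.io.File;
          import java.io.IOException;
          import org.apache.lucene.store.FSDirectory;

          public class OpenDirectorySketch {
            public static void main(String[] args) throws IOException {
              // FSDirectory.open() picks a platform-appropriate implementation
              // (e.g. NIOFSDirectory on Unix, SimpleFSDirectory on Windows), which is
              // what Solr's own directory factory previously did by hand.
              FSDirectory dir = FSDirectory.open(new File("solr/data/index"));
              System.out.println(dir.getClass().getSimpleName());
              dir.close();
            }
          }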

          Uwe Schindler added a comment -

          The first part of the move to core is done, when the second part (LUCENE-1701) is done, I will post a patch!

          Shalin Shekhar Mangar added a comment -

          Committed revision 768240.

          I also added a method SolrIndexSearcher#search(Weight, Filter, Collector) to fix a compile error.

          Thanks Uwe!

          Uwe Schindler added a comment -

          I modified the patch a little bit to also include an updated documentation about sorting and function queries.

          Shalin Shekhar Mangar added a comment -

          Patch to incorporate LUCENE-1602

          Need to upgrade Lucene jars before we can commit this.

          Shalin Shekhar Mangar added a comment -

          Let's keep this issue open until trie is in core.

          Uwe Schindler added a comment -

          Again a change....
          *TrieRangeQuery is now available as a separate class; *TrieRangeFilter is not needed for Solr range queries (LUCENE-1602). It now has the same semantics as RangeQuery and can also be switched between constant-score and boolean-query rewrite.
          The next change will be the move to core, package renames, and a possible new name, NumericRangeQuery, in Lucene core (see the java-dev@lucene discussions). Stay tuned.

          Shalin Shekhar Mangar added a comment -

          Committed revision 764291.

          Thanks Uwe!

          1. Updating Lucene jars
          2. Updating Trie field types per Lucene's changes
          3. Adding ReverseStringFilterFactory
          4. Fixing compile errors related to LUCENE-1500

          Committing all the above changes in one go to avoid compile errors due to Lucene API updates (except for ReverseStringFilterFactory).

          Shalin Shekhar Mangar added a comment -

          Changes:

          1. Added Tests for Sorting on all trie type fields
          2. Return a LongFieldSource for trie date types
          3. Added Tests for function queries on all trie type fields
          4. Upgraded Lucene jars to r764281
          5. Created new ReverseStringFilterFactory for ReverseStringFilter through ant stub-factories

          All tests pass.

          This patch also contains changes for SOLR-1079 and LUCENE-1500. These are enough changes for one issue. I'll commit this shortly and then we can deal with sorting in distributed search through a new issue.

          Uwe Schindler added a comment -

          I'm also not very familiar with that code in QueryComponent but I guess that is executed only when field-sort-values are requested (for distributed search). I wrote tests for sorting and it works fine! So I think the problem will only be during Distributed Search. I'll modify TestDistributedSearch to test sorting of trie fields to be sure. If it doesn't, I'll open another issue to replace the deprecated ScoreDocComparator with FieldComparator.

          OK. If distributed search does not work, the problems are bigger: the problem is not the comparator alone, it is the FieldCache. The distributed search should fill the values into the FieldCache and then let the comparator do the work. Comparing Lucene's code with Solr's shows that some parts of LUCENE-1478 are missing: the comparators use the default parser instead of the one given by SortField.getParser() to parse the values (when retrieving FieldCache.getInts() & Co).

          I am not really sure why Solr needs to duplicate the sorting code from Lucene. Maybe this is no longer needed? In that case, everything would be OK once it is removed.
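
          A hedged sketch of the missing piece Uwe is pointing at (method and class names are hypothetical; it assumes an int-sorted field, other types have analogous getLongs/getFloats calls):

          import java.io.IOException;
          import org.apache.lucene.index.IndexReader;
          import org.apache.lucene.search.FieldCache;
          import org.apache.lucene.search.SortField;

          public class SortFieldParserSketch {
            // When filling the FieldCache for sorting, pass along the parser carried
            // by the SortField (TrieUtils supplies one) instead of relying on the
            // default parser; otherwise trie-encoded terms cannot be decoded.
            static int[] cachedInts(IndexReader reader, SortField sortField) throws IOException {
              FieldCache.IntParser parser = (FieldCache.IntParser) sortField.getParser();
              return FieldCache.DEFAULT.getInts(reader, sortField.getField(), parser);
            }
          }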

          Shalin Shekhar Mangar added a comment -

          One note on sorting:
          I am not really sure if sorting works with Solr. The SortField returned by TrieUtils.getSortField contains its own parser (a new feature in Lucene 2.9). When looking through the Solr code, searching for SortField in trunk, I noticed that QueryComponent has its own comparators and FieldCache code (duplicating the Lucene code) and ignores the parser given in the SortField (the parser is not passed to FieldCache.getInts() & Co.).

          If this is the case, it will simply not work. I do not know anything about the internals of Solr and what QueryComponent does, so can you create a test case that tests sorting of trie fields?

          I'm also not very familiar with that code in QueryComponent but I guess that is executed only when field-sort-values are requested (for distributed search). I wrote tests for sorting and it works fine! So I think the problem will only be during Distributed Search. I'll modify TestDistributedSearch to test sorting of trie fields to be sure. If it doesn't, I'll open another issue to replace the deprecated ScoreDocComparator with FieldComparator.

          Updated patch that supports ValueSource (currently not for trie date fields; I do not know how this should work: the original DateField uses a StringIndex as ValueSource, which is not possible for trie date fields, as no parser is available, and using the standard string index would fail because of more than one term per doc). Some tests for function queries are needed (especially as the Double and Float parsers are not tested by Lucene at the moment); maybe change a test for conventional XxxFields to do the same test with a trie field.

          I'll write tests for these as well. But trie date is just a trie long field so we should be able to use a LongFieldSource for this, right?

          Shalin Shekhar Mangar added a comment -

          This patch includes all of Uwe's changes in addition to SOLR-1079 and another change to SolrHighlighter to accommodate LUCENE-1500.

          All tests pass.

          Uwe Schindler added a comment -

          I attached a patch to SOLR-1079 to fix the QueryComponent problem (remove the StringFieldable).

          Uwe Schindler added a comment -

          One note on sorting:
          I am not really sure if sorting works with Solr. The SortField returned by TrieUtils.getSortField contains its own parser (a new feature in Lucene 2.9). When looking through the Solr code, searching for SortField in trunk, I noticed that QueryComponent has its own comparators and FieldCache code (duplicating the Lucene code) and ignores the parser given in the SortField (the parser is not passed to FieldCache.getInts() & Co.).

          If this is the case, it will simply not work. I do not know anything about the internals of Solr and what QueryComponent does, so can you create a test case that tests sorting of trie fields?

          By the way: in QueryComponent there is a package-private StringFieldable just to convert the strings. Why not simply use a conventional Field instance to do this instead of implementing the whole interface? You can do everything done with this StringFieldable with Field, too. This is the problem with the omitTf thing: the interface changed again in Lucene 2.9, needing a change in this class. Replacing it with a simple reusable Field instance solves the interface problem completely.

          Shalin Shekhar Mangar added a comment -

          Re-opening to incorporate changes in Lucene.

          Uwe Schindler added a comment -

          The change is now committed in Lucene trunk!
          Shalin: Can you reopen this issue (I cannot do it) so we do not forget about it?

          Uwe Schindler added a comment -

          Updated patch that supports ValueSource (currently not for trie date fields; I do not know how this should work: the original DateField uses a StringIndex as ValueSource, which is not possible for trie date fields, as no parser is available, and using the standard string index would fail because of more than one term per doc). Some tests for function queries are needed (especially as the Double and Float parsers are not tested by Lucene at the moment); maybe change a test for conventional XxxFields to do the same test with a trie field.

          Shalin Shekhar Mangar added a comment -

          Did the LUCENE-1582 patch apply to Lucene correctly?

          Yes, that one applies fine. I think going ahead with LUCENE-1582, SOLR-1079 and then looking at this patch will make things easier.

          Uwe Schindler added a comment -

          I'm having trouble applying the patch:

          I created the patch from the SVN trunk checkout yesterday. Maybe it is in windows-format with CR-LF. For me it applies cleanly using TortoiseSVN merge function.

          Did the LUCENE-1582 patch apply to Lucene correctly?

          Shalin Shekhar Mangar added a comment -

          Thanks Uwe!

          I'm having trouble applying the patch:

          shalinsmangar@shalinsmangar-laptop:~/work/oss/solr-trunk$ patch --dry-run -p0 < /home/shalinsmangar/Desktop/SOLR-940-newTrieAPI.patch 
          (Stripping trailing CRs from patch.)
          patching file example/solr/conf/schema.xml
          (Stripping trailing CRs from patch.)
          patching file src/java/org/apache/solr/analysis/TrieIndexTokenizerFactory.java
          Hunk #3 FAILED at 51.
          1 out of 3 hunks FAILED -- saving rejects to file src/java/org/apache/solr/analysis/TrieIndexTokenizerFactory.java.rej
          (Stripping trailing CRs from patch.)
          patching file src/java/org/apache/solr/analysis/TrieQueryTokenizerFactory.java
          (Stripping trailing CRs from patch.)
          patching file src/java/org/apache/solr/schema/TrieField.java

          No biggie, I'll take care of it.

          I forgot to mention: with LUCENE-1582 and this patch, sorting now works for trie fields.

          That is great news!

          About function queries: If they use the "normal" field cache (long, int, double, float) with the supplied trie parser

          The function query stuff does use the FieldCache, but through the ValueSource abstraction. It should be possible to support this by creating a TrieValueSource which uses the trie field cache parsers when creating the value source.

          By the way, the change needed for compilation with the new Lucene JARs is the omitTf thing (SOLR-1079).

          Ok, I think we can commit that first as soon as there is consensus on the name.

          Uwe Schindler added a comment -

          I forgot to mention: with LUCENE-1582 and this patch, sorting now works for trie fields. I changed the schema.xml in the patch to note this.

          About function queries: If they use the "normal" field cache (long, int, double, float) with the supplied trie parser (as the trie SortField factory does), it would work. The parser for the numeric values is also separately available in TrieUtils. But I do not know how to enable this in Solr (SortField support is available through the schema); maybe you can do this, or change the comments.

          By the way, the change needed for compilation with the new Lucene JARs is the omitTf thing (SOLR-1079); I have done this in my local checkout to be able to create this patch.

          Uwe Schindler added a comment -

          This patch moves Solr's trie field support to the new Trie API (which is not yet committed to Lucene).
          It simplifies the TokenizerFactories (no Solr-internal indexing Tokenizer is needed anymore, as the trie API supplies a TokenStream). The TrieQueryTokenizerFactory was simplified to use KeywordTokenizer instead of implementing its own one (this change can be left out if you like your solution more).
          For this to compile and work, the latest trunk builds of Lucene must be placed in lib, and another small change is needed because of a change in the Fieldable interface (not included in the patch).
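
          A minimal sketch of the query-side simplification described here (the class name is illustrative; Solr's BaseTokenizerFactory is assumed as the base class):

          import java.io.Reader;
          import org.apache.lucene.analysis.KeywordTokenizer;
          import org.apache.lucene.analysis.Tokenizer;
          import org.apache.solr.analysis.BaseTokenizerFactory;

          public class TrieQueryTokenizerSketch extends BaseTokenizerFactory {
            // The whole query string becomes a single token; the trie field type then
            // converts that token to its encoded form when building the query.
            public Tokenizer create(Reader input) {
              return new KeywordTokenizer(input);
            }
          }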

          Uwe Schindler added a comment -

          I created a new issue LUCENE-1582 to fix the sorting problem and also support a TokenStream directly by trieCodeLong/Int(). The API will change, but this would be simplification for the Solr implementation (as the TokenStream can be directly used) and is more memory efficient.

          Shalin Shekhar Mangar added a comment -

          Committed revision 752823.

          Shalin Shekhar Mangar added a comment - edited

          Changing the test to index and search for NOW/DAY TO NOW/DAY+10DAYS; otherwise the millisecond precision fails the test intermittently.

          I'll commit this shortly.

          Shalin Shekhar Mangar added a comment -

          Committed revision 752785.

          Fixed a single char bug in the previous patch at FieldType.getRangeQuery.

          Shalin Shekhar Mangar added a comment -

          From Hoss on solr-dev about the last patch:

          I don't think treating "*" as special is something FieldType (or
          TrieField) should do – that's specific to the syntax of the QueryParser.
          The FieldType classes should treat the string as a string. (otherwise if i
          write a new QueryParser where * isn't a special character and use some
          syntax like "phoneNumber < *69" i'm screwed.
          "*69" as the

          I also think having a single "inclusive" boolean is a bad idea.

          I would javadoc that the lower/upper bounds can be null, and have
          SolrQueryParser pass null when it sees "*" in the syntax. we should also
          be explicit in the javadocs about what combinations of inclusion booleans
          and null values are allowed so that subclasses know what to expect

          In this patch:

          1. FieldType no longer treats '*' specially
          2. SolrQueryParser passes null for '*'
          3. Single inclusive parameter replaced by two parameters – minInclusive and maxInclusive
          4. Javadoc updated to mention that nulls are allowed for part1 and part2, SolrQueryParser passes null for '*' character and same (true) values for minInclusive and maxInclusive. However other QueryParsers may have different semantics.
          5. Corresponding changes to TrieField

          I'll commit shortly.
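
          A hedged sketch of the resulting contract (the real Solr method takes extra arguments such as the parser and the SchemaField; ConstantScoreRangeQuery stands in for the default implementation, and the class name is illustrative):

          import org.apache.lucene.search.ConstantScoreRangeQuery;
          import org.apache.lucene.search.Query;

          public class RangeQueryContractSketch {
            // part1/part2 may be null for an open end ('*' in the query syntax);
            // each end carries its own inclusive flag. TrieField overrides this to
            // build a trie (numeric) range query instead.
            public Query getRangeQuery(String field, String part1, String part2,
                                       boolean minInclusive, boolean maxInclusive) {
              return new ConstantScoreRangeQuery(field, part1, part2,
                                                 minInclusive, maxInclusive);
            }
          }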

          Shalin Shekhar Mangar added a comment -

          Committed revision 752596.

          Shalin Shekhar Mangar added a comment -
          1. Adding FieldType.getRangeQuery method which uses the ConstantScore version of RangeQuery.
          2. TrieField overrides it to provide its own implementation.
          3. SolrQueryParser uses fieldType.getRangeQuery

          I'll commit this shortly.

          Shalin Shekhar Mangar added a comment -

          Instead of explicitly testing for TrieField in the QueryParser, how about adding a FieldType.getRangeQuery()?

          Sounds good. I'll give a patch.

          Yonik Seeley added a comment -

          Instead of explicitly testing for TrieField in the QueryParser, how about adding a
          FieldType.getRangeQuery()? We'll need that anyway in the future to support value source range query, etc.

          Shalin Shekhar Mangar added a comment -

          Committed revision 752562.

          Thanks Uwe for the ideas and the reviews!

          Shalin Shekhar Mangar added a comment -

          Updating javadocs to note that trie fields cannot be used in function queries. No other changes.

          What do people feel about committing this patch?

          Another thought - If we can write a ValueSource for trie fields whose DocValues return only the first indexed term, we should be able to use function queries. Will this be too expensive if Lucene does not support building such field caches for us?

          If this can be done then basic sorting would be possible through function queries (though they would be part of the score). However one still would not be able to use trie fields in the sort parameter (or mix their sorting with non-numeric fields).

          Shalin Shekhar Mangar added a comment -

          The last patch was incorrect. Uploading the correct patch.

          Uwe Schindler added a comment -

          The patch is the same as before, maybe you uploaded the wrong one.

          Shalin Shekhar Mangar added a comment -

          Changes:

          1. Support term queries for trie dates
          2. Update test for term queries on dates
          3. Throw a SolrException for an unknown trie type in the switch (actually this cannot happen because the enum has a fixed number of types and we are using all of them; see the sketch below).
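
          A sketch of the guard described in item 3. The enum and method here are illustrative, not the patch's actual classes; the point is simply that an unknown type fails fast with a SolrException instead of falling through silently.

            import org.apache.solr.common.SolrException;

            public class TrieSwitchSketch {
              enum TrieTypes { INTEGER, LONG, FLOAT, DOUBLE, DATE }

              static String describe(TrieTypes type) {
                switch (type) {
                  case INTEGER:
                  case FLOAT:
                    return "encoded via the int trie encoder";   // placeholder for TrieUtils int encoding
                  case LONG:
                  case DOUBLE:
                  case DATE:
                    return "encoded via the long trie encoder";  // placeholder for TrieUtils long encoding
                  default:
                    // fail fast on a misconfigured or unexpected enum value
                    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                        "Unknown type for trie field: " + type);
                }
              }
            }
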
          Shalin Shekhar Mangar added a comment -

          When looking through the code, I found out that TrieQueryTokenizer is missing Date support, nothing else!

          Ah right, I forgot that term queries won't work without it. I'll add it.

          And I would always throw an IllegalArgumentException in the default case of all switch(type) statements. This helps find such errors faster.

          Good point. Will do that too.

          Thanks!

          Uwe Schindler added a comment -

          Cool!
          When looking through the code, I found out that TrieQueryTokenizer is missing Date support, nothing else! And I would always throw an IllegalArgumentException in the default case of all switch(type) statements. This helps find such errors faster.

          Shalin Shekhar Mangar added a comment -

          Please ignore my comment about toObject I made earlier. It is not necessary.

          Changes:

          1. Added TrieField as a known type in BinaryResponseWriter so that TrieField.toObject is serialized
          2. Changes to example schema with documentation
          3. Updated javadocs
          4. Use TrieUtils.getLongSortField for dates too
          5. Remove hardcoded isMultivalued in TrieField

          This is a good time for folks to take this out for a spin

          Shalin Shekhar Mangar added a comment -

          Hmm, I think the TrieField.toObject is not correct. We need to use TrieUtils to convert the prefix-coded form back to int/float/long etc. Also, we need to add TrieField as a known type for the binary response format.

          Shalin Shekhar Mangar added a comment -

          Changes:

          1. Support for date types
          2. TrieField and TrieIndexTokenizer keep a static instance of the DateField class whose parseMath and toObject methods are used. This makes sure that all date-format semantics as well as the DateMath syntax work as usual with trie dates (see the sketch after this list).
          3. Updated test for date type
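
          A sketch of the shared-DateField idea from item 2, hedged: the exact DateField method signatures may differ from what is shown. The intent is only to illustrate that a single DateField instance resolves DateMath/ISO date strings, and the resulting millisecond value is what gets trie-encoded as a long.

            import java.util.Date;
            import org.apache.solr.schema.DateField;

            class TrieDateSketch {
              // shared instance so trie dates reuse the normal date parsing semantics
              static final DateField dateField = new DateField();

              static long toTrieLong(String externalVal) {
                // parseMath understands DateMath expressions as well as ISO dates (assumption: accessible as shown)
                Date d = dateField.parseMath(new Date(), externalVal);
                return d.getTime();
              }
            }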

          TODO:

          1. Update example schema
          2. Update wiki
          3. Commit?
          Shalin Shekhar Mangar added a comment -

          Thanks Uwe for spotting those problems. The latest patch should take care of these issues.

          For trie fields it would be good, to have something like "sorting on the first term of the document".

          Hmm, yeah. This looks like the easiest solution.

          Shalin Shekhar Mangar added a comment -

          Changes:

          1. Adding support for open ranges
          2. Changed precision step in test schema.xml to 4
          3. Renamed TrieTokenizerFactory to TrieIndexTokenizerFactory
          4. Added a TrieQueryTokenizerFactory which converts the query token to the xxxToPrefixCoded form. Now term queries (in q or fq) are supported.
          5. Updated tests for open ranges and term queries
          6. Minor javadoc updates

          TODO:

          1. Date support
          2. Wiki updates
          3. Example schema updates
          Uwe Schindler added a comment -

          About the sorting problem:

          As already discussed in the original TrieRange issue, sorting is a problem for trie-encoded fields. The problem is that the current FieldCache has two issues:

          • it stores the last term (the last term in the TermEnum!) in the cache
          • it throws an exception when the number of terms in one field is greater than the number of docs (I think this was the case)

          For trie fields it would be good to have something like "sorting on the first term of the document". This would be consistent with TrieRange, as the first term produced by trieCodeXxx() is always the highest-precision one (and also in your tokenizer). I think we should discuss this further in LUCENE-1372, where the sorting problem is tracked. If it were fixed before 2.9, I could remove the whole multi-field part of the TrieRange API and only support one field name (which would make me really happy). Then you could index all trie terms in one field and sort on it, provided the order of generated trie terms is preserved through indexing and the TermDocs array (which is not really simple for the field cache to handle).

          Uwe Schindler added a comment -

          Looks cool, great!
          I have no Solr installed here to test at large scale, but from what I see, it seems sophisticated. I have only seen these points:

          • Missing support for half-open ranges with "*" (just add the test for "*" and pass null to TrieRangeFilter; see the sketch after this list)
          • The example with a different configured precisionStep should use a precisionStep < 8 [16 is a possible value, but useless because of the number of terms. The possible number of terms increases dramatically with higher precision steps (factor 2^precisionStep). Javadocs should note that 32/64 should be used for no additional trie fields]
          • Date support should be trivial, too.
          • Does it work with the tokenizer for standard term queries? E.g. somebody asks for all documents containing the long value x, but not using a TrieRange for that (this works, but can Solr handle this?), and is the value correctly tokenized? The problem here may be that during query parsing the analyzer is used and generates an "OR" BooleanQuery of all terms including lower precisions. Or is another tokenizer used for the query (in which case this tokenizer should just generate one term using XxxxToPrefixCoded, without shift)?
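
          A tiny sketch of the first bullet's suggestion. The helper name is hypothetical; the idea is just to map Solr's "*" range endpoint to a null bound, which the trie range filter treats as open-ended.

            public class OpenRangeBoundSketch {
              static Long bound(String part) {
                // "*" (or a missing part) means an open end of the range
                return (part == null || "*".equals(part)) ? null : Long.valueOf(part);
              }

              public static void main(String[] args) {
                System.out.println(bound("10"));  // 10
                System.out.println(bound("*"));   // null -> half-open range
              }
            }
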
          Shalin Shekhar Mangar added a comment -

          If the precisionStep is configurable, you can simply use 32 (for ints) or 64 (for longs) to not create additional precisions.

          That's great, I'll document this on the wiki.

          In queryParser you use: FieldType ft = schema.getFieldType(field); So if you have the FieldType, why are you not able to extract the precisionStep from the schema?

          Yes, done; must have been the late-night effect.

          For future usage, you could use TrieUtils.get[Int|Long]SortField for FieldType.getSortField instead of using SortField.String. If the problem with more than one field name is solved, sorting works using the Trie-SortField using the correct parser.

          Done too

          Shalin Shekhar Mangar added a comment -

          New patch with the following changes:

          1. Supports int, float, long, double
          2. There are no separate classes for each type (too much boilerplate code), instead they are folded into one – TrieField
          3. Same as above for Tokenizer - TrieTokenizerFactory
          4. In the schema, one needs to specify an additional attribute 'type' when declaring the field type, example:
            <fieldType name="tdouble" class="solr.TrieField" type="double" omitNorms="true"
            positionIncrementGap="0" indexed="true" stored="false" />
            
          5. Precision step is now configurable and can be specified in field type declaration, example:
            <fieldType name="tdouble16" class="solr.TrieField" type="double" precisionStep="16"
             omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" />
            
          6. Test expanded for float, long, double types

          TODO:

          1. Date type
          2. More javadocs?
          3. Update wiki
          4. Changes to example schema
          Uwe Schindler added a comment -

          Just one question:
          In queryParser you use: FieldType ft = schema.getFieldType(field); So if you have the FieldType, why are you not able to extract the precisionStep from the schema? The user would only have a problem if he changed the precision step in the schema; with a fixed schema that contains the precisionStep as a parameter, you should be able to search the indexed data. If you change the schema, you have to reindex (or use a precisionStep that is a multiple of the original one; see the trie Javadoc: if you have indexed with step 2, you can search without problems using step 4).
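
          A sketch of that suggestion: read precisionStep once from the schema args in the field type's init hook, so the same value is available to both the indexing tokenizer and the query-side range filter. Field and method names here are illustrative; in Solr this corresponds to the FieldType init hook that receives the schema attributes.

            import java.util.Map;

            class PrecisionStepSketch {
              static final int DEFAULT_PRECISION_STEP = 8;
              int precisionStep = DEFAULT_PRECISION_STEP;

              // stand-in for FieldType.init(IndexSchema schema, Map<String,String> args)
              void init(Map<String, String> args) {
                String p = args.remove("precisionStep");
                if (p != null) {
                  precisionStep = Integer.parseInt(p);
                }
              }
            }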

          By the way: For future usage, you could use TrieUtils.get[Int|Long]SortField for FieldType.getSortField instead of using SortField.String. If the problem with more than one field name is solved, sorting works using the Trie-SortField using the correct parser.

          Uwe Schindler added a comment -

          Assuming TrieRange does all the number mojo needed in Lucene, should it eventually replace the existing number implementations?

          Not until we can support sorting. Also, trie indexes many tokens per value, increasing the index size. Users who do not need range searches should not pay this penalty.

          If the precisionStep is configurable, you can simply use 32 (for ints) or 64 (for longs) to not create additional precisions.

          Shalin Shekhar Mangar added a comment -

          Assuming TrieRange does all the number mojo needed in Lucene, should it eventually replace the existing number implementations?

          Not until we can support sorting. Also, trie indexes many tokens per value, increasing the index size. Users who do not need range searches should not pay this penalty.

          Shalin Shekhar Mangar added a comment - - edited

          Attaching first cut with the following changes:

          1. BaseTrieField - Base class for trie fields, hardcodes the field to be multi-valued and tokenized
          2. TrieIntField - Support for ints
          3. TrieIntTokenizer/Factory - Uses TrieUtils to create sequence of trie coded numbers for a given integer, decreasing in precision
          4. Changes to SolrQueryParser to use IntTrieRangeFilter if the field is an instance of TrieIntField
          5. TestTrie - Simple test for int range search
          6. src/test/test-files/conf/schema-trie.xml uses the trie int

          The precisionStep is not configurable at the moment. This is because the same precisionStep must be used for indexing (by the Tokenizer) and for creating the range filter (in SolrQueryParser), and I could not find a way to share this information between the two classes.

          TODO:

          1. Support for float, long, doubles
          2. Javadocs
          3. Changes to the example schema, clearly highlighting that trie fields cannot be used for sorting (one should use copyField into an integer for sorting)

          Thanks Uwe for suggesting the tokenizer approach, works great!

          Edit - Forgot to mention that this needs updated Lucene jars (trunk).

          Ryan McKinley added a comment -

          I have not followed this closely, so correct me if I am way off base...

          Assuming TrieRange does all the number mojo needed in Lucene, should it eventually replace the existing number implementations?

          In Solr 2.0, would it make sense that int, sint, float, sfloat, etc. are all implemented with TrieRange? Obviously we need to keep the existing field types for 1.x.

          If this is true, should we deprecate the existing Number implementations for 1.4? Perhaps just NumberUtils?

          Should changing the schema version to 1.2 trigger using the TrieRange classes rather than the NumberUtils classes? Besides supporting existing indexes, is there any reason to keep the Solr number formats rather than the Trie version?

          Shalin Shekhar Mangar added a comment -

          Using this, you could index the field (without an additional helper field, and so not sortable) using the standard Lucene Fieldable mechanism. No further changes to Solr on the indexing side might be needed.

          Hmm, no sort should be OK for a start. Users can be instructed to use a copyField for sorting (just like we have integer and sint in the schema). Thanks for the tip Uwe! I'll try this out and let you know if this works out well.

          Uwe Schindler added a comment -

          But, a tokenizer cannot add tokens in another field, which is required for the filter to work correctly.

          You can tokenize it into one field and use TrieRangeFilter with the same field name for the field and the lower-precision field (second constructor). After that, search works, but you cannot sort anymore, because there is more than one token per document in this field.

          Shalin Shekhar Mangar added a comment -

          Now I understand the problem Yonik had with the original TrieRange implementation and why he wanted to change the API. Your problem is that you cannot just map the numerical value to one field and token; you have to expand one numeric value into more than one token before indexing.

          I was just reading Yonik's comment on java-dev to figure out what he had in mind. Normally, the toInternal/toExternal methods take care of encoding/decoding. But we cannot use them because the trie encoding produces multiple tokens. That can be done through a tokenizer as you said. But, a tokenizer cannot add tokens in another field, which is required for the filter to work correctly.

          Uwe Schindler added a comment -

          I would program this tokenizer in this way (using the old Lucene Token API):

          // Emits each trie-encoded precision of the value as a token at the same position
          // (needs java.util.Arrays, java.util.Iterator and the Lucene Token API).
          public class TrieTokenStream extends TokenStream /* or Tokenizer */ {
            public TrieTokenStream(long value,...) {
              this.trieVals = Arrays.asList(TrieUtils.trieCodeLong(value,...)).iterator();
            }
          
            public Token next(Token token) {
              if (!trieVals.hasNext()) return null;
              token.reinit(trieVals.next(), 0, 0);
              token.setPositionIncrement(0);
              return token;
            }
          
            private final Iterator<String> trieVals;
          }
          

          Using this, you could index the field (without an additional helper field, and so not sortable) using the standard Lucene Fieldable mechanism. No further changes to Solr on the indexing side might be needed.
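
          An illustrative usage sketch only, assuming the TrieTokenStream above takes a (value, precisionStep) pair (its constructor arguments are elided in the snippet). It shows the standard Fieldable route mentioned above: the token stream is handed directly to a Lucene Field.

            import org.apache.lucene.document.Document;
            import org.apache.lucene.document.Field;
            import org.apache.lucene.index.IndexWriter;

            class TrieIndexingSketch {
              static void addValue(IndexWriter writer, long value) throws Exception {
                Document doc = new Document();
                // Field(String, TokenStream) yields an indexed, tokenized, unstored field
                doc.add(new Field("weight_trie", new TrieTokenStream(value, 8)));
                writer.addDocument(doc);
              }
            }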

          Uwe Schindler added a comment -

          Just an idea (that came to me...): How about creating a TokenStream that returns the results of TrieUtils.trieCode[Long|Int]() with TokenIncrement 0. You should be able to search this with TrieRangeFilter (using the same field name for the highest and lower precision trie fields).

          The difficulty is in identifying what type of tokenizer was used (TrieInt, TrieLong, etc.) to index the field. The user will need to use the localparam syntax explicitly for us to use IntTrieRangeFilter, e.g. fq={trieint}tint:[10 TO 100]. I would like to avoid the use of such syntax as far as possible. Creating the field type may be more work than this option, but it can help us use the correct Filter and SortField automatically.

          Now I understand the problem Yonik had with the original TrieRange implementation and why he wanted to change the API. Your problem is that you cannot just map the numerical value to one field and token; you have to expand one numeric value into more than one token before indexing.

          My idea was to just create a FieldType subclass for trie indexing and override the getAnalyzer() and getQueryAnalyzer() methods. The analyzer would get the numerical value and create tokens from it. Normally it would be only one token per numerical value, converted using the toXXXX methods in FieldType, but now you have to create more than one token (one for each precision). This could be done by the analyzer returned by FieldType. This analyzer does really nothing itself; it only returns a Tokenizer that does not really tokenize, it just returns Tokens containing the prefix-encoded values of the given String converted to the numeric value at different precisions (using TrieUtils.trieCodeLong()).

          Shalin Shekhar Mangar added a comment -

          Just an idea (that came to me...): How about creating a TokenStream that returns the results of TrieUtils.trieCode[Long|Int]() with TokenIncrement 0. You should be able to search this with TrieRangeFilter (using the same field name for the highest and lower precision trie fields).

          The difficulty is in identifying what type of tokenizer was used (TrieInt, TrieLong, etc.) to index the field. The user will need to use the localparam syntax explicitly for us to use IntTrieRangeFilter, e.g. fq={trieint}tint:[10 TO 100]. I would like to avoid the use of such syntax as far as possible. Creating the field type may be more work than this option, but it can help us use the correct Filter and SortField automatically.

          And how about using this for floats, doubles, and dates (which also have corresponding Solr field types)? You could create field descriptions for that too (subclasses of TrieIntField and TrieLongField), to be able to index these types using trie.

          Yes, we should support those too.

          By the way, when looking through the schema code, I found out that with Lucene trunk it is now also possible to sort the "SortableLongField" & others using the new SortField ctors that LUCENE-1478 introduced. Currently these fields are sorted by SortField.STRING, which is inefficient. Just as a side-note.

          Thanks for pointing this out. I'll take a look at this too.

          Uwe Schindler added a comment -

          Just an idea (that came to me...): How about creating a TokenStream that returns the results of TrieUtils.trieCode[Long|Int]() with TokenIncrement 0. You should be able to search this with TrieRangeFilter (using the same field name for the highest and lower precision trie fields).

          Uwe Schindler added a comment -

          By the way, when looking through the schema code, I found out that with Lucene trunk it is now also possible to sort the "SortableLongField" & others using the new SortField ctors that LUCENE-1478 introduced. Currently these fields are sorted by SortField.STRING, which is inefficient. Just as a side-note.

          Uwe Schindler added a comment -

          Yes, that seems to be the right way. I'll create TrieIntField and TrieLongField. We can use the implicit helper field or have it as a configuration option in schema.xml. We'd also need changes to the SolrQueryParser so that range queries on such fields are handled correctly.

          And how about using this for floats, doubles, and dates (which also have corresponding Solr field types)? You could create field descriptions for that too (subclasses of TrieIntField and TrieLongField), to be able to index these types using trie.

          Shalin Shekhar Mangar added a comment -

          Just a question: Do you need help implementing (working power), or is the documentation not yet understandable for a beginner? I added some indexing and query examples in the package overview, but maybe it is not so easy for others to understand. Maybe we can improve the documentation.

          I meant that I have only started looking at this, so I may have questions later.

          Maybe you should add new types "trie-long",... and index them using TrieUtils.

          Yes, that seems to be the right way. I'll create TrieIntField and TrieLongField. We can use the implicit helper field or have it as a configuration option in schema.xml. We'd also need changes to the SolrQueryParser so that range queries on such fields are handled correctly.

          I'll try to have a patch by tomorrow.

          Uwe Schindler added a comment -

          So, I'll definitely need some help. My first priority is to get it working in a simple way, then add more configuration/tuning options depending on feedback

          Just a question: Do you need help implementing (working power), or is the documentation not yet understandable for a beginner? I added some indexing and query examples in the package overview, but maybe it is not so easy for others to understand. Maybe we can improve the documentation.

          I am not so familiar with Solr internals, but as I understand it you have data types and field configurations in your XML documents. Maybe you should add new types "trie-long", ... and index them using TrieUtils. I will check out the svn trunk of Solr and look into it. In the first step, I would only use the APIs taking one field name (which create the internal helper field ending in "#trie"; it would be created automatically but stay "invisible" to the user). This ensures simplicity and the possibility of sorting efficiently using the SortField factory from TrieUtils (without custom sort comparators and so on).

          Shalin Shekhar Mangar added a comment -

          Thanks Uwe! I have just started to look at the API, the discussion in LUCENE-1470 and on the mailing list. So, I'll definitely need some help. My first priority is to get it working in a simple way, then add more configuration/tuning options depending on feedback.

          As for LUCENE-1541, I'm yet to get to that. Probably others may have more thoughts on that.

          Uwe Schindler added a comment -

          Cool, I am open to queries and requests about the API and can help where applicable. What do the Solr people think about LUCENE-1541? I keep it open, but I think it makes things too complicated.

          Shalin Shekhar Mangar added a comment -

          Great! I need to upgrade the Lucene jars to get the new updated Trie API.

          Yonik Seeley added a comment -

          I haven't started to work on it - go for it!

          Shalin Shekhar Mangar added a comment -

          Yonik, are you working on this? If not, I can start.


            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Yonik Seeley
            • Votes:
              4
              Watchers:
              4
