Lucene - Core
  1. Lucene - Core
  2. LUCENE-1768

NumericRange support for new query parser

    Details

    • Lucene Fields:
      New

      Description

      It would be good to specify some type of "schema" for the query parser in future, to automatically create NumericRangeQuery for different numeric types? It would then be possible to index a numeric value (double,float,long,int) using NumericField and then the query parser knows, which type of field this is and so it correctly creates a NumericRangeQuery for strings like "[1.567..*]" or "(1.787..19.5]".

      There is currently no way to extract if a field is numeric from the index, so the user will have to configure the FieldConfig objects in the ConfigHandler. But if this is done, it will not be that difficult to implement the rest.

      The only difference between the current handling of RangeQuery is then the instantiation of the correct Query type and conversion of the entered numeric values (simple Number.valueOf(...) cast of the user entered numbers). Evenerything else is identical, NumericRangeQuery also supports the MTQ rewrite modes (as it is a MTQ).

      Another thing is a change in Date semantics. There are some strange flags in the current parser that tells it how to handle dates.

      1. week-8.patch
        29 kB
        Vinicius Barros
      2. week-7.patch
        97 kB
        Vinicius Barros
      3. week5-6.patch
        108 kB
        Vinicius Barros
      4. week4.patch
        82 kB
        Vinicius Barros
      5. week3.patch
        76 kB
        Vinicius Barros
      6. week2.patch
        39 kB
        Vinicius Barros
      7. week15_for_trunk.patch
        61 kB
        Vinicius Barros
      8. week15_for_lucene_3x.patch
        2 kB
        Vinicius Barros
      9. week-14.patch
        61 kB
        Vinicius Barros
      10. week11-13_for_lucene_3x.patch
        102 kB
        Vinicius Barros
      11. week11-13_for_lucene_3x.patch
        108 kB
        Uwe Schindler
      12. week1.patch
        24 kB
        Vinicius Barros
      13. TestNumericQueryParser-fix.patch
        6 kB
        Vinicius Barros
      14. TestNumericQueryParser-fix.patch
        8 kB
        Uwe Schindler
      15. TestNumericQueryParser-fix.patch
        9 kB
        Uwe Schindler
      16. TestNumericQueryParser-fix.patch
        10 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Luis Alves added a comment -

          Uwe,

          Thanks for creating the jira issue.

          Can you add some simple query examples.
          What would be the lucene Query objects for those queries, if it was produce by a QP that supported that feature.

          Also elaborate what is the current expect behavior for those queries.

          If you can write a junit with one or 2 indexed docs,
          and a lucene Query that retrives just one of those docs and not the other
          without using the queryparser, that would be helpful.

          Show
          Luis Alves added a comment - Uwe, Thanks for creating the jira issue. Can you add some simple query examples. What would be the lucene Query objects for those queries, if it was produce by a QP that supported that feature. Also elaborate what is the current expect behavior for those queries. If you can write a junit with one or 2 indexed docs, and a lucene Query that retrives just one of those docs and not the other without using the queryparser, that would be helpful.
          Hide
          Uwe Schindler added a comment - - edited

          Luis,
          I will post an example of queries and the constructed Query objects when I am back from holidays (Thursday+). In principle the syntax would be the same like for normal range queries, only that the min/max arguments may be double, float, int, long or dates. You would create instances of NumericRangeQuery from it using one of the static factories for each data type (for dates a conversion to long using Date.getTime() would be done). The datatype must be somehow predefined for the field names using some type of schema (per field).. Open ends use "*" and the [], (), {} would define if incl. NumericRangeQuery is a subclass of MultiTermQuery so the rewrite method also applies to this query. For NRQ there is also a config parameter precisionStep which default value is 4, but should be also configureable per-field together with the data type.

          Example code for creating the NRQ are in the JavaDocs and there are 2 JUnits in trunk (TestNumericRangeQuery*) showing how it is used. Also the new LIA2 contains a chapter about it.

          Show
          Uwe Schindler added a comment - - edited Luis, I will post an example of queries and the constructed Query objects when I am back from holidays (Thursday+). In principle the syntax would be the same like for normal range queries, only that the min/max arguments may be double, float, int, long or dates. You would create instances of NumericRangeQuery from it using one of the static factories for each data type (for dates a conversion to long using Date.getTime() would be done). The datatype must be somehow predefined for the field names using some type of schema (per field).. Open ends use "*" and the [], (), {} would define if incl. NumericRangeQuery is a subclass of MultiTermQuery so the rewrite method also applies to this query. For NRQ there is also a config parameter precisionStep which default value is 4, but should be also configureable per-field together with the data type. Example code for creating the NRQ are in the JavaDocs and there are 2 JUnits in trunk (TestNumericRangeQuery*) showing how it is used. Also the new LIA2 contains a chapter about it.
          Hide
          Uwe Schindler added a comment -

          I think, this should be in 2.9. Any Chance to do this. In my Opinion, it should be not so hard. I will prepare something tomorrow.

          Show
          Uwe Schindler added a comment - I think, this should be in 2.9. Any Chance to do this. In my Opinion, it should be not so hard. I will prepare something tomorrow.
          Hide
          Yonik Seeley added a comment -

          I think, this should be in 2.9.

          The standard way in the past was for the app to simply override getRangeQuery() to handle different fields differently.
          This still seems the most flexible.

          Show
          Yonik Seeley added a comment - I think, this should be in 2.9. The standard way in the past was for the app to simply override getRangeQuery() to handle different fields differently. This still seems the most flexible.
          Hide
          Luis Alves added a comment -

          You could still do something similar by simply override RangeQueryNodeBuilder.build(QueryNode queryNode), but this is not clean (it is kind of a hack).

          A clean implementation would allow the user to configure the field types (which the "new flexible queryparser" does).
          I'm new to NumericRange Queries and Rangequeries in general, but here is what I think it should look like.

          Here is a seudo java example:

              final String defaultField = "default";
              final String monthField = "month";
              final String hourField = "hour";
              final String distanceField = "distance";
              final String moneyField = "money";
          
              Map<CharSequence, RangeTools.Type> rangeTypes =  new HashMap<CharSequence, RangeTools.Type>();
              
              // set a field specific range type per field
              rangeTypes.put(monthField, new RangeTools.Type(RangeUtils.DATE, DateTools.Resolution.MONTH) );
              rangeTypes.put(hourField, new RangeUtils.Type(RangeUtils.DATE,  DateTools.Resolution.HOUR) );
              rangeTypes.put(distanceField, RangeUtils.getType(RangeUtils.NUMERIC,  RangeUtils.NumericType.LONG, NumericUtils.PRECISION_STEP_DEFAULT) );
              rangeTypes.put(moneyField, RangeUtils.getType(RangeUtils.NUMERIC,  RangeUtils.NumericType.Type.FLOAT, NumericUtils.PRECISION_STEP_DEFAULT) );
          
              StandardQueryParser qp = new StandardQueryParser();
          
              // set default range type to Int default precision
              qp.setDefaultRangeType(RangeUtils.getType(RangeUtils.NUMERIC,  RangeUtils.NumericType.INT, NumericUtils.PRECISION_STEP_DEFAULT));
          
              // set field range types
              qp.setRangeTypes(rangeTypes);
          
             Query q = qp.parser(" month:[01/01/2004 TO 01/01/2005]  distance:[1000 to 2000] money: [23.50 to 50.99]");
          
          
          Show
          Luis Alves added a comment - You could still do something similar by simply override RangeQueryNodeBuilder.build(QueryNode queryNode), but this is not clean (it is kind of a hack). A clean implementation would allow the user to configure the field types (which the "new flexible queryparser" does). I'm new to NumericRange Queries and Rangequeries in general, but here is what I think it should look like. Here is a seudo java example: final String defaultField = " default " ; final String monthField = "month" ; final String hourField = "hour" ; final String distanceField = "distance" ; final String moneyField = "money" ; Map<CharSequence, RangeTools.Type> rangeTypes = new HashMap<CharSequence, RangeTools.Type>(); // set a field specific range type per field rangeTypes.put(monthField, new RangeTools.Type(RangeUtils.DATE, DateTools.Resolution.MONTH) ); rangeTypes.put(hourField, new RangeUtils.Type(RangeUtils.DATE, DateTools.Resolution.HOUR) ); rangeTypes.put(distanceField, RangeUtils.getType(RangeUtils.NUMERIC, RangeUtils.NumericType.LONG, NumericUtils.PRECISION_STEP_DEFAULT) ); rangeTypes.put(moneyField, RangeUtils.getType(RangeUtils.NUMERIC, RangeUtils.NumericType.Type.FLOAT, NumericUtils.PRECISION_STEP_DEFAULT) ); StandardQueryParser qp = new StandardQueryParser(); // set default range type to Int default precision qp.setDefaultRangeType(RangeUtils.getType(RangeUtils.NUMERIC, RangeUtils.NumericType.INT, NumericUtils.PRECISION_STEP_DEFAULT)); // set field range types qp.setRangeTypes(rangeTypes); Query q = qp.parser( " month:[01/01/2004 TO 01/01/2005] distance:[1000 to 2000] money: [23.50 to 50.99]" );
          Hide
          Yonik Seeley added a comment -

          It feels like going that route would add much code and complexity.

          If the user already knows how to create a range query in code, it's much more straightforward to just do

          if ("money".equals(field)) return new NumericRangeQuery(field,...)
          else return super.getRangeQuery(field,...)
          
          Show
          Yonik Seeley added a comment - It feels like going that route would add much code and complexity. If the user already knows how to create a range query in code, it's much more straightforward to just do if ( "money" .equals(field)) return new NumericRangeQuery(field,...) else return super .getRangeQuery(field,...)
          Hide
          Michael McCandless added a comment -

          You could still do something similar by simply override RangeQueryNodeBuilder.build(QueryNode queryNode), but this is not clean (it is kind of a hack).

          What's the cleaner way to do this? EG could I make my own ParametricRangeQueryNodeProcessor, subclassing the current one in the "standard.processors" package, that overrides postProcessNode to do its own conversion?

          Show
          Michael McCandless added a comment - You could still do something similar by simply override RangeQueryNodeBuilder.build(QueryNode queryNode), but this is not clean (it is kind of a hack). What's the cleaner way to do this? EG could I make my own ParametricRangeQueryNodeProcessor, subclassing the current one in the "standard.processors" package, that overrides postProcessNode to do its own conversion?
          Hide
          Luis Alves added a comment -

          Hi Yonik,

          As I said before you can do that in the RangeQueryNodeBuilder.build(QueryNode queryNode),
          but it's ugly and this is not what we intended when using the "new flexible query parser".

          The "new flexible query parser" does not follow the concept of method overwriting has the old one.
          So solutions that worked in the old queryparser, like overwriting a method, have to be implemented
          using a programmatic way.

          Your approach requires creating a new class, overwrite a method.
          you still need to create a instance of your QueryParser and is not reusable.

          Here is a sample of what your approach is:

          
          Class YonikQueryParser extends QueryParser{
          
            Query getRangeQuery(field,...) {
              if ("money".equals(field)) return new NumericRangeQuery(field,...)
              else return super.getRangeQuery(field,...)
            }
          }
          
          ...
           QueryParser yqp = new YonikQueryParser(...);
          yqp.parser(query);
          

          vs

          What I am proposing:

              Map<CharSequence, RangeTools.Type> rangeTypes =  new HashMap<CharSequence, RangeTools.Type>();
              
              rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC,  RangeUtils.NumericType.Type.FLOAT, NumericUtils.PRECISION_STEP_DEFAULT) );
          
              StandardQueryParser qp = new StandardQueryParser();
              qp.setRangeTypes(rangeTypes);
          
              qp.parser(query);
          

          The second approach is programmatic does not require a new class,
          or the overwrite of a method and is reusable by other users, and it's
          backward compatible, meaning we can integrate this on the current
          "Flexible query parser" and deliver this feature on 2.9 without affecting
          any current usecase.

          Your approach is not compatible, it does require new class, and is not programmatic,
          It's not reusable by other users (we can't commit your code to lucene),
          since fields are hard-coded.

          Also the approach I proposing is very similar to setFieldsBoost setDateResolution,
          already available on the old QP and the new flexible query parser.

          I also want to say, that extending the old QP vs extending the "New flexible Query Parser" approaches
          are never going to be similar, they completely different implementations.

          Show
          Luis Alves added a comment - Hi Yonik, As I said before you can do that in the RangeQueryNodeBuilder.build(QueryNode queryNode), but it's ugly and this is not what we intended when using the "new flexible query parser". The "new flexible query parser" does not follow the concept of method overwriting has the old one. So solutions that worked in the old queryparser, like overwriting a method, have to be implemented using a programmatic way. Your approach requires creating a new class, overwrite a method. you still need to create a instance of your QueryParser and is not reusable. Here is a sample of what your approach is: Class YonikQueryParser extends QueryParser{ Query getRangeQuery(field,...) { if ( "money" .equals(field)) return new NumericRangeQuery(field,...) else return super .getRangeQuery(field,...) } } ... QueryParser yqp = new YonikQueryParser(...); yqp.parser(query); vs What I am proposing: Map<CharSequence, RangeTools.Type> rangeTypes = new HashMap<CharSequence, RangeTools.Type>(); rangeTypes.put( "money" , RangeUtils.getType(RangeUtils.NUMERIC, RangeUtils.NumericType.Type.FLOAT, NumericUtils.PRECISION_STEP_DEFAULT) ); StandardQueryParser qp = new StandardQueryParser(); qp.setRangeTypes(rangeTypes); qp.parser(query); The second approach is programmatic does not require a new class, or the overwrite of a method and is reusable by other users, and it's backward compatible, meaning we can integrate this on the current "Flexible query parser" and deliver this feature on 2.9 without affecting any current usecase. Your approach is not compatible, it does require new class, and is not programmatic, It's not reusable by other users (we can't commit your code to lucene), since fields are hard-coded. Also the approach I proposing is very similar to setFieldsBoost setDateResolution, already available on the old QP and the new flexible query parser. I also want to say, that extending the old QP vs extending the "New flexible Query Parser" approaches are never going to be similar, they completely different implementations.
          Hide
          Yonik Seeley added a comment -

          It's not reusable by other users (we can't commit your code to lucene)

          Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC...
          That's the application specific configuration code and doesn't need (or want) to be committed.

          Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user may want to do.

          Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type?) I'm sure there must be, but there are so many classes in the new QP, I'm having a little difficulty finding my way around.

          Show
          Yonik Seeley added a comment - It's not reusable by other users (we can't commit your code to lucene) Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC... That's the application specific configuration code and doesn't need (or want) to be committed. Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user may want to do. Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type?) I'm sure there must be, but there are so many classes in the new QP, I'm having a little difficulty finding my way around.
          Hide
          Luis Alves added a comment -

          What's the cleaner way to do this? EG could I make my own ParametricRangeQueryNodeProcessor, subclassing the current one in the "standard.processors" package, that overrides postProcessNode to do its own conversion?

          For Yonik simple requirement, you could

          Option 1 (more flexible):

          • make your own ParametricRangeQueryNodeProcessor, subclassing the current, returning NumericQueryNodes where needed
          • create a NumericQueryNode that extends RangeQueryNode (node extra code needed)
          • create a NumericQueryNodeBuilder that handles NumericQueryNodes, and set the map in StandardQueryTreeBuilder, ex: setBuilder(NumericQueryNode.class, new NumericQueryNodeBuilder()),. RangeQueryNodes will still be normally handled by the RangeQueryNodeBuilder.

          Option 2, (less flexible):

          • make your own RangeQueryNodeBuilder subclassing the current(ex: NumericQueryNodeBuilder) , set the map in StandardQueryTreeBuilder, ex: setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder())

          Option 1, implements the correct usage of the APIs. It's more flexible and "dirty work" is done in the processors pipeline.
          Option 2, is not the correct use case for the APIs, requires less code and it will work, but the builder will be performing the tasks the Processor should be doing.

          Show
          Luis Alves added a comment - What's the cleaner way to do this? EG could I make my own ParametricRangeQueryNodeProcessor, subclassing the current one in the "standard.processors" package, that overrides postProcessNode to do its own conversion? For Yonik simple requirement, you could Option 1 (more flexible): make your own ParametricRangeQueryNodeProcessor, subclassing the current, returning NumericQueryNodes where needed create a NumericQueryNode that extends RangeQueryNode (node extra code needed) create a NumericQueryNodeBuilder that handles NumericQueryNodes, and set the map in StandardQueryTreeBuilder, ex: setBuilder(NumericQueryNode.class, new NumericQueryNodeBuilder()),. RangeQueryNodes will still be normally handled by the RangeQueryNodeBuilder. Option 2, (less flexible): make your own RangeQueryNodeBuilder subclassing the current(ex: NumericQueryNodeBuilder) , set the map in StandardQueryTreeBuilder, ex: setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder()) Option 1, implements the correct usage of the APIs. It's more flexible and "dirty work" is done in the processors pipeline. Option 2, is not the correct use case for the APIs, requires less code and it will work, but the builder will be performing the tasks the Processor should be doing.
          Hide
          Luis Alves added a comment - - edited

          Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC...
          That's the application specific configuration code and doesn't need (or want) to be committed.

          You are correct, I was describing the use case from the user perspective.
          That code was a example how to use the API's if we implement them in the future, those API's are not currently available.

          Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user may want to do.

          I'm not sure what to say here. So I'll point to the documentation that we currently have:
          You can read https://issues.apache.org/jira/secure/attachment/12410046/QueryParser_restructure_meetup_june2009_v2.pdf
          and the java docs for
          package org.apache.lucene.queryParser.core
          class org.apache.lucene.queryParser.standard.StandardQueryParser

          You can also look at TestSpanQueryParserSimpleSample junit for another example how the API's can be used,
          in a completely different way.

          The new QueryParser was designed to be extensible,
          allow the implementation of languages extensions or different languages,
          and have reusable components like the processors and builders

          We use SyntaxParsers, Processors and Builders, all are replaceable components at runtime.
          Any user can build it's own pipeline and create new processors, builders, querynodes and integrate them
          with the existing ones to create the features they require.

          Some of the features are:

          • Syntax Tree optimization
          • Syntax Tree expansion
          • Syntax Tree validation and error reporting
          • Tokenization and normalization of the query
          • Makes it easy to create extensions
          • Support for translation of error messages
          • Allows users to plug and play processors and builders, without having to modify lucene code.
          • Allow lucene users to implement features much faster
          • Allow users to change default behavior in a easy way without having to modify lucene code.

          Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type?) I'm sure there must be, but there are so many classes in the new QP, I'm having a little difficulty finding my way around.

          Below is the java code for option 2. It's not the recomend way to use the new queryparser,
          but is the shortest way to do what you want.

            class NumericQueryNodeBuilder extends RangeQueryNodeBuilder {
              public TermRangeQuery build(QueryNode queryNode) throws QueryNodeException {
              RangeQueryNode rangeNode = (RangeQueryNode) queryNode;
                
              if (rangeNode.getField().toString().equals("money")) {
                // do whatever you need here with queryNode.
                return new NumericRangeQuery(field,...)
              }
              else {
                  return super.build(queryNode);
                }
              }
            }
            
            public void testNewRangeQueryBuilder() throws Exception {    
              StandardQueryParser qp = new StandardQueryParser();
              QueryTreeBuilder builder = (QueryTreeBuilder)qp.getQueryBuilder();
              builder.setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder());
              
              String startDate = getLocalizedDate(2002, 1, 1, false);
              String endDate = getLocalizedDate(2002, 1, 4, false);    
              
              StandardAnalyzer oneStopAnalyzer = new StandardAnalyzer();
              qp.setAnalyzer(oneStopAnalyzer);
              
              Query a = qp.parse("date:[" + startDate + " TO " + endDate + "]", null);
              System.out.print(a);
            }
          
          Show
          Luis Alves added a comment - - edited Neither is your version with rangeTypes.put("money", RangeUtils.getType(RangeUtils.NUMERIC... That's the application specific configuration code and doesn't need (or want) to be committed. You are correct, I was describing the use case from the user perspective. That code was a example how to use the API's if we implement them in the future, those API's are not currently available. Directly instantiating the query you want is simple, ultimately configurable, and avoids adding a ton of unnecessary classes or methods that need to be kept in sync with everything that a user may want to do. I'm not sure what to say here. So I'll point to the documentation that we currently have: You can read https://issues.apache.org/jira/secure/attachment/12410046/QueryParser_restructure_meetup_june2009_v2.pdf and the java docs for package org.apache.lucene.queryParser.core class org.apache.lucene.queryParser.standard.StandardQueryParser You can also look at TestSpanQueryParserSimpleSample junit for another example how the API's can be used, in a completely different way. The new QueryParser was designed to be extensible, allow the implementation of languages extensions or different languages, and have reusable components like the processors and builders We use SyntaxParsers, Processors and Builders, all are replaceable components at runtime. Any user can build it's own pipeline and create new processors, builders, querynodes and integrate them with the existing ones to create the features they require. Some of the features are: Syntax Tree optimization Syntax Tree expansion Syntax Tree validation and error reporting Tokenization and normalization of the query Makes it easy to create extensions Support for translation of error messages Allows users to plug and play processors and builders, without having to modify lucene code. Allow lucene users to implement features much faster Allow users to change default behavior in a easy way without having to modify lucene code. Is there a simple way to provide a custom QueryBuilder for range queries (or any other query type?) I'm sure there must be, but there are so many classes in the new QP, I'm having a little difficulty finding my way around. Below is the java code for option 2. It's not the recomend way to use the new queryparser, but is the shortest way to do what you want. class NumericQueryNodeBuilder extends RangeQueryNodeBuilder { public TermRangeQuery build(QueryNode queryNode) throws QueryNodeException { RangeQueryNode rangeNode = (RangeQueryNode) queryNode; if (rangeNode.getField().toString().equals( "money" )) { // do whatever you need here with queryNode. return new NumericRangeQuery(field,...) } else { return super .build(queryNode); } } } public void testNewRangeQueryBuilder() throws Exception { StandardQueryParser qp = new StandardQueryParser(); QueryTreeBuilder builder = (QueryTreeBuilder)qp.getQueryBuilder(); builder.setBuilder(RangeQueryNode.class, new NumericQueryNodeBuilder()); String startDate = getLocalizedDate(2002, 1, 1, false ); String endDate = getLocalizedDate(2002, 1, 4, false ); StandardAnalyzer oneStopAnalyzer = new StandardAnalyzer(); qp.setAnalyzer(oneStopAnalyzer); Query a = qp.parse( "date:[" + startDate + " TO " + endDate + "]" , null ); System .out.print(a); }
          Hide
          Uwe Schindler added a comment -

          To go back to the idea why I opened the issue (and I think, this is also Mike's intention):

          From what you see on java-user, where users asking questions about how to use Lucene:
          Most users are not aware of the fact, that they can create Query classes themselves. Most examplecode on the list is just: "I have such query string and I pass it to lucene and it does not work as exspected." It is hard to explain them, that they should simply not use a query parser for their queries and just instantiate the query classes directly. For such users it is even harder to customize this query parser.

          My intention behind is: Make the RangeQueryNodeBuilder somehow configureable like Luis proposed, that you can set the type of a field (what we do not have in Lucene currently). If the type is undefined or explicite set to "string/term", create a TermRangeQuery. If it is set to any numeric type, create a NumericRangeQuery.newXxxRange(field,....).

          The same can currently be done by the original Lucene query parser, but only for dates (and it is really a hack using this DateField class). I simply want to extend it that you can say: "this field is of type 'int' and create automatically the correct range query for it." Because the old query parser is now "deprecated", I want to do it for the new one. This would also be an intention for new users to throw away the old parser and use the new one, because it can be configured easily to create numeric ranges in addition to term ranges.

          Show
          Uwe Schindler added a comment - To go back to the idea why I opened the issue (and I think, this is also Mike's intention): From what you see on java-user, where users asking questions about how to use Lucene: Most users are not aware of the fact, that they can create Query classes themselves. Most examplecode on the list is just: "I have such query string and I pass it to lucene and it does not work as exspected." It is hard to explain them, that they should simply not use a query parser for their queries and just instantiate the query classes directly. For such users it is even harder to customize this query parser. My intention behind is: Make the RangeQueryNodeBuilder somehow configureable like Luis proposed, that you can set the type of a field (what we do not have in Lucene currently). If the type is undefined or explicite set to "string/term", create a TermRangeQuery. If it is set to any numeric type, create a NumericRangeQuery.newXxxRange(field,....). The same can currently be done by the original Lucene query parser, but only for dates (and it is really a hack using this DateField class). I simply want to extend it that you can say: "this field is of type 'int' and create automatically the correct range query for it." Because the old query parser is now "deprecated", I want to do it for the new one. This would also be an intention for new users to throw away the old parser and use the new one, because it can be configured easily to create numeric ranges in addition to term ranges.
          Hide
          Michael McCandless added a comment -

          Given the complexity of customizing the new QueryParser, and given
          that numeric fields will likely be commonly used, I think this is an
          important issue. I think we should try to have the new QueryParser
          cleanly produce NumericRangeQuery, in 2.9.

          EG expecting a user to do "option 1" (the "clean", more flexible
          option) is a tall order. Simple things should be simple...

          The proposed RangeTools seems like a good approach, and I like how it
          cleanly absorbs the Date precisions that the old queryParser also
          supports.

          But we better get cracking here since 2.9 is real close....!

          Here's one side-question, about back compat promises for the new
          QueryParser: we are suggesting the users can start from all the
          building blocks in StandardQueryParser, and override the processors,
          create new nodes, builders, etc. with their own. But this is
          potentially dangerous, in that the next version of Lucene might change
          things up such that your custom code doesn't work anymore? It's alot
          like a core class being subclassed externally, and then change to the
          core class break those external subclasses.

          EG say we had not handled numerics for 2.9, and users go and do
          "option 2" (the quick & dirty, but simplest, way to get
          NumericRangeQueries out). Then, say in 3.1 we implement the proposed
          fix here ("option 1"). Suddenly, we've altered what nodes come out of
          the processor pipeline, because we've created a new NumericRangeQuery
          node, and so the builders that users had added, for the RangeQuery
          node, will no loner be invoked. How are we going to handle
          back-compat here?

          Show
          Michael McCandless added a comment - Given the complexity of customizing the new QueryParser, and given that numeric fields will likely be commonly used, I think this is an important issue. I think we should try to have the new QueryParser cleanly produce NumericRangeQuery, in 2.9. EG expecting a user to do "option 1" (the "clean", more flexible option) is a tall order. Simple things should be simple... The proposed RangeTools seems like a good approach, and I like how it cleanly absorbs the Date precisions that the old queryParser also supports. But we better get cracking here since 2.9 is real close....! Here's one side-question, about back compat promises for the new QueryParser: we are suggesting the users can start from all the building blocks in StandardQueryParser, and override the processors, create new nodes, builders, etc. with their own. But this is potentially dangerous, in that the next version of Lucene might change things up such that your custom code doesn't work anymore? It's alot like a core class being subclassed externally, and then change to the core class break those external subclasses. EG say we had not handled numerics for 2.9, and users go and do "option 2" (the quick & dirty, but simplest, way to get NumericRangeQueries out). Then, say in 3.1 we implement the proposed fix here ("option 1"). Suddenly, we've altered what nodes come out of the processor pipeline, because we've created a new NumericRangeQuery node, and so the builders that users had added, for the RangeQuery node, will no loner be invoked. How are we going to handle back-compat here?
          Hide
          Mark Miller added a comment -

          Personally, I don't think we should deprecate the standard QueryParser yet - and the new one should carry no back compat policy. It needs to be flushed out in a release before we tell users to move to it IMO. Not enough Committers have enough experience with it to promise back compat at this point I think.

          Show
          Mark Miller added a comment - Personally, I don't think we should deprecate the standard QueryParser yet - and the new one should carry no back compat policy. It needs to be flushed out in a release before we tell users to move to it IMO. Not enough Committers have enough experience with it to promise back compat at this point I think.
          Hide
          Adriano Crestani added a comment -

          The proposed RangeTools seems like a good approach, and I like how it
          cleanly absorbs the Date precisions that the old queryParser also
          supports.

          You meant DateTools, right?! I don't see so much difference to use this same approach over the "option1". You have a map based from field name to the DateTools.Resolution used for that field. Which is the same feature we want to implement on this JIRA, something you could configure how you are going to resolve the value defined on a range query based on the field name. The only difference is that we are expanding the options the user will have to resolve the values: RangeUtils.NUMERIC, RangeUtils.DATE, RangeUtils.FLOAT, etc...let me know if I missed or missunderstood something on this part.

          Here's one side-question, about back compat promises for the new
          QueryParser: we are suggesting the users can start from all the
          building blocks in StandardQueryParser, and override the processors,
          create new nodes, builders, etc. with their own. But this is
          potentially dangerous, in that the next version of Lucene might change
          things up such that your custom code doesn't work anymore? It's alot
          like a core class being subclassed externally, and then change to the
          core class break those external subclasses.

          EG say we had not handled numerics for 2.9, and users go and do
          "option 2" (the quick & dirty, but simplest, way to get
          NumericRangeQueries out). Then, say in 3.1 we implement the proposed
          fix here ("option 1"). Suddenly, we've altered what nodes come out of
          the processor pipeline, because we've created a new NumericRangeQuery
          node, and so the builders that users had added, for the RangeQuery
          node, will no loner be invoked. How are we going to handle
          back-compat here?

          I think it's already happening with the "old" QP. It used to output RangeQuery objects and now it outputs TermRangeQuery objects. How is it going to be handled buy users expecting RangeQuery objects?

          The "new" QP builder, delegates a query node based on its class to a builder, if there is no builder that knows how to build an object from that class it keeps looking up in the class hierarchy until it finds a builder that knows how to. Query nodes are supposed to be conceptual objects, they just represent some concept X, and ideally anything that fits in this concept should inherit from it, this way the user can create their own specific query nodes with no need to change how they are built (if there is no need for that). What I'm trying to say here is that if I create a node Y which extends X, I don't need to specify a new YBuilder for it, the XBuilder will be used. So, ideally, NumericRangeQueryNode should extends RangeQueryNode, the problem here is that we also need to specify a builder for the NumericRangeNode, and if the user sets a builder for RangeNode it will never be invoked for NumericRangeNode objects. Maybe it shouldn't at all, because if a new builder was specified for NumericRangeNode, it means a new kind of object should be built from it, something the user probably don't know yet, since it's a new kind of node, and his custom code needs to be updated anyway to support it.

          Howerver, there is a solution for this kind of back-compat problem (which I don't think it is). In a future release, if a new XRangeQueryNode is created, instead of set

           luceneBuilderMap.setBuilder(RangeQueryNode.class, new RangeQueryNodeBuilder());
          luceneBuilderMap.setBuilder(XRangeQueryNode.class, new XRangeQueryNodeBuilder());
          

          We could do:

          rangeBuilderMap.setBuilder(RangeQueryNode.class, new RangeQueryNodeBuilder());
          rangeBuilderMap.setBuilder(XRangeQueryNode.class, new XRangeQueryNodeBuilder());
          
          // then
          
          luceneBuilderMap.setBuilder(RangeQueryNode.class, rangeBuilderMap);
          

          This way, if the user reset the RangeQueryNode builder to its own builder, it will still be called for XRangeQueryNode and RangeQueryNode objects.

          Let me know if there is any question about what I just described.

          Show
          Adriano Crestani added a comment - The proposed RangeTools seems like a good approach, and I like how it cleanly absorbs the Date precisions that the old queryParser also supports. You meant DateTools, right?! I don't see so much difference to use this same approach over the "option1". You have a map based from field name to the DateTools.Resolution used for that field. Which is the same feature we want to implement on this JIRA, something you could configure how you are going to resolve the value defined on a range query based on the field name. The only difference is that we are expanding the options the user will have to resolve the values: RangeUtils.NUMERIC, RangeUtils.DATE, RangeUtils.FLOAT, etc...let me know if I missed or missunderstood something on this part. Here's one side-question, about back compat promises for the new QueryParser: we are suggesting the users can start from all the building blocks in StandardQueryParser, and override the processors, create new nodes, builders, etc. with their own. But this is potentially dangerous, in that the next version of Lucene might change things up such that your custom code doesn't work anymore? It's alot like a core class being subclassed externally, and then change to the core class break those external subclasses. EG say we had not handled numerics for 2.9, and users go and do "option 2" (the quick & dirty, but simplest, way to get NumericRangeQueries out). Then, say in 3.1 we implement the proposed fix here ("option 1"). Suddenly, we've altered what nodes come out of the processor pipeline, because we've created a new NumericRangeQuery node, and so the builders that users had added, for the RangeQuery node, will no loner be invoked. How are we going to handle back-compat here? I think it's already happening with the "old" QP. It used to output RangeQuery objects and now it outputs TermRangeQuery objects. How is it going to be handled buy users expecting RangeQuery objects? The "new" QP builder, delegates a query node based on its class to a builder, if there is no builder that knows how to build an object from that class it keeps looking up in the class hierarchy until it finds a builder that knows how to. Query nodes are supposed to be conceptual objects, they just represent some concept X, and ideally anything that fits in this concept should inherit from it, this way the user can create their own specific query nodes with no need to change how they are built (if there is no need for that). What I'm trying to say here is that if I create a node Y which extends X, I don't need to specify a new YBuilder for it, the XBuilder will be used. So, ideally, NumericRangeQueryNode should extends RangeQueryNode, the problem here is that we also need to specify a builder for the NumericRangeNode, and if the user sets a builder for RangeNode it will never be invoked for NumericRangeNode objects. Maybe it shouldn't at all, because if a new builder was specified for NumericRangeNode, it means a new kind of object should be built from it, something the user probably don't know yet, since it's a new kind of node, and his custom code needs to be updated anyway to support it. Howerver, there is a solution for this kind of back-compat problem (which I don't think it is). In a future release, if a new XRangeQueryNode is created, instead of set luceneBuilderMap.setBuilder(RangeQueryNode.class, new RangeQueryNodeBuilder()); luceneBuilderMap.setBuilder(XRangeQueryNode.class, new XRangeQueryNodeBuilder()); We could do: rangeBuilderMap.setBuilder(RangeQueryNode.class, new RangeQueryNodeBuilder()); rangeBuilderMap.setBuilder(XRangeQueryNode.class, new XRangeQueryNodeBuilder()); // then luceneBuilderMap.setBuilder(RangeQueryNode.class, rangeBuilderMap); This way, if the user reset the RangeQueryNode builder to its own builder, it will still be called for XRangeQueryNode and RangeQueryNode objects. Let me know if there is any question about what I just described.
          Hide
          Uwe Schindler added a comment -

          I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class (just a bigger enumeration with a good name, not Utils/.Tools. e.g. RangeQueryDataType). By that you can define simply the type of a range query: term, numeric-int, numeric-float, numeric-double, date-precision-xxx,... Based on this enumeration, the upper/lower terms are parsed differently and different query objects are created. We just need to list all possible combinations of data types, the user could create: We could make this class extensible, if it is a Lucene Parameter class also supporting the parsing and building: One could simply create a new constant for his specific range type and supply methods to parse and build the query in the constant's implementation (so each constant contains also code to parse/build). I am not sure how to do this with the new parser. I think of the same like the MTQRewriteMethod (final static singletons in MTQ that do the rewrite and can be passed as parameter).

          Maybe we can use this also to upgrade the old query parser if it gets not deprecated.

          I think it's already happening with the "old" QP. It used to output RangeQuery objects and now it outputs TermRangeQuery objects. How is it going to be handled buy users expecting RangeQuery objects?

          I was thinking about that, too. But here the API clearly defines, that getRangeQuery() returns a Query object without further specification. So the change was correct from the API/BW side. The change that another object is returned is documented in CHANGES.txt (as far as I know). We have here the same problem: You change the inner class implementations, but the abstract QueryParser's API is stable. The general contract when doing such things is, that you use instanceof checks before you try to cast some abstract return type to something specific, not documented.

          You have the same in various factories also in the very bw-oriented JDK: XML factories create things like SAXParser and so on. If you cast the returned objects to some special implementation class, its your problem, because you remove the abstraction and work with implementations. This happened e.g. from the change between Java 1.4 to 1.5, when the internal SAX parsers were exchanged and their class names changed. A lot of programs broke by that, because the developers casted the objects returned from factories without instanceof checks.

          Show
          Uwe Schindler added a comment - I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class (just a bigger enumeration with a good name, not Utils/ .Tools. e.g. RangeQueryDataType). By that you can define simply the type of a range query: term, numeric-int, numeric-float, numeric-double, date-precision-xxx,... Based on this enumeration, the upper/lower terms are parsed differently and different query objects are created. We just need to list all possible combinations of data types, the user could create: We could make this class extensible, if it is a Lucene Parameter class also supporting the parsing and building: One could simply create a new constant for his specific range type and supply methods to parse and build the query in the constant's implementation (so each constant contains also code to parse/build). I am not sure how to do this with the new parser. I think of the same like the MTQRewriteMethod (final static singletons in MTQ that do the rewrite and can be passed as parameter). Maybe we can use this also to upgrade the old query parser if it gets not deprecated. I think it's already happening with the "old" QP. It used to output RangeQuery objects and now it outputs TermRangeQuery objects. How is it going to be handled buy users expecting RangeQuery objects? I was thinking about that, too. But here the API clearly defines, that getRangeQuery() returns a Query object without further specification. So the change was correct from the API/BW side. The change that another object is returned is documented in CHANGES.txt (as far as I know). We have here the same problem: You change the inner class implementations, but the abstract QueryParser's API is stable. The general contract when doing such things is, that you use instanceof checks before you try to cast some abstract return type to something specific, not documented. You have the same in various factories also in the very bw-oriented JDK: XML factories create things like SAXParser and so on. If you cast the returned objects to some special implementation class, its your problem, because you remove the abstraction and work with implementations. This happened e.g. from the change between Java 1.4 to 1.5, when the internal SAX parsers were exchanged and their class names changed. A lot of programs broke by that, because the developers casted the objects returned from factories without instanceof checks.
          Hide
          Michael McCandless added a comment -

          I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class

          +1

          Howerver, there is a solution for this kind of back-compat problem (which I don't think it is).

          Actually, on reading your explanation I agree it's not really a back compat break, since the user's custom builder for RangeQueryNode would still be invoked, and the core's builder for NumericRangeQuery would handle the newly added numeric range support. I think this is reasonable.

          Show
          Michael McCandless added a comment - I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class +1 Howerver, there is a solution for this kind of back-compat problem (which I don't think it is). Actually, on reading your explanation I agree it's not really a back compat break, since the user's custom builder for RangeQueryNode would still be invoked, and the core's builder for NumericRangeQuery would handle the newly added numeric range support. I think this is reasonable.
          Hide
          Adriano Crestani added a comment -

          I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class

          +1 this way is easier for the user to config

          I was thinking about that, too. But here the API clearly defines, that getRangeQuery() returns a Query object without further specification. So the change was correct from the API/BW side. The change that another object is returned is documented in CHANGES.txt (as far as I know). We have here the same problem: You change the inner class implementations, but the abstract QueryParser's API is stable. The general contract when doing such things is, that you use instanceof checks before you try to cast some abstract return type to something specific, not documented.

          Agreed, I also think it's fine as long as it's documented

          Show
          Adriano Crestani added a comment - I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class +1 this way is easier for the user to config I was thinking about that, too. But here the API clearly defines, that getRangeQuery() returns a Query object without further specification. So the change was correct from the API/BW side. The change that another object is returned is documented in CHANGES.txt (as far as I know). We have here the same problem: You change the inner class implementations, but the abstract QueryParser's API is stable. The general contract when doing such things is, that you use instanceof checks before you try to cast some abstract return type to something specific, not documented. Agreed, I also think it's fine as long as it's documented
          Hide
          Luis Alves added a comment - - edited

          I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class

          +1

          I am not sure how to do this with the new parser. I think of the same like the MTQRewriteMethod (final static singletons in MTQ that do the rewrite and can be passed as parameter).

          I think we probably should have TermRangeQueryNode a NumericRangeQueryNode and 2 builders classes that match that, change ParametricRangeQueryNodeProcessor to do the dirty work and create the new TermRangeQueryNode and NumericRangeQueryNode in the correct places, based on the a map with [field name,RangeTools.TYPE] or something similar.
          The builders should simple and just convert each type to the correct Lucene Object.

          • we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name)
          • create the new NumericRangeQueryNode that extends from TermRangeQueryNode
          • change the ParametricRangeQueryNodeProcessor to read the configuration passed by the user and create the correct QueryNode objects.
          • create a new NumericRangeQueryNodeBuilder add it to the StandardQueryTreeBuilder mapping.

          I hope this helps

          Show
          Luis Alves added a comment - - edited I would propose to absorb the RangeTools/Utils and DateTools/Utils (ehat is the correct name???) in one configuration class +1 I am not sure how to do this with the new parser. I think of the same like the MTQRewriteMethod (final static singletons in MTQ that do the rewrite and can be passed as parameter). I think we probably should have TermRangeQueryNode a NumericRangeQueryNode and 2 builders classes that match that, change ParametricRangeQueryNodeProcessor to do the dirty work and create the new TermRangeQueryNode and NumericRangeQueryNode in the correct places, based on the a map with [field name,RangeTools.TYPE] or something similar. The builders should simple and just convert each type to the correct Lucene Object. we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name) create the new NumericRangeQueryNode that extends from TermRangeQueryNode change the ParametricRangeQueryNodeProcessor to read the configuration passed by the user and create the correct QueryNode objects. create a new NumericRangeQueryNodeBuilder add it to the StandardQueryTreeBuilder mapping. I hope this helps
          Hide
          Yonik Seeley added a comment -

          If the existing query parser is not being deprecated, should this issue be pushed out to 3.0 or 3.1 to give it more time? In the meantime, people can use the existing override getRangeQuery() method. 2.9 is looking really close.

          Show
          Yonik Seeley added a comment - If the existing query parser is not being deprecated, should this issue be pushed out to 3.0 or 3.1 to give it more time? In the meantime, people can use the existing override getRangeQuery() method. 2.9 is looking really close.
          Hide
          Mark Miller added a comment -

          Finally read through this whole issue.

          If the existing query parser is not being deprecated, should this issue be pushed out to 3.0 or 3.1 to give it more time? In the meantime, people can use the existing override getRangeQuery() method. 2.9 is looking really close.

          +1 on pushing this. getRangeQuery() will still be first class.

          It does seem like we should at least do this though:

          we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name)

          Show
          Mark Miller added a comment - Finally read through this whole issue. If the existing query parser is not being deprecated, should this issue be pushed out to 3.0 or 3.1 to give it more time? In the meantime, people can use the existing override getRangeQuery() method. 2.9 is looking really close. +1 on pushing this. getRangeQuery() will still be first class. It does seem like we should at least do this though: we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name)
          Hide
          Luis Alves added a comment -

          If the existing query parser is not being deprecated, should this issue be pushed out to 3.0 or 3.1 to give it more time? In the meantime, people can use the existing override getRangeQuery() method. 2.9 is looking really close.

          +1

          Show
          Luis Alves added a comment - If the existing query parser is not being deprecated, should this issue be pushed out to 3.0 or 3.1 to give it more time? In the meantime, people can use the existing override getRangeQuery() method. 2.9 is looking really close. +1
          Hide
          Uwe Schindler added a comment -

          we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name)

          I would not do this. RangeQueryNode is in the syntax tree and the syntax of numeric and term ranges is equal, so the query parser cannot know what type of query it is. When this issue is fixed 3.1, this node will use the configuration of data types for field names (date, numeric, term) to create the correct range query.

          +1 on pushing this. getRangeQuery() will still be first class.

          As noted in my comment on java-dev: We should add a comment in Javadocs, that the old (and also new) query parser do not work automatically with NumericRangeQuery, and that you should override getRangeQuery() and do a case-switch on the field name. I will do this later this day.

          Show
          Uwe Schindler added a comment - we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name) I would not do this. RangeQueryNode is in the syntax tree and the syntax of numeric and term ranges is equal, so the query parser cannot know what type of query it is. When this issue is fixed 3.1, this node will use the configuration of data types for field names (date, numeric, term) to create the correct range query. +1 on pushing this. getRangeQuery() will still be first class. As noted in my comment on java-dev: We should add a comment in Javadocs, that the old (and also new) query parser do not work automatically with NumericRangeQuery, and that you should override getRangeQuery() and do a case-switch on the field name. I will do this later this day.
          Hide
          Adriano Crestani added a comment -

          we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name)

          I would not do this. RangeQueryNode is in the syntax tree and the syntax of numeric and term ranges is equal, so the query parser cannot know what type of query it is. When this issue is fixed 3.1, this node will use the configuration of data types for field names (date, numeric, term) to create the correct range query.

          I think it's ok to rename, as far as I know, the standard.parser.SyntaxParser generates ParametricRangeQueryNode from a range query, which has 2 ParametricQueryNode as child. So, the range processor, will need to convert the 2 ParametricQueryNode to the respective type, based on the user config: TermRangeQueryNode (renamed from RangeQueryNode) or NumericRangeQueryNode.

          Show
          Adriano Crestani added a comment - we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name) I would not do this. RangeQueryNode is in the syntax tree and the syntax of numeric and term ranges is equal, so the query parser cannot know what type of query it is. When this issue is fixed 3.1, this node will use the configuration of data types for field names (date, numeric, term) to create the correct range query. I think it's ok to rename, as far as I know, the standard.parser.SyntaxParser generates ParametricRangeQueryNode from a range query, which has 2 ParametricQueryNode as child. So, the range processor, will need to convert the 2 ParametricQueryNode to the respective type, based on the user config: TermRangeQueryNode (renamed from RangeQueryNode) or NumericRangeQueryNode.
          Hide
          Adriano Crestani added a comment -

          I think this is also a good candidate for GSoC 2011. I will add the labels to it.

          Any comments?

          Show
          Adriano Crestani added a comment - I think this is also a good candidate for GSoC 2011. I will add the labels to it. Any comments?
          Hide
          Adriano Crestani added a comment -

          Hi Uwe,

          Are you willing to mentor this project on GSoC? If you are, I will keep it assigned to you, otherwise let me know so I assign it to me

          Show
          Adriano Crestani added a comment - Hi Uwe, Are you willing to mentor this project on GSoC? If you are, I will keep it assigned to you, otherwise let me know so I assign it to me
          Hide
          Uwe Schindler added a comment -

          I can help!

          Show
          Uwe Schindler added a comment - I can help!
          Hide
          Vinicius Barros added a comment -

          Hi Uwe and Adriano,

          I read the description, javadocs about the numeric support and the new query parser and all the comments here. Let me try to summarize what needs to be done here:

          figure out a way to configure in query parser which fields are numeric, and for each of these fields, the numeric type must also be defined...from the comments, it seems the best way to do this is using a map field>NumericType

          -create a NumericQueryNodeProcessor that converts ParametricRangeQueryNode to NumericRangeQueryNode, when its field is a numeric field. This processor should also convert the range string values to numeric values based on the NumericType

          -create a NumericRangeQueryNodeBuilder, which will build which will build NumericRangeQuery objects from NumericRangeQueryNode objects

          -rename RangeQueryNode to TermRangeQueryNode as it will only be used for string

          -create a NumericRangeQueryNode which will be used for any non-string range query

          -merge DateTools with a new NumericTools class. Does that make sense? I am not sure if I got everything correctly here.

          Some questions below:

          Luis:
          create the new NumericRangeQueryNode that extends from TermRangeQueryNode

          -should NumericRangeQueryNode extends TermRangeQueryNode? I don't see any reason for that, since one will hold Number values and the other String values

          -I remember the old date query, using strings, used to not only allow range queries, but also term queries (date:2010/10/10), is that correct? Does numeric fields also support this kind of query? I could only fine NumericRangeQuery, but no NumericQuery. If the user enters (age:19) in the query, and "age" is a numeric field, should the query parser throw an error saying it's not suppported?

          I am planning to create a GSOC proposal for this project, it looks interesting, very cool this new support to numeric in Lucene, I missed that first time I used Lucene, maybe because I was used to regular databases. Also, the query parser uses some design patterns I have been reading about lately, as builders and processors.

          Show
          Vinicius Barros added a comment - Hi Uwe and Adriano, I read the description, javadocs about the numeric support and the new query parser and all the comments here. Let me try to summarize what needs to be done here: figure out a way to configure in query parser which fields are numeric, and for each of these fields, the numeric type must also be defined...from the comments, it seems the best way to do this is using a map field >NumericType -create a NumericQueryNodeProcessor that converts ParametricRangeQueryNode to NumericRangeQueryNode, when its field is a numeric field. This processor should also convert the range string values to numeric values based on the NumericType -create a NumericRangeQueryNodeBuilder, which will build which will build NumericRangeQuery objects from NumericRangeQueryNode objects -rename RangeQueryNode to TermRangeQueryNode as it will only be used for string -create a NumericRangeQueryNode which will be used for any non-string range query -merge DateTools with a new NumericTools class. Does that make sense? I am not sure if I got everything correctly here. Some questions below: Luis: create the new NumericRangeQueryNode that extends from TermRangeQueryNode -should NumericRangeQueryNode extends TermRangeQueryNode? I don't see any reason for that, since one will hold Number values and the other String values -I remember the old date query, using strings, used to not only allow range queries, but also term queries (date:2010/10/10), is that correct? Does numeric fields also support this kind of query? I could only fine NumericRangeQuery, but no NumericQuery. If the user enters (age:19) in the query, and "age" is a numeric field, should the query parser throw an error saying it's not suppported? I am planning to create a GSOC proposal for this project, it looks interesting, very cool this new support to numeric in Lucene, I missed that first time I used Lucene, maybe because I was used to regular databases. Also, the query parser uses some design patterns I have been reading about lately, as builders and processors.
          Hide
          Adriano Crestani added a comment -

          Hi Vinicius,

          Nice summary! There is a formatting problem, but I think people can understand it.

          I had to re-read all the comments, it took sometime, it seems you got all the main points in the summary.

          -should NumericRangeQueryNode extends TermRangeQueryNode? I don't see any reason for that, since one will hold Number values and the other String values

          You are right, it does not make sense to one extend the other. However, I think they should have a common parent (e.g. <interface> RangeQueryNode), that will have common methods like QueryNode getLowRange().

          I will let Uwe answer the other questions, I am curious to know the answers too

          The student proposal period has started, so go ahead and start drafting it

          Show
          Adriano Crestani added a comment - Hi Vinicius, Nice summary! There is a formatting problem, but I think people can understand it. I had to re-read all the comments, it took sometime, it seems you got all the main points in the summary. -should NumericRangeQueryNode extends TermRangeQueryNode? I don't see any reason for that, since one will hold Number values and the other String values You are right, it does not make sense to one extend the other. However, I think they should have a common parent (e.g. <interface> RangeQueryNode), that will have common methods like QueryNode getLowRange(). I will let Uwe answer the other questions, I am curious to know the answers too The student proposal period has started, so go ahead and start drafting it
          Hide
          Uwe Schindler added a comment -
          • I remember the old date query, using strings, used to not only allow range queries, but also term queries (date:2010/10/10), is that correct? Does numeric fields also support this kind of query? I could only fine NumericRangeQuery, but no NumericQuery. If the user enters (age:19) in the query, and "age" is a numeric field, should the query parser throw an error saying it's not suppported?

          To create something like a NumericQuery (age:19), the correct and most performant way is to use a NumericRangeQuery with (includeLower==includeUpper)==true and (lower==upper)==value. This query is since Lucene 2.9 always rewritten in the most optimal way (internally it uses a ConstantScore TermQuery using the prefix encoded term. This should also be noted in JavaDocs?

          This is also how Solr's QP works.

          Show
          Uwe Schindler added a comment - I remember the old date query, using strings, used to not only allow range queries, but also term queries (date:2010/10/10), is that correct? Does numeric fields also support this kind of query? I could only fine NumericRangeQuery, but no NumericQuery. If the user enters (age:19) in the query, and "age" is a numeric field, should the query parser throw an error saying it's not suppported? To create something like a NumericQuery (age:19), the correct and most performant way is to use a NumericRangeQuery with (includeLower==includeUpper)==true and (lower==upper)==value. This query is since Lucene 2.9 always rewritten in the most optimal way (internally it uses a ConstantScore TermQuery using the prefix encoded term. This should also be noted in JavaDocs? This is also how Solr's QP works.
          Hide
          Uwe Schindler added a comment -

          -merge DateTools with a new NumericTools class. Does that make sense? I am not sure if I got everything correctly here.

          Do you mean Lucene'Core's org.apache.lucene.document.DateTools? Because this class is somehow deprecated (not officially) but all those tool classes should not be used together with NRQ. Or does the new QP has its own tools classes?

          Show
          Uwe Schindler added a comment - -merge DateTools with a new NumericTools class. Does that make sense? I am not sure if I got everything correctly here. Do you mean Lucene'Core's org.apache.lucene.document.DateTools? Because this class is somehow deprecated (not officially) but all those tool classes should not be used together with NRQ. Or does the new QP has its own tools classes?
          Hide
          Nikola Tankovic added a comment -

          Hi folks,

          I'm PhD student from Croatia willing to participate in GSoC this year. I work in Croatian firm called Superius Ltd on software modelling based on graph database (Neo4J to be concrete). This issue here sounds like a nice addition to Lucene that would help us also make queries that are needed over our business data contained in graph (indexed by Lucene). I was wondering whether is this project still open for GSoC or already assigned to someone?

          Thank you!

          Show
          Nikola Tankovic added a comment - Hi folks, I'm PhD student from Croatia willing to participate in GSoC this year. I work in Croatian firm called Superius Ltd on software modelling based on graph database (Neo4J to be concrete). This issue here sounds like a nice addition to Lucene that would help us also make queries that are needed over our business data contained in graph (indexed by Lucene). I was wondering whether is this project still open for GSoC or already assigned to someone? Thank you!
          Hide
          Vinicius Barros added a comment -

          Hi,

          Thanks Uwe and Adriano,

          I finally finished and submitted my proposal to this project, please, take a look and tell me if I need to change something. My linkid is viniciusbarros

          Sorry for taking so long to submit it, but just got a free time this weekend, college stuffs are keeping me busy.

          Uwe: I added to my proposal the idea of enabling the user to enter a query that searches for a single numeric value, example, age:19.

          About DateTools, I think this can be decided later, in the end it's just a class with some format options the user may choose. Anyway, the numeric already have a pre-defined form to format number in strings before indexing right?! Take a look to what I have defined in my proposal, where I allow the user to specify a Format object, which is used by the query parser to parse the string value entered by the user to Number. Please, let me know if I am not getting something here.

          Show
          Vinicius Barros added a comment - Hi, Thanks Uwe and Adriano, I finally finished and submitted my proposal to this project, please, take a look and tell me if I need to change something. My linkid is viniciusbarros Sorry for taking so long to submit it, but just got a free time this weekend, college stuffs are keeping me busy. Uwe: I added to my proposal the idea of enabling the user to enter a query that searches for a single numeric value, example, age:19. About DateTools, I think this can be decided later, in the end it's just a class with some format options the user may choose. Anyway, the numeric already have a pre-defined form to format number in strings before indexing right?! Take a look to what I have defined in my proposal, where I allow the user to specify a Format object, which is used by the query parser to parse the string value entered by the user to Number. Please, let me know if I am not getting something here.
          Hide
          Adriano Crestani added a comment -

          Hi Nikola,

          That's great you are interested in submit a proposal for gsoc this year. Proposal submissions are still open until next 8th (check GSOC timeline at the website). It means you can submit a proposal to any project or even suggest your own project (in this case the community will need to accept it as a gsoc project).

          There is already a proposal for this project, but feel free to submit another one. To raise your chances of getting into gsoc this year, I would suggest you to apply to a project with no candidates yet, check the Lucene projects here: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11

          Good luck!

          Show
          Adriano Crestani added a comment - Hi Nikola, That's great you are interested in submit a proposal for gsoc this year. Proposal submissions are still open until next 8th (check GSOC timeline at the website). It means you can submit a proposal to any project or even suggest your own project (in this case the community will need to accept it as a gsoc project). There is already a proposal for this project, but feel free to submit another one. To raise your chances of getting into gsoc this year, I would suggest you to apply to a project with no candidates yet, check the Lucene projects here: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11 Good luck!
          Hide
          Adriano Crestani added a comment -

          Hi Vinicius,

          Your proposal looks good, details everything you intend to do and the proposed solutions looks good to me, include what the community has previously discussed.

          +1 for adding support for simple numeric queries as age:19

          One thing I would suggest you to change is to make it clearer the query parser you are intending to change is the contrib query parser, to be more specific the standard implementation. You just mention it only once in the entire proposal!

          Show
          Adriano Crestani added a comment - Hi Vinicius, Your proposal looks good, details everything you intend to do and the proposed solutions looks good to me, include what the community has previously discussed. +1 for adding support for simple numeric queries as age:19 One thing I would suggest you to change is to make it clearer the query parser you are intending to change is the contrib query parser, to be more specific the standard implementation. You just mention it only once in the entire proposal!
          Hide
          Vinicius Barros added a comment -

          Thanks for reviewing it Adriano. I updated the proposal to clarify it's the contrib query parser.

          Show
          Vinicius Barros added a comment - Thanks for reviewing it Adriano. I updated the proposal to clarify it's the contrib query parser.
          Hide
          Vinicius Barros added a comment -

          This patch includes the work I did this first week. I started with one of the project's objective: restructure RangeQueryNode and its related classes to support number and text range queries.

          I created some querynode interfaces, such as ValueQueryNode that abstract the value a leaf node may hold, since now, leaf nodes do not only hold text anymore, but also number values.

          Let me know if you have any questions or any suggestions about the code.

          I expect I created the patch correctly, as it's the first time I play with subversion

          Show
          Vinicius Barros added a comment - This patch includes the work I did this first week. I started with one of the project's objective: restructure RangeQueryNode and its related classes to support number and text range queries. I created some querynode interfaces, such as ValueQueryNode that abstract the value a leaf node may hold, since now, leaf nodes do not only hold text anymore, but also number values. Let me know if you have any questions or any suggestions about the code. I expect I created the patch correctly, as it's the first time I play with subversion
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius,

          thanks for your update! The patch looks correct, I assume it's for Lucene trunk? You should produce them against the top-level directory (below trunk/), not the lucene sub-directory (since Lucene was merged with Solr last year).

          I have not closely looked at the code (came back from California recently), but your refactoring as a first step looks fine. I would only suggest to never depend on the default locale (Robert Muir will tell you the same thing), so it should respect the local given to query parser.

          I will report back, when I had time to look into it, but it looks really fine - also from the Generics Policeman Point of View g

          Show
          Uwe Schindler added a comment - Hi Vinicius, thanks for your update! The patch looks correct, I assume it's for Lucene trunk? You should produce them against the top-level directory (below trunk/), not the lucene sub-directory (since Lucene was merged with Solr last year). I have not closely looked at the code (came back from California recently), but your refactoring as a first step looks fine. I would only suggest to never depend on the default locale (Robert Muir will tell you the same thing), so it should respect the local given to query parser. I will report back, when I had time to look into it, but it looks really fine - also from the Generics Policeman Point of View g
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          Thanks for quickly reviewing the patch. Yes, I am using trunk's code. I will do the changes you suggested and include in the next patch.

          Show
          Vinicius Barros added a comment - Hi Uwe, Thanks for quickly reviewing the patch. Yes, I am using trunk's code. I will do the changes you suggested and include in the next patch.
          Hide
          Vinicius Barros added a comment -

          This is the patch for my second week of work.

          Show
          Vinicius Barros added a comment - This is the patch for my second week of work.
          Hide
          Vinicius Barros added a comment -

          The second patch includes the week1 changes plus: implemented classes to support numeric configuration in query parser

          I was not sure what to do about the locale. The locale is required by EscapeQuerySyntax.escape, which seems to escape characters so they don't mix with query parser's operators. The code I used locale I copied from FieldQueryNode, which uses the escaper and passes the default locale, however, other nodes as RegexpQueryNode ignore the escaper and just return the plain text. I was not sure what to do, then I am forcing the locale to ENGLISH now.

          I also took a long time to figure out how to implement the numeric configuration, it seemed to me the best approach was to copy the way FieldBoostAttribute is configured. It's complex, but it's the only way I found without doing any ugly workaround.

          Please, take a look at the code and give me some suggestions in case you thing I need to change something.

          PS: the patch is now created from the trunk folder, as Uwe suggested

          Thanks!

          Show
          Vinicius Barros added a comment - The second patch includes the week1 changes plus: implemented classes to support numeric configuration in query parser I was not sure what to do about the locale. The locale is required by EscapeQuerySyntax.escape, which seems to escape characters so they don't mix with query parser's operators. The code I used locale I copied from FieldQueryNode, which uses the escaper and passes the default locale, however, other nodes as RegexpQueryNode ignore the escaper and just return the plain text. I was not sure what to do, then I am forcing the locale to ENGLISH now. I also took a long time to figure out how to implement the numeric configuration, it seemed to me the best approach was to copy the way FieldBoostAttribute is configured. It's complex, but it's the only way I found without doing any ugly workaround. Please, take a look at the code and give me some suggestions in case you thing I need to change something. PS: the patch is now created from the trunk folder, as Uwe suggested Thanks!
          Hide
          Vinicius Barros added a comment -

          ah, one more thing. Uwe, what is "Generics Policeman Point of View"?

          Show
          Vinicius Barros added a comment - ah, one more thing. Uwe, what is "Generics Policeman Point of View"?
          Hide
          Uwe Schindler added a comment -

          ah, one more thing. Uwe, what is "Generics Policeman Point of View"?

          That's just my nickname, because I always watch correct usage of Java 5 Generics I just wanted to confirm, that your generification of some classes looked fine.

          I will revisit your patch tomorrow together with others here at BerlinBuzzword (a conference about Lucene and other NoSQL related stuff).

          Uwe

          Show
          Uwe Schindler added a comment - ah, one more thing. Uwe, what is "Generics Policeman Point of View"? That's just my nickname, because I always watch correct usage of Java 5 Generics I just wanted to confirm, that your generification of some classes looked fine. I will revisit your patch tomorrow together with others here at BerlinBuzzword (a conference about Lucene and other NoSQL related stuff). Uwe
          Hide
          Adriano Crestani added a comment -

          Vinicius: you are right, contrib QP configuration is very complicated, that's why Phillip is working on another GSOC project to make it simpler. So don't worry much about the best way to use the config API, since it will change, just make it work for now using the old API . You have a good point when you mentioned the escaper problem with Locale. I should think more about it...

          Uwe: Are you intending to commit the patch only at the end of gsoc? Just wondering, since Vinicius is not selecting the ASF checkbox when submitting the patch, which means the current patches will not be able to be committed.

          Show
          Adriano Crestani added a comment - Vinicius: you are right, contrib QP configuration is very complicated, that's why Phillip is working on another GSOC project to make it simpler. So don't worry much about the best way to use the config API, since it will change, just make it work for now using the old API . You have a good point when you mentioned the escaper problem with Locale. I should think more about it... Uwe: Are you intending to commit the patch only at the end of gsoc? Just wondering, since Vinicius is not selecting the ASF checkbox when submitting the patch, which means the current patches will not be able to be committed.
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius,

          if you want the code be committed later, you should check the license box ("Grant license to ASF for inclusion in ASF works (as per the Apache License §5)"), else we will be not able to submit it to the main repository.

          If you want us to commit the patch only at the end of GSOC, it's enough to check this box in your final submission, but it should be noted, that we may submit minor parts of the work even before (once you are at a state where it is 'useable' and passes existing tests). A second commit could e.g. adding sophisticated tests, and so on.

          Show
          Uwe Schindler added a comment - Hi Vinicius, if you want the code be committed later, you should check the license box ("Grant license to ASF for inclusion in ASF works (as per the Apache License §5)"), else we will be not able to submit it to the main repository. If you want us to commit the patch only at the end of GSOC, it's enough to check this box in your final submission, but it should be noted, that we may submit minor parts of the work even before (once you are at a state where it is 'useable' and passes existing tests). A second commit could e.g. adding sophisticated tests, and so on.
          Hide
          Uwe Schindler added a comment - - edited

          One small thing I have seen after applying your patch:
          The code guidelines of Lucene require no TABS but two whitespace to indent. We have a code style available for Eclipse and IDEA in the dev-tools folder (below trunk). You only have to install it.

          Also you are using Java 6 interface overrides, so the code does not compile with Java 5 (unfortunately this is a bug in Java 6's javac, as it does not complain when in "-source 1.5" mode). In Java 5 compatible code it is not allowed to add @Override to methods implemented for interfaces:

          common.compile-core:
              [mkdir] Created dir: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java
              [javac] Compiling 175 source files to C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:182: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:187: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\config\NumericFieldConfigListener.java:21: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:17: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:32: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:79: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:20: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:25: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:35: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:52: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:57: method does not override a method from its superclass
              [javac]     @Override
              [javac]          ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\parser\JavaCharStream.java:367: warning: [dep-ann] deprecated name isnt annotated with @Deprecated
              [javac]   public int getEndColumn() {
              [javac]              ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated
              [javac]   int getColumn();
              [javac]       ^
              [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:41: warning: [dep-ann] deprecated name isnt annotated with @Deprecated
              [javac]   int getLine();
              [javac]       ^
              [javac] Note: Some input files use or override a deprecated API.
              [javac] Note: Recompile with -Xlint:deprecation for details.
              [javac] 11 errors
              [javac] 3 warnings
          

          With Java 6 the code compiles, but some tests fail to work. I assume its simply because of the work-in-progress,

          Show
          Uwe Schindler added a comment - - edited One small thing I have seen after applying your patch: The code guidelines of Lucene require no TABS but two whitespace to indent. We have a code style available for Eclipse and IDEA in the dev-tools folder (below trunk). You only have to install it. Also you are using Java 6 interface overrides, so the code does not compile with Java 5 (unfortunately this is a bug in Java 6's javac, as it does not complain when in "-source 1.5" mode). In Java 5 compatible code it is not allowed to add @Override to methods implemented for interfaces: common.compile-core: [mkdir] Created dir: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java [javac] Compiling 175 source files to C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:182: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:187: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\config\NumericFieldConfigListener.java:21: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:17: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:32: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:79: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:20: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:25: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:35: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:52: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:57: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\parser\JavaCharStream.java:367: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] public int getEndColumn() { [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getColumn(); [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:41: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getLine(); [javac] ^ [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] 11 errors [javac] 3 warnings With Java 6 the code compiles, but some tests fail to work. I assume its simply because of the work-in-progress,
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          Thanks for reviewing the patch again. I will fix the problems you mentioned.

          I do not think the code is ready to be committed, I am just sending the patches so you can keep track of my progress. I hope to have something useable soon, then you can commit, probably before the end of gsoc.

          Show
          Vinicius Barros added a comment - Hi Uwe, Thanks for reviewing the patch again. I will fix the problems you mentioned. I do not think the code is ready to be committed, I am just sending the patches so you can keep track of my progress. I hope to have something useable soon, then you can commit, probably before the end of gsoc.
          Hide
          Adriano Crestani added a comment -

          Hi Vinicius,

          Assuming you are using eclipse, you can find the codestyle used to create lucene code at the bottom of this page: http://wiki.apache.org/lucene-java/HowToContribute, it will fix the identation problem Uwe mentioned.

          After reviewing the code, I want to remember you that all files must have the ASF header. Take a look at the other Java classes in Lucene repository so you can have an example.

          The way you are organizing the code looks good to me, just make sure whenever you add a new class to contrib query parser, place it under the right package. "core", if the class is generic and might be used by other queryparser implemenations; "standard", if the class is specific to lucene standard query parser implementation.

          Show
          Adriano Crestani added a comment - Hi Vinicius, Assuming you are using eclipse, you can find the codestyle used to create lucene code at the bottom of this page: http://wiki.apache.org/lucene-java/HowToContribute , it will fix the identation problem Uwe mentioned. After reviewing the code, I want to remember you that all files must have the ASF header. Take a look at the other Java classes in Lucene repository so you can have an example. The way you are organizing the code looks good to me, just make sure whenever you add a new class to contrib query parser, place it under the right package. "core", if the class is generic and might be used by other queryparser implemenations; "standard", if the class is specific to lucene standard query parser implementation.
          Hide
          Adriano Crestani added a comment -

          One more thing I forgot to mention, when creating new QueryNodes, try to enforce the user when using the constructor to pass the required arguments. For example: NumericQueryNode does not have any constructor, I would suggest you to change it to NumericQueryNode(CharSequence field, Number number, NumberFormat format).

          Show
          Adriano Crestani added a comment - One more thing I forgot to mention, when creating new QueryNodes, try to enforce the user when using the constructor to pass the required arguments. For example: NumericQueryNode does not have any constructor, I would suggest you to change it to NumericQueryNode(CharSequence field, Number number, NumberFormat format).
          Hide
          Vinicius Barros added a comment -

          This patch includes:

          -changes suggested by Adriano and Uwe:
          -remvoed @Override
          -applied Lucene codestyle
          -created constructor to NumericQueryNode
          -added ASF header to new classes
          -Implemented new NumericQueryNodeProcessor and NumericRangeQueryNodeProcessor

          Show
          Vinicius Barros added a comment - This patch includes: -changes suggested by Adriano and Uwe: -remvoed @Override -applied Lucene codestyle -created constructor to NumericQueryNode -added ASF header to new classes -Implemented new NumericQueryNodeProcessor and NumericRangeQueryNodeProcessor
          Hide
          Vinicius Barros added a comment -

          This patch includes the builder for numeric range queries. This week I intend to start writing junits.

          Show
          Vinicius Barros added a comment - This patch includes the builder for numeric range queries. This week I intend to start writing junits.
          Hide
          Vinicius Barros added a comment -

          This patch includes code from week 5 and 6. Sorry for taking so long to submit a new patch, but I was hitting many problems when I finally started run the junits.

          I updated my workspace and got the new changes which use NumericField to add numeric fields to Documents. So I adapted my junits to that and also refactored some code to use NumericField.DataType in NumericConfig, instead of using Class<? extends Number> when specifying the NumericConfig type.

          I added a junit class (TestNumericQueryParser). I reused some LuceneTestCase methods, mainly the ones that generate random things. I took sometime to get where I wanted, I was hitting many problems when using dates in range. In the end, I liked the way the junit is running, inputs are very random, mainly the date formats, using randomly different format styles and locale. I ran the junit in a loop 1000 times and I got no error, so I think it's pretty stable now. Uwe, please, take a look and let me know what you think about it.

          testInclusiveLowerRangeQuery and testInclusiveUpperRangeQuery are commented since I found that StandardQueryParser is not supporting mixed include syntaxes, such as {1 TO 2]. Shouldn't it be supported? I don't see why not.

          Also, the query parser is not supporting open ended range queries, such as [... to 3].

          I am checking "Grant license to ASF for inclusion in ASF works (as per the Apache License §5)" now, because I think the patch is finally ready to be checked in. I ran Lucene top level ant and it finished successfully.

          I will work on creating javadocs now.

          Show
          Vinicius Barros added a comment - This patch includes code from week 5 and 6. Sorry for taking so long to submit a new patch, but I was hitting many problems when I finally started run the junits. I updated my workspace and got the new changes which use NumericField to add numeric fields to Documents. So I adapted my junits to that and also refactored some code to use NumericField.DataType in NumericConfig, instead of using Class<? extends Number> when specifying the NumericConfig type. I added a junit class (TestNumericQueryParser). I reused some LuceneTestCase methods, mainly the ones that generate random things. I took sometime to get where I wanted, I was hitting many problems when using dates in range. In the end, I liked the way the junit is running, inputs are very random, mainly the date formats, using randomly different format styles and locale. I ran the junit in a loop 1000 times and I got no error, so I think it's pretty stable now. Uwe, please, take a look and let me know what you think about it. testInclusiveLowerRangeQuery and testInclusiveUpperRangeQuery are commented since I found that StandardQueryParser is not supporting mixed include syntaxes, such as {1 TO 2]. Shouldn't it be supported? I don't see why not. Also, the query parser is not supporting open ended range queries, such as [... to 3] . I am checking "Grant license to ASF for inclusion in ASF works (as per the Apache License §5)" now, because I think the patch is finally ready to be checked in. I ran Lucene top level ant and it finished successfully. I will work on creating javadocs now.
          Hide
          Adriano Crestani added a comment -

          testInclusiveLowerRangeQuery and testInclusiveUpperRangeQuery are commented since I found that StandardQueryParser is not supporting mixed include syntaxes, such as {1 TO 2]. Shouldn't it be supported? I don't see why not.

          I will take a look into it, thanks for reporting it, this might really be a bug

          Show
          Adriano Crestani added a comment - testInclusiveLowerRangeQuery and testInclusiveUpperRangeQuery are commented since I found that StandardQueryParser is not supporting mixed include syntaxes, such as {1 TO 2]. Shouldn't it be supported? I don't see why not. I will take a look into it, thanks for reporting it, this might really be a bug
          Hide
          Adriano Crestani added a comment -

          Hi,

          I committed the first patch from LUCENE-2979 and it changed completely the queryparser config API.

          It seems Vinicius is ahead of schedule with this project, is that corret?! Is there anything else to do after documentation? If not, I would ask whether it's possible for you to change the way you use config API for numeric. I think it will not require a lot of change, the api is much simpler now. I can ask Phillipe to help you and explain how the new api works Otherwise, Uwe will not be able to commit your patch, since there will be many classes missing now.

          What do you think Uwe?

          Show
          Adriano Crestani added a comment - Hi, I committed the first patch from LUCENE-2979 and it changed completely the queryparser config API. It seems Vinicius is ahead of schedule with this project, is that corret?! Is there anything else to do after documentation? If not, I would ask whether it's possible for you to change the way you use config API for numeric. I think it will not require a lot of change, the api is much simpler now. I can ask Phillipe to help you and explain how the new api works Otherwise, Uwe will not be able to commit your patch, since there will be many classes missing now. What do you think Uwe?
          Hide
          Uwe Schindler added a comment -

          Hi Adriano,

          I will also review the patch today (using older trunk rev) and then respond.

          I think, Vinicius should help with updating the code to the latest config API version which was committed. This is where "community" in this project gets involved. Maybe the other GSoC student and Vinicius should work together to get this solved.

          Next week the mid-term evaluations will come, so we should have something to work on this issue except documentation for the second half.

          Show
          Uwe Schindler added a comment - Hi Adriano, I will also review the patch today (using older trunk rev) and then respond. I think, Vinicius should help with updating the code to the latest config API version which was committed. This is where "community" in this project gets involved. Maybe the other GSoC student and Vinicius should work together to get this solved. Next week the mid-term evaluations will come, so we should have something to work on this issue except documentation for the second half.
          Hide
          Uwe Schindler added a comment -

          I downgraded my trunk checkout to rev. 1142861 and applied the patch. All tests pass, so nice work until now. The problem is that I am no longer able to commit the patch, as rev. 1142862 was committed before, changing the QP config API, so how to proceed?

          Vinicius, how do you want to proceed, are you planning to rewrite the stuff related to LUCENE-2979?

          Show
          Uwe Schindler added a comment - I downgraded my trunk checkout to rev. 1142861 and applied the patch. All tests pass, so nice work until now. The problem is that I am no longer able to commit the patch, as rev. 1142862 was committed before, changing the QP config API, so how to proceed? Vinicius, how do you want to proceed, are you planning to rewrite the stuff related to LUCENE-2979 ?
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          I talked to Phillipe about the new API, it does not look complicated. I will try to do the change and submit a new patch by the end of this week, then we can finally have something checked in \o/

          Next things do to then:
          -fix config
          -write javadocs
          -write wiki/tutorial explaining how to use query parser with numeric fields

          Something else you have in mind Uwe that we can add to the above list? Maybe fix the range query in contrib queryparser which I mentioned in my past comments above. (Have you verified the bug Adriano?)

          Show
          Vinicius Barros added a comment - Hi Uwe, I talked to Phillipe about the new API, it does not look complicated. I will try to do the change and submit a new patch by the end of this week, then we can finally have something checked in \o/ Next things do to then: -fix config -write javadocs -write wiki/tutorial explaining how to use query parser with numeric fields Something else you have in mind Uwe that we can add to the above list? Maybe fix the range query in contrib queryparser which I mentioned in my past comments above. (Have you verified the bug Adriano?)
          Hide
          Uwe Schindler added a comment -

          I think the "bug" in Contrib QP is there because it's also a bug in Lucene's Core QP (the default impl just emulates core's QP). But in my opinion, it should have separate include/exclude for both limits, so a query like {1 TO 2] should behave as expected.

          The QP should in my opinion also support open-ended ranges. The syntax by NumericRangeQuery's toString() is by using stars (this is also what Solr's QP does). For numeric fields, half open ranges are important, as it supports queries like "price < 2.00 Dollar". Wasn't there not also an issue open to support other syntax for numerics like > and < operators?

          Show
          Uwe Schindler added a comment - I think the "bug" in Contrib QP is there because it's also a bug in Lucene's Core QP (the default impl just emulates core's QP). But in my opinion, it should have separate include/exclude for both limits, so a query like {1 TO 2] should behave as expected. The QP should in my opinion also support open-ended ranges. The syntax by NumericRangeQuery's toString() is by using stars (this is also what Solr's QP does). For numeric fields, half open ranges are important, as it supports queries like "price < 2.00 Dollar". Wasn't there not also an issue open to support other syntax for numerics like > and < operators?
          Hide
          Uwe Schindler added a comment - - edited

          I have some small things (maybe I will extend this comment when i review more of the code):

          • in NumericRangeQueryNodeBuilder, i would use switch statement on the NumericField.DataType and throw a UnsupportedEx if an unknown type is hit. I want to do this, because there is already an issue open to extend NumericField and NumericRangeQuery by BYTE and SHORT. If we add those types, the code gets unpredictable. With a switch statement and a exception in the defalt clause, it fails predicatable. You should also add similar "default" statement throwing exceptions in all switch statements on this enum type (in general it is good code style to do this).
          • The current code always handles Date as Long (like Solr does). I am not sure, if this would always be the best idea (see e.g. Lucene In Action book). Maybe we should add a possibility to override the conversion Date <-> Number at a central place to customize? One of the reasons to not support direkt DateRangeQueries was to make the representation of Dates as Numbers open to the user.
          • Method "NumericField.DataType getNumericDataType(Number number)": I would use instanceOf checks (as all Number subclasses are final, this is fine and correct). They should be faster than this reflection code.

          This is all for now, more may be added tomorrow.

          Show
          Uwe Schindler added a comment - - edited I have some small things (maybe I will extend this comment when i review more of the code): in NumericRangeQueryNodeBuilder, i would use switch statement on the NumericField.DataType and throw a UnsupportedEx if an unknown type is hit. I want to do this, because there is already an issue open to extend NumericField and NumericRangeQuery by BYTE and SHORT. If we add those types, the code gets unpredictable. With a switch statement and a exception in the defalt clause, it fails predicatable. You should also add similar "default" statement throwing exceptions in all switch statements on this enum type (in general it is good code style to do this). The current code always handles Date as Long (like Solr does). I am not sure, if this would always be the best idea (see e.g. Lucene In Action book). Maybe we should add a possibility to override the conversion Date <-> Number at a central place to customize? One of the reasons to not support direkt DateRangeQueries was to make the representation of Dates as Numbers open to the user. Method "NumericField.DataType getNumericDataType(Number number)": I would use instanceOf checks (as all Number subclasses are final, this is fine and correct). They should be faster than this reflection code. This is all for now, more may be added tomorrow.
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          >>>I think the "bug" in Contrib QP is there because it's also a bug in Lucene's Core QP (the default impl just emulates core's QP). But in my opinion, it should have separate include/exclude for both limits, so a query like {1 TO 2] should behave as expected.

          I think the same way, no idea why it's not supporting {1 TO 2]. I can try to fix it, but not sure how long it will take, haven't looked at the code yet. Anyway, it shouldn't be that hard, since core QueryParser is already working the way we expect, we need somehow to copy the same behavour to contrib queryparser.

          >>>in NumericRangeQueryNodeBuilder, i would use switch statement on the NumericField.DataType and throw a UnsupportedEx if an unknown type is hit. I want to do this, because there is already an issue open to extend NumericField and NumericRangeQuery by BYTE and SHORT. If we add those types, the code gets unpredictable. With a switch statement and a exception in the defalt clause, it fails predicatable. You should also add similar "default" statement throwing exceptions in all switch statements on this enum type (in general it is good code style to do this).

          Agreed! I will do the change.

          >>>The current code always handles Date as Long (like Solr does). I am not sure, if this would always be the best idea (see e.g. Lucene In Action book). Maybe we should add a possibility to override the conversion Date <-> Number at a central place to customize? One of the reasons to not support direkt DateRangeQueries was to make the representation of Dates as Numbers open to the user.

          Do you mean that Date could return some other type other than LONG? Like INT?! I could add a new parameter to NumberDateFormat that will receive NumeridField.DataType, this way the code could parse the date to the expected number type.

          >>> Method "NumericField.DataType getNumericDataType(Number number)": I would use instanceOf checks (as all Number subclasses are final, this is fine and correct). They should be faster than this reflection code.

          I will change that as well.

          Show
          Vinicius Barros added a comment - Hi Uwe, >>>I think the "bug" in Contrib QP is there because it's also a bug in Lucene's Core QP (the default impl just emulates core's QP). But in my opinion, it should have separate include/exclude for both limits, so a query like {1 TO 2] should behave as expected. I think the same way, no idea why it's not supporting {1 TO 2]. I can try to fix it, but not sure how long it will take, haven't looked at the code yet. Anyway, it shouldn't be that hard, since core QueryParser is already working the way we expect, we need somehow to copy the same behavour to contrib queryparser. >>>in NumericRangeQueryNodeBuilder, i would use switch statement on the NumericField.DataType and throw a UnsupportedEx if an unknown type is hit. I want to do this, because there is already an issue open to extend NumericField and NumericRangeQuery by BYTE and SHORT. If we add those types, the code gets unpredictable. With a switch statement and a exception in the defalt clause, it fails predicatable. You should also add similar "default" statement throwing exceptions in all switch statements on this enum type (in general it is good code style to do this). Agreed! I will do the change. >>>The current code always handles Date as Long (like Solr does). I am not sure, if this would always be the best idea (see e.g. Lucene In Action book). Maybe we should add a possibility to override the conversion Date <-> Number at a central place to customize? One of the reasons to not support direkt DateRangeQueries was to make the representation of Dates as Numbers open to the user. Do you mean that Date could return some other type other than LONG? Like INT?! I could add a new parameter to NumberDateFormat that will receive NumeridField.DataType, this way the code could parse the date to the expected number type. >>> Method "NumericField.DataType getNumericDataType(Number number)": I would use instanceOf checks (as all Number subclasses are final, this is fine and correct). They should be faster than this reflection code. I will change that as well.
          Hide
          Uwe Schindler added a comment -

          >>>The current code always handles Date as Long (like Solr does). I am not sure, if this would always be the best idea (see e.g. Lucene In Action book). Maybe we should add a possibility to override the conversion Date <-> Number at a central place to customize? One of the reasons to not support direkt DateRangeQueries was to make the representation of Dates as Numbers open to the user.

          Do you mean that Date could return some other type other than LONG? Like INT?! I could add a new parameter to NumberDateFormat that will receive NumeridField.DataType, this way the code could parse the date to the expected number type.

          In general, the formatting of a Date to a Number should be possible to customize (e.g. if you only want to format the Date without the time and your index supports only Int dates, like number of days).

          I talked to Phillipe about the new API, it does not look complicated. I will try to do the change and submit a new patch by the end of this week, then we can finally have something checked in \o/

          That communication should be done in public...

          Show
          Uwe Schindler added a comment - >>>The current code always handles Date as Long (like Solr does). I am not sure, if this would always be the best idea (see e.g. Lucene In Action book). Maybe we should add a possibility to override the conversion Date <-> Number at a central place to customize? One of the reasons to not support direkt DateRangeQueries was to make the representation of Dates as Numbers open to the user. Do you mean that Date could return some other type other than LONG? Like INT?! I could add a new parameter to NumberDateFormat that will receive NumeridField.DataType, this way the code could parse the date to the expected number type. In general, the formatting of a Date to a Number should be possible to customize (e.g. if you only want to format the Date without the time and your index supports only Int dates, like number of days). I talked to Phillipe about the new API, it does not look complicated. I will try to do the change and submit a new patch by the end of this week, then we can finally have something checked in \o/ That communication should be done in public...
          Hide
          Vinicius Barros added a comment -

          >>> In general, the formatting of a Date to a Number should be possible to customize (e.g. if you only want to format the Date without the time and your index supports only Int dates, like number of days).

          That's possible, the user just need to provide a NumberFormat that will convert String to whatever Number object the user wants.

          Show
          Vinicius Barros added a comment - >>> In general, the formatting of a Date to a Number should be possible to customize (e.g. if you only want to format the Date without the time and your index supports only Int dates, like number of days). That's possible, the user just need to provide a NumberFormat that will convert String to whatever Number object the user wants.
          Hide
          Chris Male added a comment -

          Hey,

          In the next week or so I'm going to want to move this QueryParser, along with the contents of contrib/queryparser, to the new consolidated queryparser module. Is this going to get in the way of the work here?

          Show
          Chris Male added a comment - Hey, In the next week or so I'm going to want to move this QueryParser, along with the contents of contrib/queryparser, to the new consolidated queryparser module. Is this going to get in the way of the work here?
          Hide
          Adriano Crestani added a comment -

          Hi Chris,

          If you are just moving code, I think it's fine, at max we will need to rename the paths in the patch to match the new file structure.

          Show
          Adriano Crestani added a comment - Hi Chris, If you are just moving code, I think it's fine, at max we will need to rename the paths in the patch to match the new file structure.
          Hide
          Adriano Crestani added a comment -

          For numeric fields, half open ranges are important, as it supports queries like "price < 2.00 Dollar". Wasn't there not also an issue open to support other syntax for numerics like > and < operators?

          Yes, there is, just do not recall the JIRA number now. Maybe Vinicius could try to implement it as well to fill out his task list in case he finishes his tasks before schedule, since it is also related to numeric queries. I am just not sure how much complex the task would be, I know the big change for this is in the syntax parser, which will require to know how to change javacc files.

          Show
          Adriano Crestani added a comment - For numeric fields, half open ranges are important, as it supports queries like "price < 2.00 Dollar". Wasn't there not also an issue open to support other syntax for numerics like > and < operators? Yes, there is, just do not recall the JIRA number now. Maybe Vinicius could try to implement it as well to fill out his task list in case he finishes his tasks before schedule, since it is also related to numeric queries. I am just not sure how much complex the task would be, I know the big change for this is in the syntax parser, which will require to know how to change javacc files.
          Hide
          Vinicius Barros added a comment -

          This patch includes the fix for the new config api and all changes suggested by Uwe.

          Uwe, you should be able to commit it now, I checked out the trunk and applied the patch myself and everything is working fine.

          Show
          Vinicius Barros added a comment - This patch includes the fix for the new config api and all changes suggested by Uwe. Uwe, you should be able to commit it now, I checked out the trunk and applied the patch myself and everything is working fine.
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius,
          cool thanks! Patched, works - nothing to complain about.

          I think I will commit this soon, so your partner from LUCENE-2979 can also coordinate with you (which was also committed).

          Show
          Uwe Schindler added a comment - Hi Vinicius, cool thanks! Patched, works - nothing to complain about. I think I will commit this soon, so your partner from LUCENE-2979 can also coordinate with you (which was also committed).
          Hide
          Uwe Schindler added a comment -

          Committed to Lucene trunk revision: 1144744
          You can now start and improve the documentation, tests,... Please use current trunk as basis.

          One note: I have still seen some switch statements without a default while reviewing.

          We should also decide together with LUCENE-2979 if we should backport this stuff to 3.x (once finished). Adriano?

          Show
          Uwe Schindler added a comment - Committed to Lucene trunk revision: 1144744 You can now start and improve the documentation, tests,... Please use current trunk as basis. One note: I have still seen some switch statements without a default while reviewing. We should also decide together with LUCENE-2979 if we should backport this stuff to 3.x (once finished). Adriano?
          Hide
          Uwe Schindler added a comment -

          I had to fix a javadoc warning, breaking the Jenkins Builds: rev 1144755

          Show
          Uwe Schindler added a comment - I had to fix a javadoc warning, breaking the Jenkins Builds: rev 1144755
          Hide
          Adriano Crestani added a comment -

          Hi Uwe,

          There are plans to backport LUCENE-2979 to 3.x, Phillipe is already working on it, still evaluating if it will be possible.

          I am not sure about numeric support, Vinicius changed TermRangeQueryNode inheritance, which breaks the backwards compatibility. I am not saying the change is bad, I agree with the new structure, however Vinicius will need to find another solution before backporting it to 3.x.

          Show
          Adriano Crestani added a comment - Hi Uwe, There are plans to backport LUCENE-2979 to 3.x, Phillipe is already working on it, still evaluating if it will be possible. I am not sure about numeric support, Vinicius changed TermRangeQueryNode inheritance, which breaks the backwards compatibility. I am not saying the change is bad, I agree with the new structure, however Vinicius will need to find another solution before backporting it to 3.x.
          Hide
          Uwe Schindler added a comment -

          Vinicius, do you have any plans about backporting the stuff to Lucene 3.x - it should not be that hard

          I am not sure about numeric support, Vinicius changed TermRangeQueryNode inheritance, which breaks the backwards compatibility. I am not saying the change is bad, I agree with the new structure, however Vinicius will need to find another solution before backporting it to 3.x.

          I am not sure if this is really a break when you change inheritance. If code still compiles, its no break, if classes were renamed its more serious. I am not sure, if implementation classes (and -names) should be covered by the backwards compatibility. In my opinion, mainly the configuration and interfaces of the QP must be covered by backwards policy.

          As we are now at mid-time, it would be a good idea, to maybe add some extra syntax support for numerics, like "<" and ">"? We should also add tests/support for half-open ranges, so syntax like "[* TO 1.0]" should also be supported (I am not sure, if TermRangeQueryNode supports this, but numerics should do this in all cases) - the above syntax is also printed out on NumericRangeQuery.toString(), if one of the bounds is null. The latter could be easily implemented by checking for "*" as input to the range bounds and map those special "values" to NULL. Adding support for "<" and ">" (also "<=", ">=") needs knowledge of JavaCC parser language. Vinicius, have you ever worked with JavaCC, so do you think you will be able to extend the syntax?

          Show
          Uwe Schindler added a comment - Vinicius, do you have any plans about backporting the stuff to Lucene 3.x - it should not be that hard I am not sure about numeric support, Vinicius changed TermRangeQueryNode inheritance, which breaks the backwards compatibility. I am not saying the change is bad, I agree with the new structure, however Vinicius will need to find another solution before backporting it to 3.x. I am not sure if this is really a break when you change inheritance. If code still compiles, its no break, if classes were renamed its more serious. I am not sure, if implementation classes (and -names) should be covered by the backwards compatibility. In my opinion, mainly the configuration and interfaces of the QP must be covered by backwards policy. As we are now at mid-time, it would be a good idea, to maybe add some extra syntax support for numerics, like "<" and ">"? We should also add tests/support for half-open ranges, so syntax like " [* TO 1.0] " should also be supported (I am not sure, if TermRangeQueryNode supports this, but numerics should do this in all cases) - the above syntax is also printed out on NumericRangeQuery.toString(), if one of the bounds is null. The latter could be easily implemented by checking for "*" as input to the range bounds and map those special "values" to NULL. Adding support for "<" and ">" (also "<=", ">=") needs knowledge of JavaCC parser language. Vinicius, have you ever worked with JavaCC, so do you think you will be able to extend the syntax?
          Hide
          Adriano Crestani added a comment -

          I am not sure if this is really a break when you change inheritance. If code still compiles, its no break, if classes were renamed its more serious. I am not sure, if implementation classes (and -names) should be covered by the backwards compatibility. In my opinion, mainly the configuration and interfaces of the QP must be covered by backwards policy.

          I didn't see any class renaming, I need to double check Vinicius's patches. But he did change the query node inheritance, which may affect how processors and builder (specially QueryNodeTreeBuilder) work. I am not saying it is not possible to implement his approach on 3.x, but he will need to deal differently with query nodes classes he created. As I said before, what he did is good and clean, I like the way it is, but it will break someone's code if pushed to 3.x. So if you ask me whether to push it to 3.x, I say YES, just make sure to not break the query node structure that people may be relying on.

          As we are now at mid-time, it would be a good idea, to maybe add some extra syntax support for numerics, like "<" and ">"? We should also add tests/support for half-open ranges, so syntax like "[* TO 1.0]" should also be supported (I am not sure, if TermRangeQueryNode supports this, but numerics should do this in all cases) - the above syntax is also printed out on NumericRangeQuery.toString(), if one of the bounds is null. The latter could be easily implemented by checking for "*" as input to the range bounds and map those special "values" to NULL. Adding support for "<" and ">" (also "<=", ">=") needs knowledge of JavaCC parser language. Vinicius, have you ever worked with JavaCC, so do you think you will be able to extend the syntax?

          I still need to investigate the bugs Vinicius reported (should have been created a JIRA for that already), I never really tried open ranges in contrib QP. And if Vinicius thinks he will have time and skills to do the JAVACC change to support those new operators, go for it! And remember Vinicius, you don't need to do everything during gsoc, you are always welcome to contribute code whenever you want

          Show
          Adriano Crestani added a comment - I am not sure if this is really a break when you change inheritance. If code still compiles, its no break, if classes were renamed its more serious. I am not sure, if implementation classes (and -names) should be covered by the backwards compatibility. In my opinion, mainly the configuration and interfaces of the QP must be covered by backwards policy. I didn't see any class renaming, I need to double check Vinicius's patches. But he did change the query node inheritance, which may affect how processors and builder (specially QueryNodeTreeBuilder) work. I am not saying it is not possible to implement his approach on 3.x, but he will need to deal differently with query nodes classes he created. As I said before, what he did is good and clean, I like the way it is, but it will break someone's code if pushed to 3.x. So if you ask me whether to push it to 3.x, I say YES, just make sure to not break the query node structure that people may be relying on. As we are now at mid-time, it would be a good idea, to maybe add some extra syntax support for numerics, like "<" and ">"? We should also add tests/support for half-open ranges, so syntax like " [* TO 1.0] " should also be supported (I am not sure, if TermRangeQueryNode supports this, but numerics should do this in all cases) - the above syntax is also printed out on NumericRangeQuery.toString(), if one of the bounds is null. The latter could be easily implemented by checking for "*" as input to the range bounds and map those special "values" to NULL. Adding support for "<" and ">" (also "<=", ">=") needs knowledge of JavaCC parser language. Vinicius, have you ever worked with JavaCC, so do you think you will be able to extend the syntax? I still need to investigate the bugs Vinicius reported (should have been created a JIRA for that already), I never really tried open ranges in contrib QP. And if Vinicius thinks he will have time and skills to do the JAVACC change to support those new operators, go for it! And remember Vinicius, you don't need to do everything during gsoc, you are always welcome to contribute code whenever you want
          Hide
          Vinicius Barros added a comment -

          Thanks for committing the patch Uwe!

          I will review the code again looking for switch without default case and fix it.

          I never did anything with javacc, I just quickly looked at the code, does not seem complicated, however, I have no idea how complex is to run javacc and regenerate the java files. Does lucene ant script do that automaticaly?

          I can try to fix open range queries on contrib query parser, add "<="-like operators or backport numeric support to 3.x. Just let me know the priorities and I will work on it. My suggestion is that the bug on open range queries is the most critical now, so I could start working on that. Your call Uwe.

          Show
          Vinicius Barros added a comment - Thanks for committing the patch Uwe! I will review the code again looking for switch without default case and fix it. I never did anything with javacc, I just quickly looked at the code, does not seem complicated, however, I have no idea how complex is to run javacc and regenerate the java files. Does lucene ant script do that automaticaly? I can try to fix open range queries on contrib query parser, add "<="-like operators or backport numeric support to 3.x. Just let me know the priorities and I will work on it. My suggestion is that the bug on open range queries is the most critical now, so I could start working on that. Your call Uwe.
          Hide
          Robert Muir added a comment -

          Hey guys, we had some hudson problems I wonder if you could look at:

          The first thing, the randomization of this test was not working correctly, because it used 'random' from static initializers.
          In LuceneTestCase, the random is first initialized in @beforeClass, so its not really usable before then.

          I committed this to fix the randomization: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/queryparser/src/test/org/apache/lucene/queryParser/standard/TestNumericQueryParser.java?r1=1145452&r2=1145451&pathrev=1145452

          Once I did this, then the test is working correctly (reproducible etc) but it would fail and sometimes hang.

          I took care of the hang, the reason was if the test got minIntegerDigits < 4, it would loop forever looking for a number >= 1000.
          You can see the fix here: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/queryparser/src/test/org/apache/lucene/queryParser/standard/TestNumericQueryParser.java?view=diff&r1=1145456&r2=1145457&pathrev=1145457

          But one problem still remains, for some situations (e.g. certain numbers and locales) the test is failing.

          Show
          Robert Muir added a comment - Hey guys, we had some hudson problems I wonder if you could look at: The first thing, the randomization of this test was not working correctly, because it used 'random' from static initializers. In LuceneTestCase, the random is first initialized in @beforeClass, so its not really usable before then. I committed this to fix the randomization: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/queryparser/src/test/org/apache/lucene/queryParser/standard/TestNumericQueryParser.java?r1=1145452&r2=1145451&pathrev=1145452 Once I did this, then the test is working correctly (reproducible etc) but it would fail and sometimes hang. I took care of the hang, the reason was if the test got minIntegerDigits < 4, it would loop forever looking for a number >= 1000. You can see the fix here: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/contrib/queryparser/src/test/org/apache/lucene/queryParser/standard/TestNumericQueryParser.java?view=diff&r1=1145456&r2=1145457&pathrev=1145457 But one problem still remains, for some situations (e.g. certain numbers and locales) the test is failing.
          Hide
          Vinicius Barros added a comment -

          Hi Robert,

          Thanks for fixing the problems.

          About the tests failing, do you know how I can reproduce it? When I was testing, whenever I hit a failure, ant would give me some seed values so I could set in the properties when re-running the tests. Do you have the same values?

          Show
          Vinicius Barros added a comment - Hi Robert, Thanks for fixing the problems. About the tests failing, do you know how I can reproduce it? When I was testing, whenever I hit a failure, ant would give me some seed values so I could set in the properties when re-running the tests. Do you have the same values?
          Hide
          Robert Muir added a comment -

          Here are some, from hudson:
          https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9527/testReport/junit/org.apache.lucene.queryParser.standard/TestNumericQueryParser/testSimpleNumericQuery/

          ant test -Dtestcase=TestNumericQueryParser -Dtestmethod=testSimpleNumericQuery -Dtests.seed=941414100019268099:-3819525649270784880 -Dtests.multiplier=3
          

          There is no guarantee this will work though, because when randomizing locales, the number available can vary from JRE to JRE, even changing in minor versions.

          Show
          Robert Muir added a comment - Here are some, from hudson: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9527/testReport/junit/org.apache.lucene.queryParser.standard/TestNumericQueryParser/testSimpleNumericQuery/ ant test -Dtestcase=TestNumericQueryParser -Dtestmethod=testSimpleNumericQuery -Dtests.seed=941414100019268099:-3819525649270784880 -Dtests.multiplier=3 There is no guarantee this will work though, because when randomizing locales, the number available can vary from JRE to JRE, even changing in minor versions.
          Hide
          Uwe Schindler added a comment -

          I assume the problem lies in Thai locale - as always

          Show
          Uwe Schindler added a comment - I assume the problem lies in Thai locale - as always
          Hide
          Vinicius Barros added a comment -

          Hi Robert,

          I updated my code to latest revision and I don't get any junit failure, I ran

          ant test -Dtestcase=TestNumericQueryParser -Dtestmethod=testSimpleNumericQuery -Dtests.seed=941414100019268099:-3819525649270784880 -Dtests.multiplier=3

          in contrib/queryparser and junit on eclipse with

          -Dtestcase=TestNumericQueryParser -Dtestmethod=testSimpleNumericQuery -Dtests.seed=941414100019268099:-3819525649270784880 -Dtests.multiplier=3

          None of them failed.

          Show
          Vinicius Barros added a comment - Hi Robert, I updated my code to latest revision and I don't get any junit failure, I ran ant test -Dtestcase=TestNumericQueryParser -Dtestmethod=testSimpleNumericQuery -Dtests.seed=941414100019268099:-3819525649270784880 -Dtests.multiplier=3 in contrib/queryparser and junit on eclipse with -Dtestcase=TestNumericQueryParser -Dtestmethod=testSimpleNumericQuery -Dtests.seed=941414100019268099:-3819525649270784880 -Dtests.multiplier=3 None of them failed.
          Hide
          Robert Muir added a comment -

          yeah the seed will unfortunately rely upon how many locales you have on your JRE

          So to get some seeds that work on your machine, can you try this script?

          #!/bin/bash
          x=0
          while [ $x -lt 50 ]
          do
            echo "test iteration number $x"
            (cd lucene/contrib/queryparser && ant test -Dtestcase=TestNumericQueryParser) >> /home/rmuir/log.txt 2>&1
            x=$(( $x + 1 ))
          done
          
          Show
          Robert Muir added a comment - yeah the seed will unfortunately rely upon how many locales you have on your JRE So to get some seeds that work on your machine, can you try this script? #!/bin/bash x=0 while [ $x -lt 50 ] do echo "test iteration number $x" (cd lucene/contrib/queryparser && ant test -Dtestcase=TestNumericQueryParser) >> /home/rmuir/log.txt 2>&1 x=$(( $x + 1 )) done
          Hide
          Vinicius Barros added a comment -

          Hi Robert,

          Thanks for the script, it helped me to find a case where it fails.

          On my JRE, the code below fails:

            SimpleDateFormat df = (SimpleDateFormat) DateFormat.getDateTimeInstance(
                DateFormat.FULL, DateFormat.LONG, new Locale("de_CH"));
            // most of date pattern do not include era, so we add it here. Also,
            // sometimes second is not available, we make sure it's present too
            df.applyPattern(df.toPattern() + " G s Z yyyy");
            df.setTimeZone(TimeZone.getTimeZone("America/Grand_Turk"));
            long l1 = -1881411016000l;
            long l2 = -1881411028000l;
            String d1 = df.format(new Date(l1));
            String d2 = df.format(new Date(l2));
            long newL1 = df.parse(d1).getTime();
            long newL2 = df.parse(d2).getTime();
            
            System.out.println(l1 + " => " + d1 + " => " + newL1);
            System.out.println(l2 + " => " + d2 + " => " + newL2);
            assertEquals(l1, newL1);
            assertEquals(l2, newL2);
          

          and it outputs:

          -1881411016000 => Friday, May 20, 1910 4:42:32 AM EST => -1881411448000
          -1881411028000 => Friday, May 20, 1910 4:42:20 AM EST => -1881411460000
          

          As you can see, it seems DateFormat is not very reliable and do not convert back and forth from date(long) to string format and vice-versa.

          I am planning to add a sanity check when creating randomly the date format, the check will check whether the date can be converted to/from long from/to string without data loss.

          What do you think?

          Show
          Vinicius Barros added a comment - Hi Robert, Thanks for the script, it helped me to find a case where it fails. On my JRE, the code below fails: SimpleDateFormat df = (SimpleDateFormat) DateFormat.getDateTimeInstance( DateFormat.FULL, DateFormat.LONG, new Locale( "de_CH" )); // most of date pattern do not include era, so we add it here. Also, // sometimes second is not available, we make sure it's present too df.applyPattern(df.toPattern() + " G s Z yyyy" ); df.setTimeZone(TimeZone.getTimeZone( "America/Grand_Turk" )); long l1 = -1881411016000l; long l2 = -1881411028000l; String d1 = df.format( new Date(l1)); String d2 = df.format( new Date(l2)); long newL1 = df.parse(d1).getTime(); long newL2 = df.parse(d2).getTime(); System .out.println(l1 + " => " + d1 + " => " + newL1); System .out.println(l2 + " => " + d2 + " => " + newL2); assertEquals(l1, newL1); assertEquals(l2, newL2); and it outputs: -1881411016000 => Friday, May 20, 1910 4:42:32 AM EST => -1881411448000 -1881411028000 => Friday, May 20, 1910 4:42:20 AM EST => -1881411460000 As you can see, it seems DateFormat is not very reliable and do not convert back and forth from date(long) to string format and vice-versa. I am planning to add a sanity check when creating randomly the date format, the check will check whether the date can be converted to/from long from/to string without data loss. What do you think?
          Hide
          Chris Male added a comment -
          Show
          Chris Male added a comment - Another trunk failure: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9532/
          Hide
          Chris Male added a comment -

          Also, note in LUCENE-3285 I'm about to move the flexible QP over to the new queryparser module and remove the contrib.

          Show
          Chris Male added a comment - Also, note in LUCENE-3285 I'm about to move the flexible QP over to the new queryparser module and remove the contrib.
          Hide
          Uwe Schindler added a comment -

          As you can see, it seems DateFormat is not very reliable and do not convert back and forth from date(long) to string format and vice-versa.

          I would write the test a little bit different: Use a random number as basis for the date, then convert to string and back. Throw away the orginal random number and use the "normalized" number and their string representation only?

          This should work better, as I assume SimpleDateFormat always produces same results for same numbers? But you can still add the sanity check and throw away random numbers that don't meet the requirement.

          Additionally, the millisecond part should always be 000, as you dont use miliseconds?

          Show
          Uwe Schindler added a comment - As you can see, it seems DateFormat is not very reliable and do not convert back and forth from date(long) to string format and vice-versa. I would write the test a little bit different: Use a random number as basis for the date, then convert to string and back. Throw away the orginal random number and use the "normalized" number and their string representation only? This should work better, as I assume SimpleDateFormat always produces same results for same numbers? But you can still add the sanity check and throw away random numbers that don't meet the requirement. Additionally, the millisecond part should always be 000, as you dont use miliseconds?
          Hide
          Uwe Schindler added a comment -

          I disabled the test case for now: Trunk revision: 1145885

          I would write the test a little bit different: Use a random number as basis for the date, then convert to string and back. Throw away the orginal random number and use the "normalized" number and their string representation only?

          Ah, you already did that, so the DateFormats are not really reliabe, you are right!

          Show
          Uwe Schindler added a comment - I disabled the test case for now: Trunk revision: 1145885 I would write the test a little bit different: Use a random number as basis for the date, then convert to string and back. Throw away the orginal random number and use the "normalized" number and their string representation only? Ah, you already did that, so the DateFormats are not really reliabe, you are right!
          Hide
          Vinicius Barros added a comment -

          This is the fix for the junit failure. I used the Robert's script and the test has not failed anymore.

          Show
          Vinicius Barros added a comment - This is the fix for the junit failure. I used the Robert's script and the test has not failed anymore.
          Hide
          Chris Male added a comment -

          Note, I've just moved this contrib to the queryparser module and nuked the contrib. Sorry Vinicius :/

          Show
          Chris Male added a comment - Note, I've just moved this contrib to the queryparser module and nuked the contrib. Sorry Vinicius :/
          Hide
          Adriano Crestani added a comment -

          Hey Chris,you just moved the files right?! Vinicus patch only includes a single file (the failing junit), all Uwe need to do is to apply the patch to the moved file, is that correct?!

          Show
          Adriano Crestani added a comment - Hey Chris,you just moved the files right?! Vinicus patch only includes a single file (the failing junit), all Uwe need to do is to apply the patch to the moved file, is that correct?!
          Hide
          Chris Male added a comment -

          Correct.

          The Flexible QueryParser can now be found at org.apache.lucene.queryparser.flexible.*

          Show
          Chris Male added a comment - Correct. The Flexible QueryParser can now be found at org.apache.lucene.queryparser.flexible.*
          Hide
          Uwe Schindler added a comment -

          This is the fix for the junit failure. I used the Robert's script and the test has not failed anymore.

          Thanks Vinicius,
          I will check the patch and apply it to the new location of the query parser. You don't need to do anything!

          If you have other changes or fixes in your checkout, I would recommend to create a patch first (below the queryparser folder) and then update svn and reapply the patch at the new location.

          Show
          Uwe Schindler added a comment - This is the fix for the junit failure. I used the Robert's script and the test has not failed anymore. Thanks Vinicius, I will check the patch and apply it to the new location of the query parser. You don't need to do anything! If you have other changes or fixes in your checkout, I would recommend to create a patch first (below the queryparser folder) and then update svn and reapply the patch at the new location.
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius,

          I tested your patch (after converting it to new trunk directory/package layout (thanks Chris Male

          It still fails quite often for some locales (at least in Java 6). The problem is that some locales produce date formats that are not immune to case changes. As QueryParser seems to lowercase the range bounds, some dates cannot be parsed.

          This throwed with your patch a NPE, because you implemented NumberDateFormat wrong: public Number parse(String source, ParsePosition parsePosition) is allowed to return null (it must if a parse error occurs). The same applies to date formats, but if you call getTime() on a null Date it throws NPE. So the attached patch also fixes the NumberDateFormat to handle null correctly.

          I also changed the test initialization a bit to produce sane dates from the beginning.

          I then added a toLowerCase(LOCALE) to the sanity checker

          Now the static initializer fails for:

          ant test -Dtestcase=TestNumericQueryParser -Dtestmethod=testInclusiveNumericRange -Dtests.seed=5825000776503943381:-1057095952794658416
          

          As a lowercased date cannot be parsed, this fails with ParseException. The locale is "es", so the spanisch translation of "GMT" is case sensitive:

          This parses:

          domingo 19 de agosto de 1973 11H31' GAMT AD 34 -0900 1973
          

          This not:

          domingo 19 de agosto de 1973 11h31' gamt ad 34 -0900 1973
          
          Show
          Uwe Schindler added a comment - Hi Vinicius, I tested your patch (after converting it to new trunk directory/package layout (thanks Chris Male It still fails quite often for some locales (at least in Java 6). The problem is that some locales produce date formats that are not immune to case changes. As QueryParser seems to lowercase the range bounds, some dates cannot be parsed. This throwed with your patch a NPE, because you implemented NumberDateFormat wrong: public Number parse(String source, ParsePosition parsePosition) is allowed to return null (it must if a parse error occurs). The same applies to date formats, but if you call getTime() on a null Date it throws NPE. So the attached patch also fixes the NumberDateFormat to handle null correctly. I also changed the test initialization a bit to produce sane dates from the beginning. I then added a toLowerCase(LOCALE) to the sanity checker Now the static initializer fails for: ant test -Dtestcase=TestNumericQueryParser -Dtestmethod=testInclusiveNumericRange -Dtests.seed=5825000776503943381:-1057095952794658416 As a lowercased date cannot be parsed, this fails with ParseException. The locale is "es", so the spanisch translation of "GMT" is case sensitive: This parses: domingo 19 de agosto de 1973 11H31' GAMT AD 34 -0900 1973 This not: domingo 19 de agosto de 1973 11h31' gamt ad 34 -0900 1973
          Hide
          Uwe Schindler added a comment -

          I would suggest to not lowercase range bounds for numbers/dates in the QueryParser. This makes only sense for terms, but numbers should not change case. I think we can remove the toLowerCase() in the query builder.

          Show
          Uwe Schindler added a comment - I would suggest to not lowercase range bounds for numbers/dates in the QueryParser. This makes only sense for terms, but numbers should not change case. I think we can remove the toLowerCase() in the query builder.
          Hide
          Adriano Crestani added a comment -

          I agree with Uwe. I think the format passed to NumericConfig should decide if it supports or not case.

          Show
          Adriano Crestani added a comment - I agree with Uwe. I think the format passed to NumericConfig should decide if it supports or not case.
          Hide
          Uwe Schindler added a comment -

          Attached you find the same patch, but with the numeric nodes not lowercased. The fix was easy: As the NumericQueryNodeProcessor transforms all text nodes to numeric nodes, we can simply move the LowerCaseExpandedTerms processor after those transformations, so the nodes are no longer touched.

          The attached patch simply moves the LowercaseExpandedTermsQueryNodeProcessorfilter after the NumericRangeQueryNodeProcessor.

          Tests now pass, Vinicius & Adriano, what do you think?

          Show
          Uwe Schindler added a comment - Attached you find the same patch, but with the numeric nodes not lowercased. The fix was easy: As the NumericQueryNodeProcessor transforms all text nodes to numeric nodes, we can simply move the LowerCaseExpandedTerms processor after those transformations, so the nodes are no longer touched. The attached patch simply moves the LowercaseExpandedTermsQueryNodeProcessorfilter after the NumericRangeQueryNodeProcessor. Tests now pass, Vinicius & Adriano, what do you think?
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          Thanks for fixing the junit, I think changing the order of those processors is OK, mainly if Numeric processors do no require lowercasing.

          Show
          Vinicius Barros added a comment - Hi Uwe, Thanks for fixing the junit, I think changing the order of those processors is OK, mainly if Numeric processors do no require lowercasing.
          Hide
          Uwe Schindler added a comment -

          I will commit those fixes ASAP to get it again into the nightly test runs (where this test is currently disabled). I never trust the complexity of all Locales, maybe there are more glitches which can only be found with random testing.

          Show
          Uwe Schindler added a comment - I will commit those fixes ASAP to get it again into the nightly test runs (where this test is currently disabled). I never trust the complexity of all Locales, maybe there are more glitches which can only be found with random testing.
          Hide
          Adriano Crestani added a comment -

          It sounds ok for me to change when LowercaseExpandedTermsQueryNodeProcessor gets executed.

          Show
          Adriano Crestani added a comment - It sounds ok for me to change when LowercaseExpandedTermsQueryNodeProcessor gets executed.
          Hide
          Uwe Schindler added a comment -

          Here the patch I'll commit now. I changed the initialization order of beforeClass() a little bit to make it easier to undertstand. The NumberFormat is no longer initialized inside the Date loop, as its not used there.

          I'll commit now.

          Show
          Uwe Schindler added a comment - Here the patch I'll commit now. I changed the initialization order of beforeClass() a little bit to make it easier to undertstand. The NumberFormat is no longer initialized inside the Date loop, as its not used there. I'll commit now.
          Hide
          Vinicius Barros added a comment -

          This patch includes javadocs for the changes I did so far.

          I am working right now on fixing open range bug on flexible standard query parser.

          Show
          Vinicius Barros added a comment - This patch includes javadocs for the changes I did so far. I am working right now on fixing open range bug on flexible standard query parser.
          Hide
          Vinicius Barros added a comment -

          Fixed open range and mixed include and exclude bugs on LUCENE-3338

          Show
          Vinicius Barros added a comment - Fixed open range and mixed include and exclude bugs on LUCENE-3338
          Hide
          Uwe Schindler added a comment -

          Committed the javadocs changes in rev 1150671.

          Show
          Uwe Schindler added a comment - Committed the javadocs changes in rev 1150671.
          Hide
          Vinicius Barros added a comment -

          Sorry, I have been in silence for so long time, trying to get some results, but many doubts showed up, I need to guidance here.

          First, I started working on implementing >=, <=, <, > and = operator for standard query parser. Then I later realized someone had submitted a patch for that already and I stopped working on it.

          Then I decided to implement support for date resolutions for numeric queries in query parser. I started by changing NumberDateFormat to receive a resolution parameter (DAY, SECONDS, MINUTES, etc) and this new parameter is taken into account when doing the date conversion. For that, I added a new method to do the date rounding that takes TimeZone into account, since the current round methods in DateTools do not support timezone. I was able to make it work up to this part.

          After that, I started to work on date compression, as you suggested before Uwe. For example, if the user wants a DAY resolution, NumberDateFormat should only return the number of days for the given date since 1970, not the miliseconds. For SECOND resolution, it's easy, just divide the miliseconds by 1000. For minutes the same, divide the miliseconds by 1000 * 60 and so on. However, when I got to month, I have no easy way to compress the miliseconds, I mean, no easy way to truncate the days and only get the month count since 1970. The only good solution I found was to get the number of years since 1970 and multiply by 12 plus the current month number.

          I am still wondering if we can always assume that dividing the miliseconds by 1000 (sec), 60 (minute), 60 (hour) and 24 (day) will actually be precise. What about the leap second? Not sure if the miliseconds -> (defined_resolution) and (defined_resolution) -> miliseconds will always be correct. Maybe I am missing something or overcomplicating.

          Show
          Vinicius Barros added a comment - Sorry, I have been in silence for so long time, trying to get some results, but many doubts showed up, I need to guidance here. First, I started working on implementing >=, <=, <, > and = operator for standard query parser. Then I later realized someone had submitted a patch for that already and I stopped working on it. Then I decided to implement support for date resolutions for numeric queries in query parser. I started by changing NumberDateFormat to receive a resolution parameter (DAY, SECONDS, MINUTES, etc) and this new parameter is taken into account when doing the date conversion. For that, I added a new method to do the date rounding that takes TimeZone into account, since the current round methods in DateTools do not support timezone. I was able to make it work up to this part. After that, I started to work on date compression, as you suggested before Uwe. For example, if the user wants a DAY resolution, NumberDateFormat should only return the number of days for the given date since 1970, not the miliseconds. For SECOND resolution, it's easy, just divide the miliseconds by 1000. For minutes the same, divide the miliseconds by 1000 * 60 and so on. However, when I got to month, I have no easy way to compress the miliseconds, I mean, no easy way to truncate the days and only get the month count since 1970. The only good solution I found was to get the number of years since 1970 and multiply by 12 plus the current month number. I am still wondering if we can always assume that dividing the miliseconds by 1000 (sec), 60 (minute), 60 (hour) and 24 (day) will actually be precise. What about the leap second? Not sure if the miliseconds -> (defined_resolution) and (defined_resolution) -> miliseconds will always be correct. Maybe I am missing something or overcomplicating.
          Hide
          Vinicius Barros added a comment -

          I also started working on applying the numeric support to 3x. However I am not sure about backwards compatibility there.

          The problem is that in trunk I had renamed RangeQueryNode to TermRangeQueryNode. Also, TermRangeQueryNode no longer extends ParametricRangeQueryNode, it now extends AbstractRangeQueryNode. Because of that, ParametricRangeQueryNodeProcessor returns a TermRangeQueryNode instead of RangeQueryNode. As you can see, many things the user might expect to still work the same is working completely different. I see some classes in Lucene use Version, but I don't know exactly how that works and why standard query parser do not use it. Should it?

          Not sure how I should proceed now. Should I ignore the backward compatibility and go ahead and change how everything behaves or try to make everything backward compatible (not sure how I could do that without the use of Version).

          Do you have any comments on that Uwe?

          Show
          Vinicius Barros added a comment - I also started working on applying the numeric support to 3x. However I am not sure about backwards compatibility there. The problem is that in trunk I had renamed RangeQueryNode to TermRangeQueryNode. Also, TermRangeQueryNode no longer extends ParametricRangeQueryNode, it now extends AbstractRangeQueryNode. Because of that, ParametricRangeQueryNodeProcessor returns a TermRangeQueryNode instead of RangeQueryNode. As you can see, many things the user might expect to still work the same is working completely different. I see some classes in Lucene use Version, but I don't know exactly how that works and why standard query parser do not use it. Should it? Not sure how I should proceed now. Should I ignore the backward compatibility and go ahead and change how everything behaves or try to make everything backward compatible (not sure how I could do that without the use of Version). Do you have any comments on that Uwe?
          Hide
          Uwe Schindler added a comment - - edited

          I see some classes in Lucene use Version, but I don't know exactly how that works and why standard query parser do not use it. Should it?

          Version is inteneded to be used for behavioural changes to keep index compatibility, so people can use new Lucene versions without reindexing. It does not help for API changes (it could sometimes, but only for those cases where the API changes are something like: If versionA call method a else method b, if method a or b trigger different APIs).

          Typical examples for Version are changes in tokenization (so most analyzers use it): When a bugfix in the analyzer produces different tokens than before the version flag is used to be able to enable the "buggy" behaviour, so querying your index with the "wrong" tokens still works. The core queryparser also uses it to change the behaviour of creating phrase queries (the flexible query parser is, as far as I know, still missing this).

          I am away this weekend, I will come back to you on Monday for the other questions.

          Show
          Uwe Schindler added a comment - - edited I see some classes in Lucene use Version, but I don't know exactly how that works and why standard query parser do not use it. Should it? Version is inteneded to be used for behavioural changes to keep index compatibility, so people can use new Lucene versions without reindexing. It does not help for API changes (it could sometimes, but only for those cases where the API changes are something like: If versionA call method a else method b, if method a or b trigger different APIs). Typical examples for Version are changes in tokenization (so most analyzers use it): When a bugfix in the analyzer produces different tokens than before the version flag is used to be able to enable the "buggy" behaviour, so querying your index with the "wrong" tokens still works. The core queryparser also uses it to change the behaviour of creating phrase queries (the flexible query parser is, as far as I know, still missing this). I am away this weekend, I will come back to you on Monday for the other questions.
          Hide
          Adriano Crestani added a comment -

          Good point, should flexible query parser be using version? The changes below might break users' code.

          • processor pipeline configuration changes: any change on the configuration may affect the query node tree result. If the user has created it's own query builder (or somehow extended/modified the current builder), changes to what the processor pipeline outputs might break user's code.
          • syntax parser changes: if the syntax parser starts outputting a different query node tree it may break any user processor or builder
          • whenever a builder is expecting XQueryNode and then in a later version is expects YQueryNode as input

          The question is: should we support versioning for the above changes or should we only document the changes and let the user be aware of such change?

          I think we should be very careful and only use Version when it's really required (like things that imply in index changes). So I vote for only documenting the change and also try to avoid changes that will change the code behavior.

          I see the classic QP only uses version to change the default value for certain instance attributes between 2.9/3.0 and 3.x versions. I think it's only there because 2.9 and 3.0 were required to be the same, except removed deprecations. I don't think 4.0 will have the same requirement, will it?

          Show
          Adriano Crestani added a comment - Good point, should flexible query parser be using version? The changes below might break users' code. processor pipeline configuration changes: any change on the configuration may affect the query node tree result. If the user has created it's own query builder (or somehow extended/modified the current builder), changes to what the processor pipeline outputs might break user's code. syntax parser changes: if the syntax parser starts outputting a different query node tree it may break any user processor or builder whenever a builder is expecting XQueryNode and then in a later version is expects YQueryNode as input The question is: should we support versioning for the above changes or should we only document the changes and let the user be aware of such change? I think we should be very careful and only use Version when it's really required (like things that imply in index changes). So I vote for only documenting the change and also try to avoid changes that will change the code behavior. I see the classic QP only uses version to change the default value for certain instance attributes between 2.9/3.0 and 3.x versions. I think it's only there because 2.9 and 3.0 were required to be the same, except removed deprecations. I don't think 4.0 will have the same requirement, will it?
          Hide
          Adriano Crestani added a comment -

          I took a look at the code and I have some suggestions to avoid doing to many changes:

          • for 3x, do not rename RangeQueryNode to TermRangeQueryNode, just deprecate it and document it saying it will change name in future (Uwe, can you confirm this is the right procedure for class renaming?!).
          • ParametricRangeQueryNode and AbstractRangeQueryNode have exactly the same methods, why can't they share the a new common interface? I think this will enable you to keep the same inheritance structure for RangeQueryNode, is that correct?! I would name this new interface RangeQueryNode, if there was no such class already. So I will let you pick any name for it

          I think it's a start, I haven't tried the changes, so I will let you(Vinicius) play with it. Let us know if it helped or if you hit any new problems

          Show
          Adriano Crestani added a comment - I took a look at the code and I have some suggestions to avoid doing to many changes: for 3x, do not rename RangeQueryNode to TermRangeQueryNode, just deprecate it and document it saying it will change name in future (Uwe, can you confirm this is the right procedure for class renaming?!). ParametricRangeQueryNode and AbstractRangeQueryNode have exactly the same methods, why can't they share the a new common interface? I think this will enable you to keep the same inheritance structure for RangeQueryNode, is that correct?! I would name this new interface RangeQueryNode, if there was no such class already. So I will let you pick any name for it I think it's a start, I haven't tried the changes, so I will let you(Vinicius) play with it. Let us know if it helped or if you hit any new problems
          Hide
          Uwe Schindler added a comment -

          for 3x, do not rename RangeQueryNode to TermRangeQueryNode, just deprecate it and document it saying it will change name in future (Uwe, can you confirm this is the right procedure for class renaming?!).

          I can confirm this, we do it generally a little bit different, there are two possibilities (depending on if the user will create instances of this class and pass it to the API or if the API returns the instance):

          • For the first case, you would leave the name as in trunk, but simply create an empty subclass with the old 3.x name (only copy the ctors) and deprecate this subclass.
          • For the second case, create the deprecated class as superclass of the new one (and deprecate it), copying all code to this deprecated class. The problem here is that deprecation inherits... To fix, you have to declare all methods and let them call super without deprecation.

          I agree with the rest of Adriano's comments.

          In general this is a contrib module and contrib modules don't have the strict backwards requirements like core classes. Because of this I would only provide minimal backwards layers (no sophisticated ones) and document all changes. Backwards compatibility sometimes get a pain, so we even document and break backwards in core (sometimes). We list all those breaks in the backwards section. Backwards breaks that can be fixed by recompilation are very minor and should only be documented (as drop-in-JAR replacements no longer work), so they are no issue at all.

          I see the classic QP only uses version to change the default value for certain instance attributes between 2.9/3.0 and 3.x versions. I think it's only there because 2.9 and 3.0 were required to be the same, except removed deprecations. I don't think 4.0 will have the same requirement, will it?

          4.0 breaks backwards completely with no deprecation layers. The core API of Lucene changed hard (new enum types, new structures, totally new API, change from char[] to byte[],... the list is very long). We have a MIGRATION.txt that explains all changes.

          We only add deprecation layers in 3.x core for cases where very high-level user classes are affected (e.g. IndexSearcher.search() methods, so transition is easy). All other places in Lucene that are more expert will change as described before.

          Show
          Uwe Schindler added a comment - for 3x, do not rename RangeQueryNode to TermRangeQueryNode, just deprecate it and document it saying it will change name in future (Uwe, can you confirm this is the right procedure for class renaming?!). I can confirm this, we do it generally a little bit different, there are two possibilities (depending on if the user will create instances of this class and pass it to the API or if the API returns the instance): For the first case, you would leave the name as in trunk, but simply create an empty subclass with the old 3.x name (only copy the ctors) and deprecate this subclass. For the second case, create the deprecated class as superclass of the new one (and deprecate it), copying all code to this deprecated class. The problem here is that deprecation inherits... To fix, you have to declare all methods and let them call super without deprecation. I agree with the rest of Adriano's comments. In general this is a contrib module and contrib modules don't have the strict backwards requirements like core classes. Because of this I would only provide minimal backwards layers (no sophisticated ones) and document all changes. Backwards compatibility sometimes get a pain, so we even document and break backwards in core (sometimes). We list all those breaks in the backwards section. Backwards breaks that can be fixed by recompilation are very minor and should only be documented (as drop-in-JAR replacements no longer work), so they are no issue at all. I see the classic QP only uses version to change the default value for certain instance attributes between 2.9/3.0 and 3.x versions. I think it's only there because 2.9 and 3.0 were required to be the same, except removed deprecations. I don't think 4.0 will have the same requirement, will it? 4.0 breaks backwards completely with no deprecation layers. The core API of Lucene changed hard (new enum types, new structures, totally new API, change from char[] to byte[],... the list is very long). We have a MIGRATION.txt that explains all changes. We only add deprecation layers in 3.x core for cases where very high-level user classes are affected (e.g. IndexSearcher.search() methods, so transition is easy). All other places in Lucene that are more expert will change as described before.
          Hide
          Vinicius Barros added a comment -

          Hi Uwe and Adriano,

          Thanks for the comments.

          I have been trying to follow Adriano's instructions to avoid major changes to API and behavior. However, ParametricRangeQueryNode and AbstractRangeQueryNode do not share the same methods (as Adriano said above). ParametricRangeQueryNode has lower and upper bound, which are ParametricQueryNodes, and these last ones hold the inclusive/exclusive information (they use CompareOperator for that). AbtractRangeQueryNode have lower and upper bounds (these ones defined by the template, so it could be a ParametricQueryNode, which is compatible with ParametricRangeQueryNode), however it holds the inclusive/exclusive information differently, through isLowerInclusive and isUpperInclusive methods.

          I just don't understand why ParametricQueryNode hold this CompareOperator value, I think this should be part of the ParametricRangeQueryNode and ParametricQueryNode should only be a simple value (FieldQueryNode). Any suggestions here?

          Show
          Vinicius Barros added a comment - Hi Uwe and Adriano, Thanks for the comments. I have been trying to follow Adriano's instructions to avoid major changes to API and behavior. However, ParametricRangeQueryNode and AbstractRangeQueryNode do not share the same methods (as Adriano said above). ParametricRangeQueryNode has lower and upper bound, which are ParametricQueryNodes, and these last ones hold the inclusive/exclusive information (they use CompareOperator for that). AbtractRangeQueryNode have lower and upper bounds (these ones defined by the template, so it could be a ParametricQueryNode, which is compatible with ParametricRangeQueryNode), however it holds the inclusive/exclusive information differently, through isLowerInclusive and isUpperInclusive methods. I just don't understand why ParametricQueryNode hold this CompareOperator value, I think this should be part of the ParametricRangeQueryNode and ParametricQueryNode should only be a simple value (FieldQueryNode). Any suggestions here?
          Hide
          Adriano Crestani added a comment -

          I just don't understand why ParametricQueryNode hold this CompareOperator value, I think this should be part of the ParametricRangeQueryNode and ParametricQueryNode should only be a simple value (FieldQueryNode). Any suggestions here?

          That's a looong story. And you are right, they are not compatible (my fault) and CompareOperator does not make much sense. Today, if you want, you can set a ParametricRangeQueryNode with CompareOperator.LE set in the two bounds :S. Lets take the opportunity and try to redesign it.

          I also agree that ParametricRangeQueryNode could only have FieldQueryNode as its bounds, so I think we can get rid of ParametricQueryNode (for 4.0). For now, I will suggest the following change:

          • keep using ParametricQueryNode in ParametricRangeQueryNode
          • deprecate ParametricQueryNode
          • make ParametricRangeQueryNode implement that RangeQueryNode interface I mentioned on my last comment. This interface will have isLowerInclusive, isUpperInclusive, setUpperInclusive and setLowerInclusive. For the template parameter, ParametricRangeQueryNode should use FieldQueryNode.
          • Make sure whenever the user sets lowerInclusive and upperInclusive we replicate that to the ParametricQueryNode (upper and lower bounds) setting the equivalent CompareOperator.

          I think this way we can get rid of ParametricQueryNode in future. I don't know the implications of these changes, since I haven't tried it. So let me know if you find any new conflict.

          Show
          Adriano Crestani added a comment - I just don't understand why ParametricQueryNode hold this CompareOperator value, I think this should be part of the ParametricRangeQueryNode and ParametricQueryNode should only be a simple value (FieldQueryNode). Any suggestions here? That's a looong story. And you are right, they are not compatible (my fault) and CompareOperator does not make much sense. Today, if you want, you can set a ParametricRangeQueryNode with CompareOperator.LE set in the two bounds :S. Lets take the opportunity and try to redesign it. I also agree that ParametricRangeQueryNode could only have FieldQueryNode as its bounds, so I think we can get rid of ParametricQueryNode (for 4.0). For now, I will suggest the following change: keep using ParametricQueryNode in ParametricRangeQueryNode deprecate ParametricQueryNode make ParametricRangeQueryNode implement that RangeQueryNode interface I mentioned on my last comment. This interface will have isLowerInclusive, isUpperInclusive, setUpperInclusive and setLowerInclusive. For the template parameter, ParametricRangeQueryNode should use FieldQueryNode. Make sure whenever the user sets lowerInclusive and upperInclusive we replicate that to the ParametricQueryNode (upper and lower bounds) setting the equivalent CompareOperator. I think this way we can get rid of ParametricQueryNode in future. I don't know the implications of these changes, since I haven't tried it. So let me know if you find any new conflict.
          Hide
          Vinicius Barros added a comment -

          This patches backports the numeric support to 3x.

          Uwe and Adriano, please, review it and let me know if the backports looks OK.

          Thanks!

          Show
          Vinicius Barros added a comment - This patches backports the numeric support to 3x. Uwe and Adriano, please, review it and let me know if the backports looks OK. Thanks!
          Hide
          Adriano Crestani added a comment -

          Hi Vinicius,

          Good work, the patch looks good. I saw you named the new interface RangeQueryNode as well. I don't see any problem, since they are in different packages.

          Show
          Adriano Crestani added a comment - Hi Vinicius, Good work, the patch looks good. I saw you named the new interface RangeQueryNode as well. I don't see any problem, since they are in different packages.
          Hide
          Uwe Schindler added a comment -

          I have no problems with the patch, too. We have unfortunately no real backwards tests for contribs, but I see no major problems.

          I will commit this soon and record merges from trunk, so the commits are related by mergeprops. What changes are still missing from trunk that will never be backported?

          Adriano: Can you summarize all your changes in both issues into a changes.txt entry for both branches?

          Show
          Uwe Schindler added a comment - I have no problems with the patch, too. We have unfortunately no real backwards tests for contribs, but I see no major problems. I will commit this soon and record merges from trunk, so the commits are related by mergeprops. What changes are still missing from trunk that will never be backported? Adriano: Can you summarize all your changes in both issues into a changes.txt entry for both branches?
          Hide
          Uwe Schindler added a comment -

          In 3.x we depend on Java 5, I found two interface @Override in your patch, that are not detected by Java6's javac (it's a bug there).

          Show
          Uwe Schindler added a comment - In 3.x we depend on Java 5, I found two interface @Override in your patch, that are not detected by Java6's javac (it's a bug there).
          Hide
          Uwe Schindler added a comment -

          Attached a patch that fixes the interface @Override, including all merge-props needed.

          Show
          Uwe Schindler added a comment - Attached a patch that fixes the interface @Override, including all merge-props needed.
          Hide
          Adriano Crestani added a comment -

          Adriano: Can you summarize all your changes in both issues into a changes.txt entry for both branches?

          Hi Uwe, yes, I can do it, I will need to review the entire code again (3x and trunk). I plan to do this during the weekend. But if you want, commit Vinicius's code and I commit the changes to changes.txt later

          Show
          Adriano Crestani added a comment - Adriano: Can you summarize all your changes in both issues into a changes.txt entry for both branches? Hi Uwe, yes, I can do it, I will need to review the entire code again (3x and trunk). I plan to do this during the weekend. But if you want, commit Vinicius's code and I commit the changes to changes.txt later
          Hide
          Uwe Schindler added a comment -

          OK. Of course, also Vinicius can help

          Show
          Uwe Schindler added a comment - OK. Of course, also Vinicius can help
          Hide
          Uwe Schindler added a comment -

          Committed 3.x revision: 1159411

          I keep this issue open until CHANGES.txt for trunk and 3.x is finished (including upgrade guidelines for backwards breaks).

          Show
          Uwe Schindler added a comment - Committed 3.x revision: 1159411 I keep this issue open until CHANGES.txt for trunk and 3.x is finished (including upgrade guidelines for backwards breaks).
          Hide
          Uwe Schindler added a comment -

          Adriano: Can you take this issue, because I will fly to California on Sunday and have no time next week? Of course, I will fill out the GSoC evaluation form once it is available online.

          Show
          Uwe Schindler added a comment - Adriano: Can you take this issue, because I will fly to California on Sunday and have no time next week? Of course, I will fill out the GSoC evaluation form once it is available online.
          Hide
          Uwe Schindler added a comment -

          I had to fix Javadoc warnings in 3.x revision: 1159420

          Show
          Uwe Schindler added a comment - I had to fix Javadoc warnings in 3.x revision: 1159420
          Hide
          Vinicius Barros added a comment -

          Thanks for committing the patch Uwe.

          I tried to summarize everything I have changed so far for this JIRA. I hope this helps:

          • new query node interfaces: FieldValuePairQueryNode, ValueQueryNode
          • FieldQueryNode now implements FieldValuePairQueryNode
          • two new public methods added to StandardQueryParser: setNumericConfigMap and getNumericConfigMap
          • new query builders: DummyQueryNodeBuilder, NumericRangeQueryNodeBuilder, TermRangeQueryNodeBuilder
          • removed builders (trunk only): RangeQueryNodeBuilder
          • changes to StandardQueryTreeBuilder configuration: it now uses DummyQueryNodeBuilder for NumericQueryNodes and uses NumericRangeQueryNodeBuilder for NumericRangeQueryNodes
          • changes to StandardQueryTreeBuilder configuration (trunk only): removed builder for RangeQueryNodes, as this class was removed in trunk
          • added NumberDateFormat, that helps to parse/format dates into number and index it as numeric
          • added NumericConfig, used to set the numeric configuration of a certain field in the query parser. It contains information about its numeric type, how to parse/format from/to string and its precision step.
          • added NumericFieldConfigListener that sets the corresponding NumericConfig object to the FieldConfig object
          • added NUMERIC_CONFIG and NUMERIC_CONFIG_MAP constants to StandardQueryConfigHandler, used to set numeric configuration to StandardQueryConfigHandler
          • added AbstractRangeQueryNode that is the common parent of all currently available RangeQueryNodes
          • added RangeQueryNode interface, that should be implemented by all range query nodes, such as NumericRangeQueryNode and TermRangeQueryNode
          • added NumericQueryNode, which is equivalent to FieldQueryNode, but holds a Number instead of String
          • added NumericRangeQueryNode, which is equivalent to TermRangeQueryNode, but holds NumericQueryNodes instead of ParametricQueryNodes (3x)/FieldQueryNodes(trunk)
          • added NumericRangeQueryNodeProcessor, that processes RangeQueryNodes and convert them to NumericRangeQueryNodes when its field is configured as numeric
          • added NumericQueryNodeProcessor, which converts FieldQueryNodes into NumericRangeQueryNodes (lower bound == upper bound) when the field is configured as numeric
          • (trunk only) ParametricRangeQueryNodeProcessor now creates TermRangeQueryNode instead of RangeQueryNode from ParametricRangeQueryNode
          • changed StandardQueryNodeProcessorPipeline configuration: NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor are executed right after LowercaseExpandedTermsQueryNodeProcessor
          • (3x only) deprecated RangeQueryNode, TermRangeQueryNode should be used instead
          • (3x only) deprecated ParametricQueryNode, FieldQueryNode should be used instead and the compare operators should be set using RangeQueryNode.setLowerBoundInclusive and RangeQueryNode.setUpperBoundInclusive methods
          Show
          Vinicius Barros added a comment - Thanks for committing the patch Uwe. I tried to summarize everything I have changed so far for this JIRA. I hope this helps: new query node interfaces: FieldValuePairQueryNode, ValueQueryNode FieldQueryNode now implements FieldValuePairQueryNode two new public methods added to StandardQueryParser: setNumericConfigMap and getNumericConfigMap new query builders: DummyQueryNodeBuilder, NumericRangeQueryNodeBuilder, TermRangeQueryNodeBuilder removed builders (trunk only): RangeQueryNodeBuilder changes to StandardQueryTreeBuilder configuration: it now uses DummyQueryNodeBuilder for NumericQueryNodes and uses NumericRangeQueryNodeBuilder for NumericRangeQueryNodes changes to StandardQueryTreeBuilder configuration (trunk only): removed builder for RangeQueryNodes, as this class was removed in trunk added NumberDateFormat, that helps to parse/format dates into number and index it as numeric added NumericConfig, used to set the numeric configuration of a certain field in the query parser. It contains information about its numeric type, how to parse/format from/to string and its precision step. added NumericFieldConfigListener that sets the corresponding NumericConfig object to the FieldConfig object added NUMERIC_CONFIG and NUMERIC_CONFIG_MAP constants to StandardQueryConfigHandler, used to set numeric configuration to StandardQueryConfigHandler added AbstractRangeQueryNode that is the common parent of all currently available RangeQueryNodes added RangeQueryNode interface, that should be implemented by all range query nodes, such as NumericRangeQueryNode and TermRangeQueryNode added NumericQueryNode, which is equivalent to FieldQueryNode, but holds a Number instead of String added NumericRangeQueryNode, which is equivalent to TermRangeQueryNode, but holds NumericQueryNodes instead of ParametricQueryNodes (3x)/FieldQueryNodes(trunk) added NumericRangeQueryNodeProcessor, that processes RangeQueryNodes and convert them to NumericRangeQueryNodes when its field is configured as numeric added NumericQueryNodeProcessor, which converts FieldQueryNodes into NumericRangeQueryNodes (lower bound == upper bound) when the field is configured as numeric (trunk only) ParametricRangeQueryNodeProcessor now creates TermRangeQueryNode instead of RangeQueryNode from ParametricRangeQueryNode changed StandardQueryNodeProcessorPipeline configuration: NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor are executed right after LowercaseExpandedTermsQueryNodeProcessor (3x only) deprecated RangeQueryNode, TermRangeQueryNode should be used instead (3x only) deprecated ParametricQueryNode, FieldQueryNode should be used instead and the compare operators should be set using RangeQueryNode.setLowerBoundInclusive and RangeQueryNode.setUpperBoundInclusive methods
          Hide
          Vinicius Barros added a comment -

          Right now I am reviewing the API changes I did in 3x and trying to replicate it back to trunk (removing deprecated classes, etc). I should have a new patch soon.

          Show
          Vinicius Barros added a comment - Right now I am reviewing the API changes I did in 3x and trying to replicate it back to trunk (removing deprecated classes, etc). I should have a new patch soon.
          Hide
          Adriano Crestani added a comment -

          Hi Vinicius,

          Thanks for the info, but maybe it is too much details for the changes.txt. Let me try to simplify it.

          (3.x)

          New features:
          LUCENE-1768: added support for numeric ranges in contrib query parser; added support for simple numeric queries, such as <age:4>, in contrib query parser (Vinicius Barros via Uwe Schindler)

          Changes in runtime behavior:
          LUCENE-1768: StandardQueryConfigHandler now uses NumericFieldConfigListener to set a NumericConfig to its corresponding FieldConfig; StandardQueryTreeBuilder now uses DummyQueryNodeBuilder for NumericQueryNodes and uses NumericRangeQueryNodeBuilder for NumericRangeQueryNodes; StandardQueryNodeProcessorPipeline now executes NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor right after LowercaseExpandedTermsQueryNodeProcessor; (Vinicius Barros via Uwe Schindler)

          API changes:
          LUCENE-1768: setNumericConfigMap and getNumericConfigMap were added to StandardQueryParser; ParametricRangeQueryNode and oal.queryParser.standard.nodes.RangeQueryNode now implement oal.queryParser.core.nodes.RangeQueryNode; oal.queryParser.core.nodes.RangeQueryNode was deprecated and now extends TermRangeQueryNode, which extends AbstractRangeQueryNode; ParametricQueryNode was deprecated; FieldQueryNode now implements the new FieldValueQueryNode<CharSequence>, which this last one implements FieldableQueryNode and thew new ValueQueryNode; (Vinicius Barros via Uwe Schindler)

          (trunk)

          Changes in runtime behavior:
          LUCENE-1768: StandardQueryTreeBuilder uses RangeQueryNodeBuilder for RangeQueryNodes, since theses two classes were removed; ParametricRangeQueryNodeProcessor now creates TermRangeQueryNode, instead of RangeQueryNode, from ParametricRangeQueryNode (Vinicius Barros via Uwe Schindler)

          API Changes:
          LUCENE-1768: ParametricRangeQueryNode now implements RangeQueryNode<FieldQueryNode> instead of RangeQueryNode<ParametricQueryNode> (Vinicius Barros via Uwe Schindler)

          Vinicius, please, review the summary I wrote above, I hope I could simplify correctly what you summarized. The API Changes for trunk I am assuming you are going to remove ParametricQueryNode(deprecated) and replace by FieldQueryNode in that statement.

          Uwe, I hope this helps

          Show
          Adriano Crestani added a comment - Hi Vinicius, Thanks for the info, but maybe it is too much details for the changes.txt. Let me try to simplify it. (3.x) New features: LUCENE-1768 : added support for numeric ranges in contrib query parser; added support for simple numeric queries, such as <age:4>, in contrib query parser (Vinicius Barros via Uwe Schindler) Changes in runtime behavior: LUCENE-1768 : StandardQueryConfigHandler now uses NumericFieldConfigListener to set a NumericConfig to its corresponding FieldConfig; StandardQueryTreeBuilder now uses DummyQueryNodeBuilder for NumericQueryNodes and uses NumericRangeQueryNodeBuilder for NumericRangeQueryNodes; StandardQueryNodeProcessorPipeline now executes NumericQueryNodeProcessor followed by NumericRangeQueryNodeProcessor right after LowercaseExpandedTermsQueryNodeProcessor; (Vinicius Barros via Uwe Schindler) API changes: LUCENE-1768 : setNumericConfigMap and getNumericConfigMap were added to StandardQueryParser; ParametricRangeQueryNode and oal.queryParser.standard.nodes.RangeQueryNode now implement oal.queryParser.core.nodes.RangeQueryNode; oal.queryParser.core.nodes.RangeQueryNode was deprecated and now extends TermRangeQueryNode, which extends AbstractRangeQueryNode; ParametricQueryNode was deprecated; FieldQueryNode now implements the new FieldValueQueryNode<CharSequence>, which this last one implements FieldableQueryNode and thew new ValueQueryNode; (Vinicius Barros via Uwe Schindler) (trunk) Changes in runtime behavior: LUCENE-1768 : StandardQueryTreeBuilder uses RangeQueryNodeBuilder for RangeQueryNodes, since theses two classes were removed; ParametricRangeQueryNodeProcessor now creates TermRangeQueryNode, instead of RangeQueryNode, from ParametricRangeQueryNode (Vinicius Barros via Uwe Schindler) API Changes: LUCENE-1768 : ParametricRangeQueryNode now implements RangeQueryNode<FieldQueryNode> instead of RangeQueryNode<ParametricQueryNode> (Vinicius Barros via Uwe Schindler) Vinicius, please, review the summary I wrote above, I hope I could simplify correctly what you summarized. The API Changes for trunk I am assuming you are going to remove ParametricQueryNode(deprecated) and replace by FieldQueryNode in that statement. Uwe, I hope this helps
          Hide
          Vinicius Barros added a comment -

          I have changed again trunk to reflect the latest changes on 3x.

          I also realized that ParametricRangeQueryNode and TermRangeQueryNode are basically the same, so I decided to removed ParametricRangeQueryNode and only use TermRangeQueryNode. Everything seems to be working fine. Let me know if I should not have done that for any reason.

          I have some few changes on 3x to submit yet, but I was wondering: is it necessary to deprecate a class in 3x if it's ONLY going to be removed in 4.0? Not sure if I understand how these things work yet.

          Show
          Vinicius Barros added a comment - I have changed again trunk to reflect the latest changes on 3x. I also realized that ParametricRangeQueryNode and TermRangeQueryNode are basically the same, so I decided to removed ParametricRangeQueryNode and only use TermRangeQueryNode. Everything seems to be working fine. Let me know if I should not have done that for any reason. I have some few changes on 3x to submit yet, but I was wondering: is it necessary to deprecate a class in 3x if it's ONLY going to be removed in 4.0? Not sure if I understand how these things work yet.
          Hide
          Vinicius Barros added a comment -

          Thanks Adriano, your summary looks better!

          To trunk behavior changes, I would add that "StandardSyntaxParser" no longer outputs ParametricRangeQueryNodes for range queries, instead, it outputs TermRangeQueryNodes.

          Show
          Vinicius Barros added a comment - Thanks Adriano, your summary looks better! To trunk behavior changes, I would add that "StandardSyntaxParser" no longer outputs ParametricRangeQueryNodes for range queries, instead, it outputs TermRangeQueryNodes.
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius, any news for trunk? We want to release Lucene 3.4.0, just to confirm: Are we finished with the 3.x branch? I will just quickly add the changes entry as suggested by Adriano.

          Show
          Uwe Schindler added a comment - Hi Vinicius, any news for trunk? We want to release Lucene 3.4.0, just to confirm: Are we finished with the 3.x branch? I will just quickly add the changes entry as suggested by Adriano.
          Hide
          Vinicius Barros added a comment -

          Sorry Uwe, college is taking my time again.

          Here are the latest changes, for trunk and 3x.

          Tests are passing for both.

          Show
          Vinicius Barros added a comment - Sorry Uwe, college is taking my time again. Here are the latest changes, for trunk and 3x. Tests are passing for both.
          Hide
          Uwe Schindler added a comment -

          Hi Adriano,

          thanks. For applying the patch you have to first do a svn rename:
          modules/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/processors/ParametricRangeQueryNodeProcessor.java -> modules/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/processors/TermRangeQueryNodeProcessor.java

          After doing this, the patch applies. I will verify the changes and respond back later.
          I will apply the 3.x javadoc/code changes to the 3.x and 3.4 branches.

          Show
          Uwe Schindler added a comment - Hi Adriano, thanks. For applying the patch you have to first do a svn rename: modules/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/processors/ParametricRangeQueryNodeProcessor.java -> modules/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/processors/TermRangeQueryNodeProcessor.java After doing this, the patch applies. I will verify the changes and respond back later. I will apply the 3.x javadoc/code changes to the 3.x and 3.4 branches.
          Hide
          Uwe Schindler added a comment -

          The cleanups in trunk look fine, I think we can commit them. The tests did not change so I am confident that they break nothing.

          I think we can resolve this issue after those. Nice work!

          Show
          Uwe Schindler added a comment - The cleanups in trunk look fine, I think we can commit them. The tests did not change so I am confident that they break nothing. I think we can resolve this issue after those. Nice work!
          Hide
          Uwe Schindler added a comment -

          Committed final changes to trunk revision: 1166789
          Committed Javadocs/deprecation changes + changes.txt to 3.x branch revision: 1166536; 3.4 release branch revision: 1166538

          Show
          Uwe Schindler added a comment - Committed final changes to trunk revision: 1166789 Committed Javadocs/deprecation changes + changes.txt to 3.x branch revision: 1166536; 3.4 release branch revision: 1166538
          Hide
          Uwe Schindler added a comment -

          If you have any more comments, please reopen, I think this is resolved now.

          Thanks Vinicius, good work!

          Show
          Uwe Schindler added a comment - If you have any more comments, please reopen, I think this is resolved now. Thanks Vinicius, good work!
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          That's it. No, I don't have any other changes in my workspace.

          It was nice working with you, thanks!

          Show
          Vinicius Barros added a comment - Hi Uwe, That's it. No, I don't have any other changes in my workspace. It was nice working with you, thanks!

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development