Lucene.Net / LUCENENET-423

QueryParser differences between Java and .NET when parsing range queries involving dates

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g
    • Fix Version/s: Lucene.Net 3.0.3
    • Component/s: None
    • Labels: None

      Description

      When trying to run a RangeQuery that uses dates in a certain format, .NET behaves differently from its Java counterpart. The code is the same in both, but as far as I can tell, the difference comes from how Java parses dates versus how .NET parses them. To reproduce:

      var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "FullText", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
      var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]");
      

      You'll notice that the query looks like the old DateField format (e.g. "0g1d64542"). If you run the same query in Java (or Luke), you'll notice the query gets parsed as if it were a RangeQuery on strings. AFAIK, Java cannot parse a string formatted that way. If you change the string to use / instead of - in Java, you'll get a query that uses DateResolutions and DateTools.DateToString().

      It seems an appropriate fix for this, if we want to keep this behavior similar to Java, would be to write our own DateTime parser that behaves the same way as Java's parser.

          Activity

          Christopher Currens added a comment -

          Fixed with LUCENENET-478. Either it was old or ported incorrectly, but the Java version uses DateFormat.SHORT to parse the date, which has a .NET equivalent.

          It's not a perfect port, however. In Java, in my locale, these two strings parse just fine:

          "1/1/2002" and "1/1/2002jab89034jh134oijgb"

          They will both return the same date. With .NET, the latter fails. Mostly for performance reasons, I didn't want to first TryParseExact the string and then try to emulate Java. Seems like something we could document for those who want to use DateFields in 3.x.

          If Version.LUCENE_29 or earlier is passed to the QueryParser, the old behavior is kept for those who want it: .NET parses more dates than Java does, specifically dates with dashes instead of forward slashes.
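
          As a rough illustration of the strictness described in this comment, the sketch below checks the two strings quoted above against a short date pattern with DateTime.TryParseExact. It is not the code from LUCENENET-478. The class and method names are hypothetical, and the culture is pinned to an assumed "M/d/yyyy" pattern so the result does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

public static class ShortDateParsingSketch
{
    // Assumption: a US-style short date pattern, pinned on a clone of the
    // invariant culture so the sketch behaves the same on every machine.
    public static CultureInfo UsStyleCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.DateTimeFormat.ShortDatePattern = "M/d/yyyy";
        return culture;
    }

    // Strict check: the whole string must match the culture's short date pattern.
    public static bool ParsesAsShortDate(string text, CultureInfo culture)
    {
        DateTime ignored;
        return DateTime.TryParseExact(
            text,
            culture.DateTimeFormat.ShortDatePattern,
            culture,
            DateTimeStyles.None,
            out ignored);
    }

    public static void Main()
    {
        var culture = UsStyleCulture();
        foreach (var text in new[] { "1/1/2002", "1/1/2002jab89034jh134oijgb" })
        {
            Console.WriteLine("{0} -> {1}", text, ParsesAsShortDate(text, culture));
        }
    }
}
```

          Because TryParseExact requires the entire input to match the pattern, the string with trailing characters is rejected, which is the gap from Java's behavior that the comment mentions.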

          Christopher Currens added a comment -

          For backwards compatibility, I've decided it's probably best to implement this conditionally based on the Version passed to the QueryParser's constructor.

          From LUCENE_30 onward, it will parse dates similarly to how Java does it, using the ShortDatePattern of the CurrentCulture's DateTimeFormatInfo. The old behavior of parsing any date style will be present if any earlier version is specified.
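
          The version-dependent behavior described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the QueryParser source: CompatVersion is a hypothetical stand-in for the Version passed to the QueryParser, TryParseRangeBound is a made-up helper, and the culture is passed in explicitly (with an assumed "M/d/yyyy" pattern) rather than read from CurrentCulture.

```csharp
using System;
using System.Globalization;

public static class VersionedDateParsingSketch
{
    // Hypothetical stand-in for the Version given to the QueryParser constructor.
    public enum CompatVersion { Lucene29, Lucene30 }

    public static bool TryParseRangeBound(
        string text, CompatVersion version, CultureInfo culture, out DateTime result)
    {
        if (version >= CompatVersion.Lucene30)
        {
            // Newer behavior: accept only the culture's short date pattern.
            return DateTime.TryParseExact(
                text,
                culture.DateTimeFormat.ShortDatePattern,
                culture,
                DateTimeStyles.None,
                out result);
        }

        // Older behavior: accept any date style that .NET can parse.
        return DateTime.TryParse(text, culture, DateTimeStyles.None, out result);
    }

    public static void Main()
    {
        // Assumption: pinned US-style pattern so the sketch is locale-independent.
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.DateTimeFormat.ShortDatePattern = "M/d/yyyy";

        DateTime parsed;
        Console.WriteLine(TryParseRangeBound("2001-01-17", CompatVersion.Lucene29, culture, out parsed));
        Console.WriteLine(TryParseRangeBound("2001-01-17", CompatVersion.Lucene30, culture, out parsed));
        Console.WriteLine(TryParseRangeBound("1/17/2001", CompatVersion.Lucene30, culture, out parsed));
    }
}
```

          In this sketch the dashed date from the issue description is accepted only under the older setting, while the slash-separated form is accepted under the newer one.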

          Christopher Currens added a comment -

          This is a difference in behavior between searching in Java and .NET. While .NET is more accurate about DateTime parsing, that difference in behavior seems to go against what was discussed in this email thread regarding people's wishes for the future of the project. Much of what was agreed upon was that the project should have much the same behavior as Java, in that indexes are compatible and the same search on the same index returns the same results in each.

          Troy Howard added a comment -

          I don't think this is the same debate of ".NET vs Java".

          Essentially, two things should be true about any Lucene implementation: the index files are interchangeable and the same query will return the same results regardless. How it works under the hood and what the API looks like is where the debate normally focuses. There's no reason we couldn't enforce compatibility on the query parser so that it behaves the same on both platforms.

          In my opinion this is a bug that needs to be fixed, possibly in both Java Lucene and .NET Lucene... There should be a standardized method of expressing dates as strings that is consistent across all implementations, rather than just using whatever the various platforms support via their datetime parsers.

          Prescott Nasser added a comment -

          I'm going to leave this as is and not do anything for 2.9.4. The ".NET vs. similar to Java" question is a bigger conversation that we definitely should have.

          Christopher Currens added a comment -

          I still don't know 100% how I feel about this. To be honest, I guess I don't understand the group's stance on how compatible we want to be with Java. I don't want to get into it too deep here, but my understanding is that we wanted to be as close to Java as possible, and the differences in how .NET and Java parse a datetime seem minor but are just as important, in my opinion.

          The problem is that .NET can actually parse more kinds of strings into date times than Java. That may not be a bad thing, and a .NET developer may expect it, but the difference in behavior is still my main concern. Either way, I'm willing to go in whatever direction the group decides.

          Prescott Nasser added a comment -

          Seems like we want to keep this as is? Should we close this?

          Digy added a comment -

          I don't think there is an inconsistency between the Java version and .NET.
          If you know that the field is indexed as "date", then you should give your date string (while searching) in a form the language can parse.
          (And both languages' UIs return datetime strings parseable by other libraries. It is not common for the user to type the datetime string in a textbox.)

          DIGY

          Christopher Currens added a comment -

          .NET is far better at parsing date strings, but it's the inconsistency between the Java version and the .NET version that I'm worried about. Search the index with one query from Java and you get different results with the same query in .NET.

          How compatible do we want to be with Java?

          Digy added a comment -

          You are right, I used a different date string.

          .NET seems to parse the date strings better.
          I would leave it as is.

          DIGY

          Digy added a comment -

          Maybe I am missing something,
          but I ran your code in both .NET and Java (not Luke) and printed query.ToString().
          >> Same result (in base36).

          DIGY


            People

            • Assignee: Unassigned
            • Reporter: Christopher Currens