Details

      Description

      I've been hunting a weird bug for a long time, and I finally found its cause.
      I'm Danish, so my .NET culture is "da-DK". In this culture, "Gaard" doesn't start with "Ga", because the culture treats "aa" as "å" (which is how it was written in Danish before 1948).
      That gives some unexpected results when doing prefix queries.

      The solution is to pass StringComparison.InvariantCulture in all StartsWith comparisons.

      To verify my claim, try running:

      Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("da-DK");
      Assert.IsFalse("Gaard".StartsWith("Ga"));
      Assert.IsTrue("Gaard".StartsWith("Ga", StringComparison.InvariantCulture));

      Cheers,
      Niels Kühnel

        Activity

        Digy added a comment -

        Hi Niels,
        I am closing this issue.
        But, feel free to reopen it if you think it is something that should be handled by Lucene.Net.

        DIGY

        Niels Kühnel added a comment -

        I think that WildcardTermEnum.TermCompare should just compare the string's chars without any locale-specific considerations. It's confusing that the thread's culture may cause issues like this.

        If String.StartsWith("...", StringComparison.InvariantCulture) were the default, a special tokenizer could be used if you actually wanted some special localized behavior.

        Again, thanks for looking at it.
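
        For reference, the "just compare the chars" behavior described above can be sketched as follows. This is only an illustration under stated assumptions: it uses StringComparison.Ordinal, which compares UTF-16 code units directly, as a swapped-in alternative to the StringComparison.InvariantCulture option proposed in this issue. The OrdinalPrefix class and StartsWithOrdinal helper are hypothetical names, not Lucene.Net code.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class OrdinalPrefix
{
    // Compares UTF-16 code units one by one, so the result does not
    // depend on the current thread's culture.
    public static bool StartsWithOrdinal(string term, string prefix)
    {
        return term.StartsWith(prefix, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Same culture as in the report above.
        Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("da-DK");

        Console.WriteLine(StartsWithOrdinal("Gaard", "Ga")); // True
        Console.WriteLine(StartsWithOrdinal("gaard", "ga")); // True
        Console.WriteLine(StartsWithOrdinal("Gaard", "Gå")); // False
    }
}
```

        Whether a comparison like this belongs in Lucene.Net or in user code is the open question in this thread.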

        Digy added a comment -

        > because it thinks that "aa" is "å"
        In my test case, "Gaard".StartsWith("Gå") also returns false.

        I am still not sure whether it is a Lucene.Net bug or something that should be handled by the user.
        I'll think about it.

        DIGY

        Digy added a comment -

        The failing function seems to be
        WildcardTermEnum.TermCompare

        DIGY

        Niels Kühnel added a comment -

        Exactly. Thanks for the quick reply, by the way. It works if I set the thread's culture to en-US, but not with da-DK.

        Digy added a comment -

        And are you using a WildcardQuery like sometext*?
        DIGY

        Niels Kühnel added a comment -

        Whitespace+lowercase. "gaard".StartsWith("ga") returns false for the da-DK culture.

        Digy added a comment -

        Which analyzer do you use?
        DIGY


          People

          • Assignee:
            Unassigned
            Reporter:
            Niels Kühnel

              Time Tracking

              Original Estimate: 4h
              Remaining Estimate: 4h
              Time Spent: Not Specified
