Solr / SOLR-2519

Improve the defaults for the "text" field type in default schema.xml


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: None
    • Labels: None

    Description

      Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5

      The "text" fieldType in schema.xml is unusable for languages that
      do not separate words with whitespace, because it has the dangerous
      auto-phrase feature (of Lucene's QueryParser; see LUCENE-2458)
      enabled.

      Lucene leaves this off by default, as does ElasticSearch
      (http://www.elasticsearch.org/).

      Furthermore, the "text" fieldType uses WhitespaceTokenizer when
      StandardTokenizer is a better cross-language default.

      Until we have language-specific field types, I think we should fix
      the "text" fieldType to work well for all languages, by:

      • Switching from WhitespaceTokenizer to StandardTokenizer
      • Turning off auto-phrase
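
      A minimal sketch of what the two changes above might look like in
      schema.xml. This is illustrative only and is not taken from the
      attached patches: the filter chain and positionIncrementGap value
      are assumptions, and autoGeneratePhraseQueries is the fieldType
      attribute Solr exposes for the auto-phrase behavior.

      ```xml
      <!-- Hypothetical "text" fieldType; not the contents of SOLR-2519.patch -->
      <fieldType name="text" class="solr.TextField"
                 positionIncrementGap="100"
                 autoGeneratePhraseQueries="false"> <!-- auto-phrase off -->
        <analyzer>
          <!-- StandardTokenizer instead of WhitespaceTokenizer -->
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <!-- Assumed filter, shown only to make the chain complete -->
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      ```

      With auto-phrase off, a query term that the analyzer splits into
      several tokens is no longer turned into a phrase query implicitly;
      phrases need explicit double quotes.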

      Attachments

        1. SOLR-2519.patch (7 kB, Michael McCandless)
        2. SOLR-2519.patch (26 kB, Michael McCandless)
        3. SOLR-2519.patch (40 kB, Michael McCandless)
        4. SOLR-2519.patch (39 kB, Michael McCandless)

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Michael McCandless (mikemccand)