  1. Solr
  2. SOLR-2519

Improve the defaults for the "text" field type in default schema.xml

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      Spinoff from: http://lucene.markmail.org/thread/ww6mhfi3rfpngmc5

      The "text" fieldType in schema.xml is unusable for languages that do
      not separate words with whitespace, because it has the dangerous
      auto-phrase feature (of Lucene's QueryParser; see LUCENE-2458) enabled.

      Lucene leaves this off by default, as does ElasticSearch
      (http://www.elasticsearch.org/).

      Furthermore, the "text" fieldType uses WhitespaceTokenizer when
      StandardTokenizer is a better cross-language default.

      Until we have language-specific field types, I think we should fix
      the "text" fieldType to work well for all languages, by:

      • Switching from WhitespaceTokenizer to StandardTokenizer
      • Turning off auto-phrase
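
      As a rough illustration of the two changes above, a fieldType along
      these lines might be a starting point. This is only a sketch and not
      the contents of the attached patch: it assumes the
      autoGeneratePhraseQueries attribute that Solr 3.x exposes on
      fieldType, and the filter chain (a single lowercase filter) and the
      positionIncrementGap value are illustrative assumptions.

      ```xml
      <!-- Hypothetical sketch of a cross-language "text" type; not the actual patch. -->
      <!-- autoGeneratePhraseQueries="false" turns off the query parser's auto-phrase behavior. -->
      <fieldType name="text" class="solr.TextField"
                 positionIncrementGap="100"
                 autoGeneratePhraseQueries="false">
        <analyzer>
          <!-- StandardTokenizer instead of WhitespaceTokenizer -->
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <!-- Assumed minimal filter chain; a real default would likely add more filters. -->
          <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
      </fieldType>
      ```

      With auto-phrase off, a query chunk that analyzes to several tokens
      is no longer turned into a phrase query implicitly; users who want a
      phrase match can still quote the query explicitly.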

        Attachments

        1. SOLR-2519.patch
          39 kB
          Michael McCandless
        2. SOLR-2519.patch
          40 kB
          Michael McCandless
        3. SOLR-2519.patch
          26 kB
          Michael McCandless
        4. SOLR-2519.patch
          7 kB
          Michael McCandless

            People

            • Assignee: Michael McCandless (mikemccand)
            • Reporter: Michael McCandless (mikemccand)