Solr / SOLR-5200 Add REST support for reading and modifying Solr configuration / SOLR-5655

Create a stopword filter factory that is (re)configurable, and capable of reporting its configuration, via REST API

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels: None

      Description

      A stopword filter factory could be (re)configurable via REST API by registering with the RESTManager described in SOLR-5653, and then responding to REST API calls to modify its init params and its stopwords resource file.

      Read-only (GET) REST API calls should also be provided, both for init params and the stopwords resource file.

      It should be possible to add/remove one or more entries in the stopwords resource file.

      We should probably use JSON for the REST request body, as is done in the Schema REST API methods.

      1. SOLR-5655.patch (24 kB, Steve Rowe)
      2. SOLR-5655.patch (15 kB, Timothy Potter)
      3. SOLR-5655.patch (23 kB, Timothy Potter)
      4. SOLR-5655.patch (23 kB, Timothy Potter)
      5. SOLR-5655.patch (11 kB, Timothy Potter)

          Activity

          Timothy Potter added a comment -

          Depends on the patch posted for SOLR-5653.

          Deletes are implemented but not active from the REST API yet ... coming soon.

          Timothy Potter added a comment -

          Should have provided some details about the API ...

          To activate, you would need to declare a filter in schema.xml as:

          <fieldType name="managed_en" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
              <tokenizer class="solr.StandardTokenizerFactory"/>
              <filter class="org.apache.solr.rest.schema.analysis.ManagedStopFilterFactory" managed="english" />
            </analyzer>
          </fieldType>

          To see the list of managed stopwords for the "english" handle:

          curl -i -v "http://localhost:8984/solr/<collection|core>/schema/analysis/stopwords/english"

          This would return a JSON object/map that looks like:

          {
            "initArgs": {"ignoreCase": "true"},
            "initializedOn": "2014-02-10T16:23:55.247Z",
            "managedList": [
              "a",
              "an",
              "and",
              "are",
              "as", … ] }
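
          As a rough illustration of consuming that response, here is a minimal Python sketch that pulls out the ignoreCase flag and the word list. The sample body is a shortened stand-in for the response above (the real list is longer), and the function name is hypothetical, not part of the patch.

```python
import json

# Shortened stand-in for the response shown above; the real managedList is longer.
SAMPLE_RESPONSE = """
{
  "initArgs": {"ignoreCase": "true"},
  "initializedOn": "2014-02-10T16:23:55.247Z",
  "managedList": ["a", "an", "and", "are", "as"]
}
"""


def parse_managed_stopwords(body):
    """Return (ignore_case, words) from a managed stopwords response body."""
    data = json.loads(body)
    # The example response carries ignoreCase as the string "true", so compare text.
    raw_flag = data.get("initArgs", {}).get("ignoreCase", "false")
    ignore_case = str(raw_flag).lower() == "true"
    words = list(data.get("managedList", []))
    return ignore_case, words


ignore_case, words = parse_managed_stopwords(SAMPLE_RESPONSE)
print(ignore_case, words)
```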

          To add some stop words to the set, you'd do:

          curl -v -X PUT \
          -H 'Content-type:application/json' \
          --data-binary '["foo"]' \
          'http://localhost:8984/solr/<collection|core>/schema/analysis/stopwords/english'
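
          The same PUT could be issued from a script. The sketch below only builds the request with Python's standard library, so it runs without a server. The helper name and the "collection1" core name are assumptions for illustration; the URL layout and JSON array body mirror the curl command above.

```python
import json
import urllib.request


def build_add_stopwords_request(base_url, core, handle, words):
    """Build (but do not send) a PUT that adds words to a managed stopword set."""
    url = f"{base_url}/{core}/schema/analysis/stopwords/{handle}"
    # The request body is a plain JSON array of words, as in the curl example.
    body = json.dumps(list(words)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-type": "application/json"},
    )


# "collection1" is a placeholder for <collection|core>.
req = build_add_stopwords_request(
    "http://localhost:8984/solr", "collection1", "english", ["foo"]
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it; omitted here.
```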

          You can also GET a single word; the request returns a 404 if the word is not in the set:

          curl -i -v "http://localhost:8984/solr/<collection|core>/schema/analysis/stopwords/english/the"
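
          A client can treat that 404 as "not a stopword" rather than as a failure. Here is a hedged Python sketch of such a check; the function name, the injectable open_url parameter, and the two fake openers are all hypothetical and exist only so the example runs without a live Solr instance.

```python
import contextlib
import urllib.error
import urllib.parse
import urllib.request


def is_managed_stopword(base_url, core, handle, word, open_url=urllib.request.urlopen):
    """Return True if `word` is in the managed set, False if the server answers 404."""
    # Percent-encode the word so characters such as "/" stay in one path segment.
    quoted = urllib.parse.quote(word, safe="")
    url = f"{base_url}/{core}/schema/analysis/stopwords/{handle}/{quoted}"
    try:
        with open_url(url):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other status is a real error, not a missing word


# Stand-ins for a live server (hypothetical helpers for this sketch only).
def fake_found(url):
    return contextlib.nullcontext()


def fake_missing(url):
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)


base = "http://localhost:8984/solr"
print(is_managed_stopword(base, "collection1", "english", "the", fake_found))
print(is_managed_stopword(base, "collection1", "english", "zzz", fake_missing))
```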

          Lastly, just to be clear, none of the changes made by the API will be "applied" to the underlying analysis components (in this case the StopFilter) until the core is reloaded.
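
          One way to trigger that reload is the Core Admin RELOAD action. The sketch below only builds the URL; the helper name and the "collection1" core name are assumptions, and a SolrCloud collection would be reloaded through the Collections API instead.

```python
import urllib.parse


def build_core_reload_url(base_url, core):
    """Build the Core Admin URL that reloads a single core."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{base_url}/admin/cores?{query}"


# "collection1" is a placeholder core name.
print(build_core_reload_url("http://localhost:8984/solr", "collection1"))
```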

          Timothy Potter added a comment -

          Updated patch to work with the changes in the latest patch for SOLR-5653

          Timothy Potter added a comment -

          Doh! Last minute change kabroke a unit test for the stop filter factory ... this latest patch fixes that.

          Timothy Potter added a comment -

          Updated patch to work with the latest patch Steve posted to SOLR-5653.

          Steve Rowe added a comment -

          Looks great, Tim.

          This version of the patch adds some javadocs, and adds testing of retrieving indexed docs, especially around reloading, to demonstrate that updates aren't applied until after reload, and that once updates and a reload have occurred, newly indexed docs are affected by the updates.

          The patch also fixes RestTestHarness.reload(), which previously didn't work.

          I think it's ready to go. I'll commit to trunk shortly.

          ASF subversion and git services added a comment -

          Commit 1584971 from sarowe@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1584971 ]

          SOLR-5655: Create a stopword filter factory that is (re)configurable, and capable of reporting its configuration, via REST API (merged trunk r1577540)

          Steve Rowe added a comment -

          Committed to trunk and branch_4x.

          Thanks Tim!


            People

            • Assignee: Steve Rowe
            • Reporter: Steve Rowe
            • Votes: 0
            • Watchers: 3
