Solr: SOLR-5800

Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.7
    • Fix Version/s: 4.7.1, 4.8, 6.0
    • Component/s: web gui
    • Labels:
      None

      Description

      I have an example in Solr in Action that uses the
      PatternReplaceCharFilterFactory, and it no longer works in 4.7.0.
      Specifically, the <fieldType> is:

      <fieldType name="text_microblog" class="solr.TextField"
                 positionIncrementGap="100">
        <analyzer>
          <charFilter class="solr.PatternReplaceCharFilterFactory"
                      pattern="([a-zA-Z])\1+"
                      replacement="$1$1"/>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"
                  generateWordParts="1"
                  splitOnCaseChange="0"
                  splitOnNumerics="0"
                  stemEnglishPossessive="1"
                  preserveOriginal="0"
                  catenateWords="1"
                  generateNumberParts="1"
                  catenateNumbers="0"
                  catenateAll="0"
                  types="wdfftypes.txt"/>
          <filter class="solr.StopFilterFactory"
                  ignoreCase="true"
                  words="lang/stopwords_en.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.ASCIIFoldingFilterFactory"/>
          <filter class="solr.KStemFilterFactory"/>
        </analyzer>
      </fieldType>

      The PatternReplaceCharFilterFactory (PRCF) is used to collapse runs of
      a repeated letter in a term down to a maximum of two, so that #yummmm
      becomes #yumm.
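      The collapsing rule can be tried outside Solr with plain java.util.regex.
      This is a minimal sketch, assuming the char filter applies the configured
      pattern and replacement the same way Matcher.replaceAll does; the class
      name CollapseRepeats is hypothetical. Unlike this sketch, the real
      PatternReplaceCharFilter also records offset corrections so token offsets
      still point into the original text, which is presumably why the "#Yumm"
      token in the response below reports "end":6 although it is five
      characters long.

```java
import java.util.regex.Pattern;

public class CollapseRepeats {
    // Same pattern as the charFilter in the fieldType above: a letter
    // followed by one or more copies of itself. The backreference \1 is
    // case-sensitive, so "Mm" is not treated as a repeat.
    private static final Pattern REPEATED_LETTER = Pattern.compile("([a-zA-Z])\\1+");

    static String collapse(String input) {
        // "$1$1" re-emits the captured letter exactly twice, as in the config.
        return REPEATED_LETTER.matcher(input).replaceAll("$1$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("#yummmm")); // #yumm
        System.out.println(collapse("Caffe"));   // Caffe (already only two f's)
    }
}
```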

      When I run some text through this analyzer using the Analysis form,
      the output looks as if the filtered text never reaches the tokenizer.
      In other words, the only results displayed on the form are those for
      the PRCF.

      This example stopped working in 4.7.0 and I've verified it worked
      correctly in 4.6.1.

      Initially, I thought this might be an issue with the analysis itself,
      but the analyzer works correctly when indexing and querying. Then,
      looking at the JSON response in Chrome's Developer console, I saw that
      it includes output for every component in my chain (see below), so
      this looks like a UI rendering issue to me.

      {"responseHeader": {"status":0, "QTime":24},
       "analysis": {"field_types": {"text_microblog": {"index": [
         "org.apache.lucene.analysis.pattern.PatternReplaceCharFilter",
         "#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5",
         "org.apache.lucene.analysis.core.WhitespaceTokenizer",
         [
           {"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},
           {"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},
           {"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},
           {"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},
           {"text":"latte","raw_bytes":"[6c ...

      The JSON returned to the browser shows that the full analysis chain was applied, so this seems to be purely a rendering issue.
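      The shape of that response matters for the rendering: in the "index"
      array each component name is followed by its output, and the char
      filter's output is a plain string while the tokenizer's is a list of
      token objects. The sketch below walks such an array in pairs. It assumes
      the JSON has already been parsed into Java lists and maps (no particular
      JSON library is implied), and the class name AnalysisChainWalk is
      hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AnalysisChainWalk {
    // Assumes the "index" array is already parsed into Java objects. It
    // alternates a component class name (String) with that component's
    // output: a String for a CharFilter, a List of tokens otherwise.
    static List<String> describe(List<Object> index) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i + 1 < index.size(); i += 2) {
            String component = (String) index.get(i);
            String name = component.substring(component.lastIndexOf('.') + 1);
            Object output = index.get(i + 1);
            if (output instanceof String text) {
                lines.add(name + " -> text: " + text);
            } else if (output instanceof List<?> tokens) {
                lines.add(name + " -> " + tokens.size() + " token(s)");
            } else {
                lines.add(name + " -> unexpected output");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Trimmed copy of the response above: only the first three tokens.
        List<Object> index = List.of(
            "org.apache.lucene.analysis.pattern.PatternReplaceCharFilter",
            "#Yumm :) Drinking",
            "org.apache.lucene.analysis.core.WhitespaceTokenizer",
            List.of(
                Map.of("text", "#Yumm", "start", 0, "end", 6, "position", 1),
                Map.of("text", ":)", "start", 7, "end", 9, "position", 2),
                Map.of("text", "Drinking", "start", 10, "end", 18, "position", 3)));
        describe(index).forEach(System.out::println);
    }
}
```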

      Attachments

      1. SOLR-5800.patch (2 kB, Stefan Matheis (steffkes))
      2. SOLR-5800-sample.json (68 kB, Stefan Matheis (steffkes))


          Activity

          Stefan Matheis (steffkes) added a comment -

          Timothy, could you attach the (raw) JSON output as a file here? If you can, a before/after screenshot would be good to see as well.

          Quick guess, because it's the latest change I remember to the Analysis screen and it went into 4.7: SOLR-4612. Perhaps it doesn't work as expected in all cases?

          Stefan Matheis (steffkes) added a comment -

          Since there was no reference, I didn't realize this had already been asked on the user list: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3CCAJt9WnjooX3mHJN-02%2BaRAK2uKn6%3DF1yaue2CBVsbKxgStTnuA%40mail.gmail.com%3E

          Attaching the sample Timothy already provided on that list.

          Stefan Matheis (steffkes) added a comment -

          After a bit of digging, it's clear that SOLR-4612 is responsible for the change: to remove the empty columns, I used the first element to determine how many columns the table should have, and in the case of the PatternReplaceCharFilter that's only one.

          If I'm not mistaken, the fix is to loop over all records to get the overall column count. Working on it.
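          The difference between the two sizing strategies can be sketched in a
          few lines. This is a hypothetical Java model of the table sizing, not
          the code in SOLR-5800.patch (the Admin UI itself is JavaScript): each
          stage is reduced to the cells it fills, one cell for the char
          filter's text and one per token for the tokenizer. The names
          AnalysisColumns, Stage and sampleChain are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnalysisColumns {
    // Hypothetical model of one row on the Analysis screen: the component
    // name plus the cells it fills. A CharFilter fills a single cell (its
    // filtered text); a tokenizer fills one cell per token.
    record Stage(String component, List<String> cells) {}

    // Sizing from the first stage only: with a CharFilter first in the
    // chain this always yields 1, so the token columns get lost.
    static int columnsFromFirstStage(List<Stage> stages) {
        return stages.isEmpty() ? 0 : stages.get(0).cells().size();
    }

    // The proposed fix: loop over every stage and keep the largest count.
    static int columnsFromAllStages(List<Stage> stages) {
        int max = 0;
        for (Stage stage : stages) {
            max = Math.max(max, stage.cells().size());
        }
        return max;
    }

    // A two-stage chain built from the start of the sample text; the
    // tokenizer stage is a plain whitespace split of the filtered text.
    static List<Stage> sampleChain() {
        String filtered = "#Yumm :) Drinking";
        List<Stage> stages = new ArrayList<>();
        stages.add(new Stage("PatternReplaceCharFilter", List.of(filtered)));
        stages.add(new Stage("WhitespaceTokenizer", Arrays.asList(filtered.split("\\s+"))));
        return stages;
    }

    public static void main(String[] args) {
        List<Stage> stages = sampleChain();
        System.out.println(columnsFromFirstStage(stages)); // 1
        System.out.println(columnsFromAllStages(stages));  // 3
    }
}
```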

          Stefan Matheis (steffkes) added a comment -

          Timothy Potter, Hossein Taghi-Zadeh, would you mind giving this patch a try? At least your example works again and looks as expected, while still keeping the correct column count (as originally attempted in SOLR-4612).

          Hossein Taghi-Zadeh added a comment -

          Stefan Matheis (steffkes), it works for me.
          Thanks.

          Stefan Matheis (steffkes) added a comment -

          Timothy Potter, did you get a chance to try it? Otherwise I'll commit it tomorrow.

          Doug Turnbull added a comment -

          Thanks for the patch Stefan. Will this be released in a Solr 4.7.1? This is a fairly major issue for folks that depend on the analysis UI.

          ASF subversion and git services added a comment -

          Commit 1576652 from Stefan Matheis (steffkes) in branch 'dev/trunk'
          [ https://svn.apache.org/r1576652 ]

          SOLR-5800: Admin UI - Analysis form doesn't render results correctly when a CharFilter is used

          ASF subversion and git services added a comment -

          Commit 1576671 from Stefan Matheis (steffkes) in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1576671 ]

          SOLR-5800: Admin UI - Analysis form doesn't render results correctly when a CharFilter is used (merge r1576652)

          Stefan Matheis (steffkes) added a comment -

          Hey Doug, that depends a bit. I'd say "the next available release": that might be 4.7.1 if one is needed, otherwise it would be 4.8.

          ASF subversion and git services added a comment -

          Commit 1578444 from Stefan Matheis (steffkes) in branch 'dev/branches/lucene_solr_4_7'
          [ https://svn.apache.org/r1578444 ]

          SOLR-5800: Admin UI - Analysis form doesn't render results correctly when a CharFilter is used (merge r1576652)

          ASF subversion and git services added a comment -

          Commit 1578617 from Stefan Matheis (steffkes) in branch 'dev/branches/lucene_solr_4_7'
          [ https://svn.apache.org/r1578617 ]

          SOLR-5800, SOLR-5870: fix changes entry on lucene_solr_4_7

          Steve Rowe added a comment -

          Bulk close 4.7.1 issues


            People

            • Assignee:
              Stefan Matheis (steffkes)
              Reporter:
              Timothy Potter
            • Votes:
              1
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved:
