Solr / SOLR-2013

ASCIIFoldingFilter => MappingCharFilterFactory as a mapping file

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.1
    • Component/s: None
    • Labels:
      None

      Description

      Attached is a mapping file to provide the equivalent of ASCIIFoldingFilter through the MappingCharFilterFactory.

      I'm not sure where this should go in the source tree.
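For readers who need to apply such a mapping file outside Solr, the following is a minimal Python sketch of how rules in the `"source" => "target"` format could be loaded and applied. It is not the MappingCharFilterFactory implementation: the function names, the `SAMPLE` rules, and the greedy longest-match strategy are illustrative assumptions.

```python
import re

# One rule per line: "source" => "target". Lines starting with '#' are comments.
RULE = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"$')
SIMPLE_ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t",
                  "r": "\r", "b": "\b", "f": "\f"}


def unescape(text):
    # Decode backslash escapes, including \uXXXX, in one side of a rule.
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "u":
                out.append(chr(int(text[i + 2:i + 6], 16)))
                i += 6
            else:
                out.append(SIMPLE_ESCAPES.get(nxt, nxt))
                i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def load_rules(lines):
    # Build a dict of source sequence -> replacement from mapping lines.
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE.match(line)
        if match is None:
            raise ValueError(f"invalid mapping rule: {line!r}")
        rules[unescape(match.group(1))] = unescape(match.group(2))
    return rules


def fold(text, rules):
    # Replace each mapped source sequence, trying the longest match first.
    if not rules:
        return text
    longest = max(len(src) for src in rules)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical excerpt in the mapping file format (not copied from the attachment).
SAMPLE = [
    "# accented Latin letters",
    '"\\u00C0" => "A"',
    '"\\u00E9" => "e"',
    '"\\u00C6" => "AE"',
]
```

With these sample rules, `fold("\u00C0 caf\u00E9", load_rules(SAMPLE))` leaves unmapped characters untouched and folds the mapped ones.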

      1. mapping-FoldToASCII.txt
        77 kB
        Steve Rowe
      2. mapping-FoldToASCII.txt
        77 kB
        Steve Rowe

        Issue Links

          Activity

          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Uwe Schindler added a comment -

          Bulk close after release of 3.1

          Hoss Man made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Affects Version/s Next [ 12315093 ]
          Hoss Man made changes -
          Resolution Fixed [ 1 ]
          Status Closed [ 6 ] Reopened [ 4 ]
          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Steve Rowe made changes -
          Link This issue relates to LUCENE-2244 [ LUCENE-2244 ]
          Steve Rowe added a comment -

          Thanks Koji.

          Koji Sekiguchi made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Koji Sekiguchi added a comment -

          trunk: Committed revision 991191.
          branch_3x: Committed revision 991196.

          Koji Sekiguchi committed 991196 (55 files)
          Reviews: none

          SOLR-2013: Add mapping-FoldToASCII.txt to example conf directory

          Lucene branch_3x
          Koji Sekiguchi committed 991191 (2 files)
          Reviews: none

          SOLR-2013: Add mapping-FoldToASCII.txt to example conf directory

          Koji Sekiguchi made changes -
          Assignee Koji Sekiguchi [ koji ]
          Koji Sekiguchi added a comment -

          I'm going to commit the attached file (with the Perl script) to the example conf directory of trunk and 3.x.

          Koji Sekiguchi added a comment -

          I think this is ready to go. Any objections?

          Steve Rowe added a comment -

          I was referring to mapping-ISOLatin1Accent.txt in the example solr/conf

          Here's a link to the version on the 3.x branch:

          http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/example/solr/conf/mapping-ISOLatin1Accent.txt?revision=940784&view=markup

          Robert Muir added a comment -

          Hi Tom: I was referring to mapping-ISOLatin1Accent.txt in the example solr/conf

          By the way, there is also a newer alternative to ASCIIFoldingFilter (but for all of Unicode), if you use the ICU contrib.

          http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/icu/src/java/org/apache/lucene/analysis/icu/ICUFoldingFilter.java?view=markup

          Tom Burton-West added a comment -

          Steven and Robert,

          Thanks for contributing this, Steven. It is a really good idea. A map file seems much more flexible than the hard-coded case statements.

          Robert, in your comment above you mention Solr includes a mapping based on the deprecated ISOLatin1AccentFilter.
          Could you please point me to where I can find this mapping file for the deprecated ISOLatin1AccentFilter in SVN?

          Otherwise, I'll just adapt the Perl code here and run it against the ISOLatin1AccentFilter code.

          We haven't switched to the newer ASCIIFoldingFilter and need to emulate the ISOLatin1AccentFilter in some custom non-Java code until we make the switch and re-index all 6 million volumes.

          Tom Burton-West

          Steve Rowe made changes -
          Attachment mapping-FoldToASCII.txt [ 12450331 ]
          Steve Rowe added a comment -

          Fixed a mistake in the Perl conversion script and the resulting map for FullWidth Reverse Solidus: now mapping to a single escaped backslash, rather than two of them.
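The difference between one and two escaped backslashes can be illustrated with a small Python sketch. It assumes that a rule target in the mapping file uses `\\` to denote a single literal backslash; the helper `unescape_target` is hypothetical, handles only simple one-character escapes, and is not part of Solr.

```python
def unescape_target(text):
    # Decode simple backslash escapes: a backslash followed by any
    # character yields that character (so \\ becomes one backslash).
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Raw strings show the characters as they would appear between the quotes
# of a rule target in the mapping file.
single_escaped = r"\\"    # decodes to one backslash
double_escaped = r"\\\\"  # decodes to two backslashes

assert unescape_target(single_escaped) == "\\"
assert unescape_target(double_escaped) == "\\\\"
```

Under that assumption, a target written with two escaped backslashes would fold the fullwidth reverse solidus into two ASCII backslashes instead of one, which matches the mistake described here.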

          Robert Muir added a comment -

          This seems like a really good idea. Currently it looks like Solr includes a mapping based on the deprecated ISOLatin1AccentFilter, which we really should have removed in trunk already:

          "This class is included for use with existing indexes and will be removed in a future release (possibly Lucene 4.0)"
          
          Steve Rowe made changes -
          Attachment mapping-FoldToASCII.txt [ 12450330 ]
          Steve Rowe added a comment -

          Mapping file attached.

          The Perl script used to convert the mappings in ASCIIFoldingFilter.java into the mapping file format required by MappingCharFilterFactory is included in a comment at the bottom of the file.
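The kind of conversion described here can be sketched in Python. This is not the Perl script from the attachment: it assumes the Java source lists `case '\uXXXX':` labels followed by one or more `output[outputPos++] = 'c';` assignments and a `break;`, and the sample snippet, regular expressions, and function name are hypothetical.

```python
import re

# Assumed shapes of the relevant Java lines (illustrative, not verified
# against every line of ASCIIFoldingFilter.java).
CASE = re.compile(r"case\s+'\\u([0-9A-Fa-f]{4})'\s*:")
OUTPUT = re.compile(r"output\[outputPos\+\+\]\s*=\s*'(\\?.)'\s*;")

# Characters that must be escaped inside a quoted rule target.
MAPPING_ESCAPES = {"\\": "\\\\", '"': '\\"'}


def java_cases_to_rules(java_source):
    # Turn grouped case labels plus output assignments into mapping lines.
    rules, sources, target = [], [], []
    for line in java_source.splitlines():
        case = CASE.search(line)
        output = OUTPUT.search(line)
        if case:
            sources.append(case.group(1).upper())
        elif output:
            # The last character of the literal is the real character,
            # whether or not it was written with a Java backslash escape.
            char = output.group(1)[-1]
            target.append(MAPPING_ESCAPES.get(char, char))
        elif "break;" in line and sources:
            joined = "".join(target)
            rules.extend(f'"\\u{code}" => "{joined}"' for code in sources)
            sources, target = [], []
    return rules


# Hypothetical excerpt in the style of the hard-coded case statements.
SAMPLE_JAVA = r"""
      case '\u00C0':
      case '\u00C1':
        output[outputPos++] = 'A';
        break;
      case '\u00C6':
        output[outputPos++] = 'A';
        output[outputPos++] = 'E';
        break;
      case '\uFF3C':
        output[outputPos++] = '\\';
        break;
"""

for rule in java_cases_to_rules(SAMPLE_JAVA):
    print(rule)
```

Each case label sharing a block gets its own rule, and multi-character outputs are concatenated into a single target string.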

          Steve Rowe created issue -

            People

            • Assignee:
              Koji Sekiguchi
              Reporter:
              Steve Rowe
            • Votes:
              0
              Watchers:
              0
