Lucene - Core / LUCENE-10098

Add note/link to GermanAnalyzer for decompounding nouns

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0, 8.11
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      The GermanAnalyzer doesn't split compound nouns.

      Doing this requires some auxiliary data files with strange licenses. But uschindler has documented and packaged everything up to make this easy: https://github.com/uschindler/german-decompounder

      We added a Lucene API example (using CustomAnalyzer) to the README: https://github.com/uschindler/german-decompounder/pull/6

      So I think it would be nice to link to this from the javadocs. It makes it really easy to download the data files and configure an appropriate analyzer, if you are OK with the LaTeX and LGPL licenses for the data files (which many folks might be).
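
For readers unfamiliar with the term, the sketch below illustrates what "decompounding" means. It is a toy longest-match dictionary splitter written against the Java standard library only. It is not the approach used by Lucene or by german-decompounder, which as far as I understand combine hyphenation patterns with a word list. The class name `DecompoundSketch`, the four-word dictionary, and the minimum subword length are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DecompoundSketch {
    // Hypothetical mini-dictionary; a real word list is far larger.
    static final Set<String> DICTIONARY = Set.of("donau", "dampf", "schiff", "fahrt");

    // Assumed lower bound on subword length, to avoid matching tiny fragments.
    static final int MIN_SUBWORD = 3;

    /**
     * Splits a lowercase word into dictionary words, trying the longest
     * candidate first at each position. Returns the word unchanged (as a
     * single-element list) when no complete split exists.
     */
    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        return splitFrom(word, 0, parts) ? parts : List.of(word);
    }

    // Backtracking search: extend `parts` until the whole word is covered.
    private static boolean splitFrom(String word, int start, List<String> parts) {
        if (start == word.length()) {
            return true;
        }
        for (int end = word.length(); end >= start + MIN_SUBWORD; end--) {
            String candidate = word.substring(start, end);
            if (DICTIONARY.contains(candidate)) {
                parts.add(candidate);
                if (splitFrom(word, end, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // dead end, try a shorter candidate
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(split("dampfschifffahrt")); // prints [dampf, schiff, fahrt]
        System.out.println(split("analyzer"));         // prints [analyzer]
    }
}
```

An analyzer that does not decompound indexes "dampfschifffahrt" as a single term, so a query for "schiff" would not match it. That is the gap the linked project is meant to close.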

            People

              Assignee: Unassigned
              Reporter: Robert Muir (rcmuir)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h