Lucene - Core / LUCENE-4293

ArabicRootsAnalyzer

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      ArabicRootsAnalyzer uses an index that maps Arabic terms to their roots. Every Arabic word has a root, but there is no automatic way of determining it.

      This analyzer maps each term to its root, so both indexing and searching are based on roots. It gives great results in my application.
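
      The term-to-root lookup described above can be sketched in plain Java as follows. This is a minimal illustration, not the attached ArabicRootFilter.java: the class name `RootDictionary`, its methods, the whitespace tokenization, and the choice to keep a term unchanged when it is missing from the table are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a term -> root table; not the code attached to this issue.
final class RootDictionary {
    private final Map<String, String> termToRoot = new HashMap<>();

    // Registers one dictionary entry (surface term and its root).
    void add(String term, String root) {
        termToRoot.put(term, root);
    }

    // Assumption: a term that is missing from the table is returned unchanged.
    String rootOf(String term) {
        return termToRoot.getOrDefault(term, term);
    }

    // Splits on whitespace (a simplification) and replaces every token by its root.
    List<String> toRoots(String text) {
        List<String> roots = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                roots.add(rootOf(token));
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        RootDictionary dict = new RootDictionary();
        dict.add("كتاب", "كتب");   // "book"    -> root k-t-b
        dict.add("مكتبة", "كتب");  // "library" -> root k-t-b
        dict.add("كاتب", "كتب");   // "writer"  -> root k-t-b
        // All three terms reduce to the same root, so a query for any one
        // of them would match documents containing the others.
        System.out.println(dict.toRoots("كتاب مكتبة كاتب"));
    }
}
```

      Because the same mapping is applied at index time and at query time, terms that share a root end up as the same indexed token. The quality of the results then depends on the coverage of the table rather than on any stemming rules.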

      I have attached all the required files along with the database. The main drawback is the size of the database (16 MB); it contains around 300,000 terms. I also have another database with 600,000 terms, but I believe the attached one, which is a condensed version, is better.

        Attachments

        1. rootsTableIndex.zip (4.19 MB, Ibrahim)
        2. ArabicTokens.txt (145 kB, Ibrahim)
        3. ArabicTokenizer.java (3 kB, Ibrahim)
        4. ArabicRootsAnalyzer.java (1 kB, Ibrahim)
        5. ArabicRootFilter.java (1 kB, Ibrahim)

            People

            • Assignee: Unassigned
            • Reporter: Ibrahim
            • Votes: 0
            • Watchers: 1