  1. Lucene - Core
  2. LUCENE-180

[PATCH] Language guesser contribution


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • None
    • None
    • core/other
    • None
    • Operating System: other
      Platform: Other

    • 26763

    Description

      Hello,

      I'd like to contribute this language guesser to Lucene.

      It contains language-guessing interfaces and classes, trigram-specific
      classes, and some language reference files that I generated myself using the
      trigram file generation utility included in the package. I included a unit
      test as well.
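
      For readers unfamiliar with the approach, here is a minimal sketch of
      trigram-based language guessing. It is not the code from the attached
      archives, whose contents are not shown on this page. The class name
      TrigramGuesserSketch, its methods, and the use of cosine similarity for
      scoring are all assumptions made for illustration. Reference profiles here
      are built from short sample strings rather than from reference files.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the API of the attached contribution.
final class TrigramGuesserSketch {

    /** Counts overlapping character trigrams in lower-cased text, padded with one space on each side. */
    static Map<String, Integer> profile(String text) {
        String normalized = " " + text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= normalized.length(); i++) {
            counts.merge(normalized.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two trigram count vectors; 0.0 if either is empty. */
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int count = e.getValue();
            normA += (double) count * count;
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) count * other;
            }
        }
        for (int count : b.values()) {
            normB += (double) count * count;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Returns the key of the most similar reference profile, or null if no profile shares a trigram with the text. */
    static String guess(String text, Map<String, Map<String, Integer>> references) {
        Map<String, Integer> sample = profile(text);
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Integer>> ref : references.entrySet()) {
            double score = similarity(sample, ref.getValue());
            if (score > bestScore) {
                best = ref.getKey();
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy reference profiles; a real guesser would load much larger ones.
        Map<String, Map<String, Integer>> refs = new HashMap<>();
        refs.put("en", profile("the quick brown fox jumps over the lazy dog and the cat"));
        refs.put("fr", profile("le renard brun saute par dessus le chien paresseux et le chat"));
        System.out.println(guess("the dog and the fox", refs));
    }
}
```

      With profiles this small the result is only indicative; guessing quality
      depends heavily on the size and quality of the reference data.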

      I haven't tested guessing quality or performance extensively, but I believe
      both are acceptable for a first pass.

      I thought about writing a custom Analyzer for this, but realized that this
      wouldn't be the way to go and that the language decision should probably be
      left to the developer, especially when the Analyzer is used to tokenize a
      query.

      Have fun,

      Jean-François Halleux

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--LanguageGuesser.zip
          450 kB
          Jean-François Halleux
        2. ASF.LICENSE.NOT.GRANTED--tlg.zip
          578 kB
          Jean-François Halleux


          People

            Assignee: Unassigned
            Reporter: Jean-François Halleux (halleux.jf@skynet.be)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: