  OpenNLP / OPENNLP-327

Doccat's bag-of-words feature generator should not use numbers as features


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Delivered
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Doccat
    • Labels: None

    Description

      When used for language identification, Doccat's bag-of-words feature generator turned out to be very sensitive to numbers. Because numbers are largely language-independent, they should not be included in the bag of words.
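A minimal sketch of the proposed behavior follows, assuming a simple static helper rather than OpenNLP's real feature generator API. The class name `NumberFilteringBagOfWords`, the `bow=` feature prefix, and the regular expression used to decide what counts as a "number" are all hypothetical choices for illustration, not taken from OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not OpenNLP's actual class): a bag-of-words feature
 * generator that emits one feature per token but skips numeric tokens.
 */
public class NumberFilteringBagOfWords {

    // Assumed definition of a "number": digits, optionally grouped by '.' or ','
    // (e.g. "2011", "3.14", "1,000"). Mixed tokens such as "B2B" are kept.
    private static final Pattern NUMBER = Pattern.compile("\\d+([.,]\\d+)*");

    /** Returns one "bow=<token>" feature (assumed prefix) per non-numeric token. */
    public static List<String> extractFeatures(String[] tokens) {
        List<String> features = new ArrayList<>(tokens.length);
        for (String token : tokens) {
            if (NUMBER.matcher(token).matches()) {
                continue; // numbers look the same in most languages, so drop them
            }
            features.add("bow=" + token);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Der", "Umsatz", "stieg", "2011", "um", "3,5", "Prozent"};
        // prints [bow=Der, bow=Umsatz, bow=stieg, bow=um, bow=Prozent]
        System.out.println(extractFeatures(tokens));
    }
}
```

Filtering on a whole-token match keeps alphanumeric tokens like "B2B", which may still carry some language signal; a stricter variant could keep only all-letter tokens instead.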

          People

            Assignee: Jörn Kottmann (joern)
            Reporter: Jörn Kottmann (joern)
            Votes: 0
            Watchers: 4