
    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: tools-1.5.3
    • Fix Version/s: None
    • Component/s: Parser, Stemmer
    • Environment:
      all

      Description

      This feature request is for the inclusion of stop word lists for various languages. Such lists can be used to reduce the noise caused by frequent but irrelevant words, e.g. when tokenizing texts. For a first iteration, each list could be a simple list of single words; later it could also include multi-stopwords, which would apply to n-grams (i.e. a word in the list would serve to "stop" a multi-word n-gram).
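
To make the request concrete, here is a minimal sketch in Python of how such a list might be applied, first to single tokens and then to n-grams. It is an illustration only, not a proposed API for the project: the names `STOP_WORDS`, `remove_stop_words`, `ngrams` and `filter_ngrams` are hypothetical, the five-word English list is a placeholder, and the n-gram rule (drop any n-gram that contains a listed word) is one possible reading of "multi-stopwords".

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical placeholder list; a real list would be loaded per language.
STOP_WORDS: Set[str] = {"the", "a", "of", "and", "is"}


def remove_stop_words(tokens: Iterable[str],
                      stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Return the tokens that are not in the stop word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_ngrams(tokens: List[str], n: int,
                  stop_words: Set[str] = STOP_WORDS) -> List[Tuple[str, ...]]:
    """Drop every n-gram that contains at least one stop word.

    Assumed interpretation: a single listed word "stops" the whole n-gram.
    """
    return [g for g in ngrams(tokens, n)
            if not any(w.lower() in stop_words for w in g)]


tokens = "the fast parser reads plain text and stops".split()
unigrams = remove_stop_words(tokens)
bigrams = filter_ngrams(tokens, 2)
```

With the placeholder list, `unigrams` keeps `fast`, `parser`, `reads`, `plain`, `text` and `stops`, while `bigrams` keeps only the four pairs that contain no listed word. Matching is done on lowercased tokens so that sentence-initial words such as "The" are also caught.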

            People

            • Assignee:
              Unassigned
              Reporter:
              mwunderlich Martin Wunderlich
            • Votes:
              1
            • Watchers:
              2

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - 0.05h
                Remaining:
                Remaining Estimate - 0.05h
                Logged:
                Time Spent - Not Specified