  1. OpenNLP
  2. OPENNLP-367

File Encoding Issues


Details

    Description

      The input and output encodings are not handled correctly. A good example is the CoNLL 2002 data: even when it is correctly encoded in UTF-8, training does not work unless -Dfile.encoding=UTF-8 is specified on the Java command line.

      We already specify the input and expected output encoding on the command-line interface with the -encoding parameter. For some reason this isn't being honored.

      I'll work on fixing this for the next major release...
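
      The symptom described above is typical of a reader that falls back to the platform default charset instead of the one requested. The following is a minimal sketch of that effect, not the actual OpenNLP code path: the class name EncodingSketch, the helper readFirstLine, and the sample line are hypothetical, and ISO-8859-1 merely stands in for a non-UTF-8 platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from OpenNLP or from encoding.patch.
public class EncodingSketch {

    // Decodes the bytes with an explicitly supplied charset and returns the first line.
    // Passing the charset here avoids any dependence on -Dfile.encoding.
    static String readFirstLine(byte[] data, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample in a "token tag" layout; \u00f1 is the letter n with tilde.
        String original = "Espa\u00f1a B-LOC";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the data was written in round-trips cleanly.
        String ok = readFirstLine(utf8, StandardCharsets.UTF_8);

        // Decoding the same bytes as ISO-8859-1 turns the two-byte UTF-8 sequence
        // into two separate characters, so the token no longer matches.
        String bad = readFirstLine(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(original));  // prints true
        System.out.println(bad.equals(original)); // prints false
    }
}
```

      If the readers and writers behind the -encoding parameter are constructed without an explicit charset, they would behave like the second call whenever the platform default is not UTF-8, which would be consistent with the -Dfile.encoding=UTF-8 workaround.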

      Attachments

        1. encoding.patch
          3 kB
          James Kosin

        Activity

          People

            jkosin James Kosin
            jkosin James Kosin
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 672h
                Remaining: 672h
                Logged: Not Specified