  1. OpenNLP
  2. OPENNLP-367

File Encoding Issues

    Details

      Description

      The input and output encodings are not handled correctly. A good example is the CoNLL 2002 data: even when it is correctly encoded in UTF-8, training on it does not work unless -Dfile.encoding=UTF-8 is specified on the java command line.

      We already specify the input and expected output encoding on the command-line interface with the -encoding parameter. For some reason this isn't being honored.

      I'll work on fixing this for the next major release...
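
      To illustrate the kind of behavior described above, here is a minimal sketch (not the contents of encoding.patch, and not OpenNLP's actual code) of reading text with an explicitly supplied charset instead of the JVM default. The class name EncodingDemo and the method firstLine are hypothetical; only standard java.io and java.nio.charset calls are used. The sample string is written with a Unicode escape so the result does not depend on the source file's own encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
public class EncodingDemo {

    // Decodes the first line of the stream using the charset the caller
    // names, rather than whatever file.encoding happens to be.
    public static String firstLine(InputStream in, Charset encoding) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "Señor" as UTF-8 bytes, standing in for a line of UTF-8 training data.
        String original = "Se\u00f1or";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset recovers the text.
        String right = firstLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoding the same bytes as ISO-8859-1 turns the two-byte
        // sequence for U+00F1 into two separate characters.
        String wrong = firstLine(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // prints "true"
        System.out.println(wrong.equals(original)); // prints "false"
    }
}
```

      If every reader and writer is constructed with the charset taken from the -encoding parameter in this way, the result should no longer depend on -Dfile.encoding. Whether that is the right fix for this issue is an assumption here.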

        Attachments

        1. encoding.patch
          3 kB
          James Kosin

            People

            • Assignee:
              jkosin James Kosin
              Reporter:
              jkosin James Kosin
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                672h
                Remaining:
                672h
                Logged:
                Not Specified