Details
- Documentation
- Status: Closed
- Major
- Resolution: Fixed
- 2.0.0, 2.1.0, 2.2.0, 2.3.0
- None
Description
The NER training code example in the manual needs to be updated:
https://opennlp.apache.org/docs/2.3.2/manual/opennlp.html#tools.namefind.training.api
- The `TokenNameFinderFactory nameFinderFactory` part won't compile.
- The `model.serialize(...)` part won't compile.
- This code might be outdated in general.
    ObjectStream<String> lineStream =
        new PlainTextByLineStream(new MarkableFileInputStreamFactory(new File("en-ner-person.train")), StandardCharsets.UTF_8);
    TokenNameFinderModel model;
    try (ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream)) {
      model = NameFinderME.train("eng", "person", sampleStream, TrainingParameters.defaultParams(), nameFinderFactory);
    }
    try (ObjectStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile)){
      model.serialize(modelOut);
    }
For reference (but not tested):
    final InputStreamFactory in = new MarkableFileInputStreamFactory(convertedTrainingFile);
    final ObjectStream<NameSample> sampleStream =
        new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8));
    final TokenNameFinderModel nameFinderModel = NameFinderME.train("en", null, sampleStream,
        TrainingParameters.defaultParams(),
        TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
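The reference snippet only covers training. For the serialization part, the following is a minimal sketch of how the stream handling could be made to compile: the resource is declared as a `java.io.OutputStream` rather than an `ObjectStream`, and the closing parenthesis of the resource list is restored. To keep it runnable without OpenNLP on the classpath, `SerializableModel` and `ModelWriteSketch` are hypothetical stand-ins, not OpenNLP types; in the manual, the trained `TokenNameFinderModel` would presumably take the place of the stand-in through its `serialize(OutputStream)` method.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class ModelWriteSketch {

    /** Hypothetical stand-in for a trained model that can write itself to a stream. */
    interface SerializableModel {
        void serialize(OutputStream out) throws IOException;
    }

    /** Writes the model to modelFile; try-with-resources flushes and closes the stream. */
    static void write(SerializableModel model, File modelFile) throws IOException {
        // The resource is an OutputStream (not an ObjectStream), and the
        // resource list is closed with ")" before the block opens.
        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(modelOut);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this illustration.
        File modelFile = File.createTempFile("en-ner-person", ".bin");
        modelFile.deleteOnExit();
        // Dummy payload in place of real model bytes.
        write(out -> out.write(new byte[] {1, 2, 3}), modelFile);
        System.out.println(modelFile.length() + " bytes written");
    }
}
```

Like the reference snippet above, this has not been checked against the OpenNLP build itself.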