[OPENNLP-1546] NER training code example in documentation requires update - ASF JIRA

Details

Type: Documentation
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0, 2.1.0, 2.2.0, 2.3.0
Fix Version/s: 2.3.3
Component/s: Documentation
Labels: None

Description

The NER training code example in the manual needs to be updated.

https://opennlp.apache.org/docs/2.3.2/manual/opennlp.html#tools.namefind.training.api

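The `TokenNameFinderFactory nameFinderFactory` part won't compile: in the snippet as quoted below, `nameFinderFactory` is never declared or initialized.
The `model.serialize(...)` part won't compile: the `try` header is missing a closing parenthesis, and a `BufferedOutputStream` cannot be assigned to a variable of type `ObjectStream`.
The rest of the example may be outdated as well.

Current example from the manual: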

ObjectStream<String> lineStream =
		new PlainTextByLineStream(new MarkableFileInputStreamFactory(new File("en-ner-person.train")), StandardCharsets.UTF_8);

TokenNameFinderModel model;

try (ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream)) {
  model = NameFinderME.train("eng", "person", sampleStream, TrainingParameters.defaultParams(), nameFinderFactory);
}

try (ObjectStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile)){
  model.serialize(modelOut);
}

For reference (but not tested):

        final InputStreamFactory in = new MarkableFileInputStreamFactory(convertedTrainingFile);
        final ObjectStream<NameSample> sampleStream =
            new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8));
        final TokenNameFinderModel nameFinderModel = NameFinderME.train(
            "en", null, sampleStream, TrainingParameters.defaultParams(),
            TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
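
The reference snippet above covers only the training call. For the serialization step, a minimal sketch of a corrected try-with-resources block might look like the following. It uses only the JDK so it can be compiled without OpenNLP on the classpath. `StreamSerializable`, `writeModel`, and `ModelWriteSketch` are hypothetical names introduced for illustration. The sketch assumes the trained model exposes a `serialize(OutputStream)` method, as the original example implies; this has not been checked against the manual.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ModelWriteSketch {

    // Hypothetical stand-in for a trained model. Only the
    // serialize(OutputStream) shape matters; this is not an OpenNLP type.
    public interface StreamSerializable {
        void serialize(OutputStream out) throws IOException;
    }

    // Corrected pattern: the resource is declared as an OutputStream
    // (not ObjectStream) and the try header's parentheses are balanced.
    public static void writeModel(StreamSerializable model, File modelFile) throws IOException {
        try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(modelOut);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used in place of the example's modelFile.
        File modelFile = File.createTempFile("en-ner-person", ".bin");
        modelFile.deleteOnExit();

        // Placeholder payload standing in for real model bytes.
        StreamSerializable fakeModel =
            out -> out.write("model-bytes".getBytes(StandardCharsets.UTF_8));

        writeModel(fakeModel, modelFile);
        System.out.println("Wrote " + Files.size(modelFile.toPath()) + " bytes");
    }
}
```

In the documentation example, the call inside the `try` block would presumably stay `model.serialize(modelOut)`, with `model` being the `TokenNameFinderModel` returned by training.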

People

Assignee: Martin Wiesner
Reporter: Jeff Zemerick
Votes: 0
Watchers: 3

Dates

Created: 20/Mar/24 19:55
Updated: 15/Apr/24 09:30
Resolved: 15/Apr/24 09:30