[OPENNLP-203] UIMA Sentence Detector Trainer builds models which do not split correctly the sentences - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: tools-1.5.1-incubating
Fix Version/s: tools-1.5.2-incubating
Component/s: Sentence Detector, UIMA Integration
Labels: None
Environment:

OS
Linux version 2.6.32-30-generic (buildd@vernadsky) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #59-Ubuntu SMP Tue Mar 1 21:30:21 UTC 2011

JVM
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

Description

Models trained with the UIMA Sentence Detector Trainer produce wrong begin/end offsets, even though they do split the text into sentences.
I observed that each sentence begins with the end-of-sentence punctuation character of the previous sentence as its first token, while the
previous sentence does not include that character as its last token.
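
The symptom can be illustrated with a small, self-contained sketch. It does not call OpenNLP or UIMA; the sample text, the class name, and the concrete offsets are assumptions chosen only to show the shape of the problem described above (punctuation dropped from the end of one sentence and attached to the start of the next). Whether the whitespace after the punctuation was also covered in the real output is not stated in the report.

```java
// Hypothetical illustration of the reported offset shift; not taken from the issue.
class SentenceOffsetExample {

    // Returns the text covered by the half-open span [begin, end).
    static String covered(String text, int begin, int end) {
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        String text = "First sentence. Second one.";

        // Offsets one would expect: each sentence keeps its own final period.
        int[][] expected = { {0, 15}, {16, 27} };

        // Assumed shape of the faulty offsets: the period at index 14
        // is cut from sentence 1 and becomes the first character of sentence 2.
        int[][] reported = { {0, 14}, {14, 27} };

        for (int[] span : expected) {
            System.out.println("expected: [" + covered(text, span[0], span[1]) + "]");
        }
        for (int[] span : reported) {
            System.out.println("reported: [" + covered(text, span[0], span[1]) + "]");
        }
    }
}
```

Comparing the covered text of each span against the expected sentence string, as done here, is one simple way to check whether a retrained model still shows the shift.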

People

Assignee: Jörn Kottmann

Reporter: Nicolas Hernandez

Dates

Created: 22/Jun/11 10:02

Updated: 06/Jul/11 09:20

Resolved: 06/Jul/11 09:20