OpenNLP / OPENNLP-203

UIMA Sentence Detector Trainer builds models which do not split correctly the sentences

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tools-1.5.1-incubating
    • Fix Version/s: tools-1.5.2-incubating
    • Labels:
      None

      Description

      Models trained with the UIMA component produce wrong begin/end offsets, even though they do split the text into sentences.
      I observed that each sentence begins with the final punctuation character of the previous sentence as its first token, while the
      previous sentence does not include that character as its last token.
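
      To make the symptom concrete, here is a small sketch in Python. The text, the offsets, and the helper `covered_text` are made up for illustration; they are not output from OpenNLP or the UIMA component, and only mirror the behavior described above.

```python
# Illustrative only: the text and all offsets are assumptions, not real OpenNLP output.
text = "It rained. We stayed in."

# Offsets one would expect from a correct sentence detector (end is exclusive).
expected_spans = [(0, 10), (11, 24)]

# Offsets showing the reported symptom: the first sentence stops just before
# its final period, and the second sentence starts at that period.
observed_spans = [(0, 9), (9, 24)]


def covered_text(text, spans):
    """Return the substring covered by each (begin, end) span."""
    return [text[begin:end] for begin, end in spans]


expected_sentences = covered_text(text, expected_spans)
observed_sentences = covered_text(text, observed_spans)

for label, sentences in (("expected", expected_sentences),
                         ("observed", observed_sentences)):
    print(label, sentences)
```

      In this sketch the text is still split into two sentences in both cases, but in the second case the period belongs to the wrong sentence, which is the offset error reported here.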

        Activity

        Nicolas Hernandez created issue
        Nicolas Hernandez made changes
        Field | Original Value | New Value
        Summary | UIMA Sentence Detector Trainer build models which does not split correctly the sentences | UIMA Sentence Detector Trainer builds models which do not split correctly the sentences
        Joern Kottmann made changes
        Field | Original Value | New Value
        Fix Version/s | | tools-1.5.2-incubating [ 12316400 ]
        Affects Version/s | | tools-1.5.1-incubating [ 12315983 ]
        Component/s | | Sentence Detector [ 12314114 ]
        Joern Kottmann made changes
        Field | Original Value | New Value
        Status | Open [ 1 ] | Closed [ 6 ]
        Assignee | | Jörn Kottmann [ joern ]
        Resolution | | Fixed [ 1 ]

          People

          • Assignee: Joern Kottmann
          • Reporter: Nicolas Hernandez
          • Votes: 0
          • Watchers: 0
