cTAKES / CTAKES-449

PolarityCleartkAnalysisEngine slow for large documents

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Workaround
    • Affects Version/s: None
    • Fix Version/s: 4.0.1
    • Component/s: ctakes-assertion
    • Labels: None
    • Flags: Important

    Description

      As soon as I add the negation AE at the end of my pipeline:
      aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() );

      The pipeline becomes 50-100 times slower. This likely has to do with the line:
      List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

      in AssertionCleartkAnalysisEngine. I am running the pipeline on large files (i.e., files with a large number of sentences). The slowdown appears to be caused by the code searching through all sentences in the document for each identified annotation, so the cost grows roughly with the number of annotations times the number of sentences.
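      To illustrate the cost pattern described above, here is a minimal plain-Java sketch that does not use UIMA. It assumes sentences are non-overlapping and sorted by begin offset, so the covering sentence for a mention can be found by binary search over an index built once per document, rather than by a fresh scan per mention. The class CoveringSentenceIndex and its methods are hypothetical names for illustration only; they are not part of cTAKES or uimaFIT, and this is not necessarily the workaround that resolved this issue. (uimaFIT also provides JCasUtil.indexCovering for building a covered-to-covering map once; whether it fits here is not confirmed by this issue.)

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of cTAKES or uimaFIT. */
public class CoveringSentenceIndex {
    private final int[] begins;
    private final int[] ends;

    /**
     * Builds the index once per document.
     * Assumption: sentences are {begin, end} pairs, non-overlapping, sorted by begin.
     */
    public CoveringSentenceIndex(int[][] sentences) {
        begins = new int[sentences.length];
        ends = new int[sentences.length];
        for (int i = 0; i < sentences.length; i++) {
            begins[i] = sentences[i][0];
            ends[i] = sentences[i][1];
        }
    }

    /** Returns the index of the sentence covering [begin, end], or -1 if none does. */
    public int findCovering(int begin, int end) {
        int pos = Arrays.binarySearch(begins, begin);
        if (pos < 0) {
            // Not an exact match: step back to the last sentence starting before 'begin'.
            pos = -pos - 2;
        }
        return (pos >= 0 && ends[pos] >= end) ? pos : -1;
    }

    /** Baseline for comparison: a linear scan over all sentences for every lookup. */
    public static int findCoveringByScan(int[][] sentences, int begin, int end) {
        for (int i = 0; i < sentences.length; i++) {
            if (sentences[i][0] <= begin && sentences[i][1] >= end) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[][] sentences = {{0, 10}, {11, 25}, {26, 40}};
        CoveringSentenceIndex index = new CoveringSentenceIndex(sentences);
        int[][] mentions = {{12, 20}, {8, 15}, {26, 40}};
        for (int[] m : mentions) {
            System.out.println(Arrays.toString(m) + " -> sentence "
                    + index.findCovering(m[0], m[1]));
        }
    }
}
```

      With the index built once, each lookup costs O(log S) for S sentences instead of O(S), which matters when both the number of sentences and the number of mentions are large.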

      The full pipeline is here:
      https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java

          People

            Assignee: Sean Finan (seanfinan)
            Reporter: Dmitriy Dligach (dligach)
            Votes: 0
            Watchers: 5