TIKA-1654: Reset cTAKES CAS in CTAKESParser

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.10
    • Component/s: parser
    • Flags: Patch

      Description

      When using CTAKESParser through Tika Server, I noticed that an exception is thrown when the parser is invoked more than once:

      org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set.
      

      This is caused by the CAS (Common Analysis System) used by CTAKESParser. Like the AE (AnalysisEngine), the CAS is held in a static field of CTAKESParser, so it behaves as a kind of singleton that is shared across runs.

      For context, an AnalysisEngine is a cTAKES/UIMA component responsible for analyzing unstructured information and for discovering and representing its semantic content. An AnalysisEngine operates on an "analysis structure", which is implemented by the CAS.

      Reusing the CAS is highly recommended, but it must be reset before the next run. The CTAKESUtils class (org.apache.tika.parser.ctakes) provides a reset method that releases all resources held by both the AnalysisEngine and the CAS and then "destroys" them. Calling it prevents the CASRuntimeException, but both objects then have to be built again for the next run.

      The attached patch adds two new methods, resetCAS and resetAE, which reset (but do not destroy) the CAS and the AnalysisEngine respectively.
      By calling only resetCAS, CTAKESParser can reuse both the CAS and the AE instead of building them again for each run.
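
The reuse-then-reset pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the patch: ToyCas, ToyEngine, and SharedPipeline are hypothetical stand-ins that do not use the UIMA API. ToyCas only imitates the relevant behavior, namely that document data may be set once until the object is reset.

```java
// Hypothetical stand-in for a CAS: document text can be set only once per run.
// This is NOT the UIMA CAS API; it only imitates the "already been set" behavior.
final class ToyCas {
    private String documentText; // null means "nothing set for this run"

    void setDocumentText(String text) {
        if (documentText != null) {
            throw new IllegalStateException("Document text has already been set.");
        }
        documentText = text;
    }

    String getDocumentText() {
        return documentText;
    }

    // Clears per-document state but keeps the object alive for the next run.
    void reset() {
        documentText = null;
    }
}

// Hypothetical stand-in for an analysis engine that is costly to build.
final class ToyEngine {
    static int instancesBuilt = 0; // counts how often an engine is constructed

    ToyEngine() {
        instancesBuilt++;
    }

    // "Analysis" here is just counting whitespace-separated tokens.
    int process(ToyCas cas) {
        return cas.getDocumentText().trim().split("\\s+").length;
    }
}

// Holds the engine and the CAS in static fields, as the issue describes,
// and resets only the CAS after each run so that both objects are reused.
final class SharedPipeline {
    private static ToyEngine engine;
    private static ToyCas cas;

    static synchronized int countTokens(String text) {
        if (engine == null) {
            engine = new ToyEngine(); // built once, on first use
            cas = new ToyCas();
        }
        try {
            cas.setDocumentText(text);
            return engine.process(cas);
        } finally {
            cas.reset(); // without this, the second call would throw
        }
    }
}
```

Calling SharedPipeline.countTokens repeatedly works because the reset runs in a finally block, so the shared object is cleared even when processing fails. Removing that reset reproduces the same kind of "already been set" failure on the second call.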

        Attachments

        1. TIKA-1654.patch (4 kB, Giuseppe Totaro)
        2. TIKA-1654.v02.patch (24 kB, Giuseppe Totaro)

              People

              • Assignee: Giuseppe Totaro (gostep)
              • Reporter: Giuseppe Totaro (gostep)