UIMA-1818

Provide simple mechanism to capture all CASes input to specified delegate

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.1AS
    • Component/s: Async Scaleout
    • Labels:
      None

      Description

      The existing approach to capturing CASes sent to a component is to insert a new CAS-serializer annotator just before it in the flow, or to modify the component itself to serialize CASes. Both approaches require modifications to existing code and/or component descriptors, and both are somewhat time-consuming and error-prone.

      A much simpler approach is to just "turn on" CAS logging for a particular component using Java properties before starting the process, or to turn CAS logging on/off for an already running process using JMX operations.

      This issue covers using Java properties to turn on CAS logging for any delegate of an asynchronous aggregate.

      CAS logging would be controlled by the following properties:

      UIMA_CASLOG_BASE_DIRECTORY - optional; this is the directory under which other directories with XmiCas files will be created. If not specified, the process's current directory will be the base.

      UIMA_CASLOG_COMPONENT_ARRAY - This is a space-separated list of delegate keys. If a delegate is nested inside a co-located async aggregate, the name would include the key name of the aggregate, e.g. "someAggName/someDelName". The XmiCas files will then be written into $UIMA_CASLOG_BASE_DIRECTORY/someAggName/someDelName/
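
As an illustration of the directory layout described above, here is a minimal sketch that maps the two properties to one output directory per delegate key. The class and method names are hypothetical, and reading the values with System.getProperty is an assumption; the issue does not say how the aggregate controller actually reads them.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of UIMA AS.
final class CasLogDirs {

    // Returns one output directory per delegate key in the space-separated list.
    static List<Path> resolve(String baseDir, String componentArray) {
        // Fall back to the process's current directory when no base is given.
        boolean noBase = baseDir == null || baseDir.trim().isEmpty();
        Path base = Paths.get(noBase ? "." : baseDir.trim());

        List<Path> dirs = new ArrayList<>();
        if (componentArray == null) {
            return dirs; // CAS logging is not enabled for any delegate
        }
        for (String key : componentArray.trim().split("\\s+")) {
            if (key.isEmpty()) {
                continue;
            }
            // A nested key such as "someAggName/someDelName" becomes a nested directory.
            dirs.add(base.resolve(key).normalize());
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<Path> dirs = resolve(
                System.getProperty("UIMA_CASLOG_BASE_DIRECTORY"),
                System.getProperty("UIMA_CASLOG_COMPONENT_ARRAY"));
        dirs.forEach(System.out::println);
    }
}
```

Under these assumptions the properties would be passed as -D options on the java command line, with the component list quoted so that the embedded spaces survive the shell.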

      UIMA_CASLOG_TYPE_NAME - optional; this is the name of a FeatureStructure type in the CAS containing a unique string to use to name each XmiCas file. If not specified, the XmiCas file name will be NNN.xmi, where NNN is the time in microseconds since the component was initialized.

      UIMA_CASLOG_FEATURE_NAME - optional unless TYPE_NAME is specified; this parameter gives the string feature to use. An example of type and feature names to use would be "org.apache.uima.examples.SourceDocumentInformation" and "uri".
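
The file naming rule described above can be sketched as follows. This is only an illustration: the class name is hypothetical, the feature value is passed in as a plain string rather than read from a CAS, and the replacement of characters that are unsafe in file names is an assumption that the issue does not mention.

```java
// Hypothetical helper, not part of UIMA AS.
final class CasLogFileName {

    // featureValue: string value of the configured feature, or null if not configured.
    // initNanos / nowNanos: System.nanoTime() readings at component init and at logging time.
    static String fileName(String featureValue, long initNanos, long nowNanos) {
        if (featureValue != null && !featureValue.isEmpty()) {
            // Assumption: sanitize values such as URIs, which contain '/' and ':'.
            return featureValue.replaceAll("[^A-Za-z0-9._-]", "_") + ".xmi";
        }
        // Default name: microseconds elapsed since the component was initialized.
        long micros = (nowNanos - initNanos) / 1_000L;
        return micros + ".xmi";
    }
}
```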

        Activity

        Marshall Schor added a comment -

        Sounds like a valuable debugging aid.

        Is the idea that every CAS that comes thru a particular specified annotator would be saved to the file system?

        • if so - maybe some parameter to control how many, or how frequently to sample, etc.?

        The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non UIMA-AS cases - where an aggregate contains another aggregate, etc. This is already a convention in UIMA. So it would be good to just continue using it both for UIMA-AS cases and non-UIMA-AS cases.

        Would it be valuable to have a spec to say if the logging was to be before or after the AnalysisEngine, for each delegate? For instance, the spec could be e.g., someAggName/somePrimName:before:after (showing both). "before" could be the default.

        Would it be valuable to dump only the changed data (à la "delta cas")? (possible syntax: add modifier :delta)

        It would be good if the output was consumable by the CAS Viewer, too.
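
For illustration only, the modifier syntax suggested in this comment (someAggName/somePrimName:before:after, plus an optional :delta) could be parsed roughly as below. The syntax is a proposal from the comment, not something the issue says was implemented, and all names in the sketch are hypothetical.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for the proposed "key:modifier:modifier" syntax.
final class CasLogSpec {

    enum Modifier { BEFORE, AFTER, DELTA }

    final String delegateKey;
    final Set<Modifier> modifiers;

    private CasLogSpec(String delegateKey, Set<Modifier> modifiers) {
        this.delegateKey = delegateKey;
        this.modifiers = modifiers;
    }

    static CasLogSpec parse(String spec) {
        String[] parts = spec.trim().split(":");
        if (parts[0].isEmpty()) {
            throw new IllegalArgumentException("missing delegate key: " + spec);
        }
        Set<Modifier> mods = EnumSet.noneOf(Modifier.class);
        for (int i = 1; i < parts.length; i++) {
            // valueOf throws IllegalArgumentException for an unknown modifier.
            mods.add(Modifier.valueOf(parts[i].toUpperCase(Locale.ROOT)));
        }
        // "before" is the suggested default when no position is given.
        if (!mods.contains(Modifier.BEFORE) && !mods.contains(Modifier.AFTER)) {
            mods.add(Modifier.BEFORE);
        }
        return new CasLogSpec(parts[0], mods);
    }
}
```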

        Eddie Epstein added a comment -

        Is the idea that every CAS that comes thru a particular specified annotator would be saved to the file system?

        Yes.

        if so - maybe some parameter to control how many, or how frequently to sample, etc.?

        Implementing JMX control to dynamically turn on/off CAS logging would accomplish this.

        The "COMPONENT_ARRAY" delegate keys need the x/y/z syntax for non UIMA-AS cases - where an aggregate contains another aggregate, etc. This is already a convention in UIMA. So it would be good to just continue using it both for UIMA-AS cases and non-UIMA-AS cases.

        Right. The same syntax should work for core UIMA cases. To clarify, the code to implement this is in the aggregate controller, of which there is one for UIMA AS and another for core UIMA. The UIMA AS controller only sees asynchronous delegates and vice versa for the core UIMA controller. This issue is only covering implementation for asynchronous delegates.

        Would it be valuable to have a spec to say if the logging was to be before or after the AnalysisEngine, for each delegate? For instance, the spec could be e.g., someAggName/somePrimName:before:after (showing both). "before" could be the default.

        To me, much less valuable to capture output CASes, and more complicated to implement. The main use of capturing CASes going into a delegate is to be able to later run the delegate stand-alone in a debug environment. In my case, a scaled out delegate is hanging on one or more CASes and timing out. This utility will allow one to easily capture all the CASes sent to the queue, find the problem CAS and ultimately the cause.

        Would it be valuable to dump only the changed data (à la "delta cas")? (possible syntax: add modifier :delta)

        This sounds more appropriately handled by CAS journaling, where all CAS modifications can be attributed to specific annotators.

        It would be good if the output was consumable by the CAS Viewer, too

        Interesting. The XmiCASes will be, but only if the CAS typesystem is available. The typesystem description should be written into the directory along with the CAS files.

        Eddie Epstein added a comment -

        When accessing UIMA_CASLOG_TYPE_NAME, the FeatureStructure may be in a named view. Another optional property is needed to specify the view: UIMA_CASLOG_VIEW_NAME.


          People

          • Assignee:
            Eddie Epstein
            Reporter:
            Eddie Epstein
          • Votes:
            0
            Watchers:
            0
