UIMA / UIMA-2758

Ruta: Provide support for tree structures and parse trees in rule language

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0ruta
    • Component/s: ruta
    • Labels:
      None

      Description

      Manipulating features that refer to annotations, and matching on simple (primitive) feature values, are currently supported, but matching on the complex (annotation-valued) value of a feature is not. A first step could look like this (assuming a type Person with a feature "title" of type Annotation):

      Person.title;

      This rule matches all annotations that are values of the feature "title" of annotations of type Person.

      The same language element can also serve as syntactic sugar for checking primitive feature values:

      Person.begin==0 (a Person annotation that starts at offset 0)

      This can only be a first step towards supporting tree structures. There may be no way around a mechanism for referring explicitly and directly to specific annotations (which is not possible right now; currently, annotations are referred to via their type).
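A minimal sketch of the intended semantics, written in Java and independent of the real UIMA and Ruta APIs. The hypothetical class Ann stands in for an annotation with a span and named feature values. Person.title is modeled as collecting the annotation-valued "title" feature of every Person annotation, and Person.begin==0 is modeled as a filter on a primitive value of the Person annotation itself. All names and data here are illustrative assumptions, not the Ruta implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotation: a type name, a [begin, end) span,
// and named features whose values may be other annotations or primitives.
class Ann {
    final String type;
    final int begin;
    final int end;
    final Map<String, Object> features = new HashMap<>();

    Ann(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

class FeatureMatchSketch {
    // Models "Type.feature": the annotation-valued feature of every annotation of the type.
    static List<Ann> featureValues(List<Ann> index, String type, String feature) {
        List<Ann> result = new ArrayList<>();
        for (Ann a : index) {
            Object value = a.features.get(feature);
            if (a.type.equals(type) && value instanceof Ann) {
                result.add((Ann) value);
            }
        }
        return result;
    }

    // Models "Type.begin==value": sugar for a check on a primitive value of the annotation itself.
    static List<Ann> beginEquals(List<Ann> index, String type, int value) {
        List<Ann> result = new ArrayList<>();
        for (Ann a : index) {
            if (a.type.equals(type) && a.begin == value) {
                result.add(a);
            }
        }
        return result;
    }
}
```

For the text "Dr. Smith met Prof. Jones" with Person annotations on "Dr. Smith" and "Prof. Jones" whose "title" features point to "Dr." and "Prof.", featureValues(index, "Person", "title") returns both title annotations. beginEquals(index, "Person", 0) returns only the first Person.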

        Activity

        Peter Klügl added a comment -

        Some support for tree structures is now provided. If other requirements come up, they will be tackled in new issues.

        Peter Klügl added a comment -

        Right now, feature matches are restricted to type references by short name. I am not sure I will change that, because if a short name is ambiguous, the user should use type variables anyway. The IDE already supports types with complete namespaces, but the engine does not yet.

        Peter Klügl added a comment - - edited

        There is one thing I keep thinking about:
        How does the feature match influence the sequential matching? Or rather: is there only one reasonable interpretation of the sequential matching?

        Here is an example to discuss (the test case I am currently using to develop this feature):

        Input text:

        Peter Kluegl, Joern Kottmann, Marshall Schor
        

        Rules:

        PACKAGE org.apache.uima.ruta;
        
        //A = full name
        //B = last name
        //C = first name
        DECLARE Annotation D(STRING ds);
        DECLARE D C(INT ci, BOOLEAN cb);
        DECLARE D B(C bc);
        DECLARE Annotation A(B ab, C ac);
        
        INT count;
        CW{ -> ASSIGN(count, count+1), CREATE(C, "ds" = "firstname", "ci" = count, "cb" = false)} CW{ -> 
            GATHER(B, "bc" = 1), FILL(B, "ds" = "lastname")};
        C{REGEXP("M.*") -> SETFEATURE("cb", true)};
        (CW CW){-> CREATE(A, "ab" = B, "ac" = C)};
        

        So, if I write a rule like:

        (A.ac.ci==1 # A.ac.ci==2 # A.ac.ci==3);
        

        ... then on what should the wildcard (#) match?
        Right now, only the annotation that is actually used in the sequential matching determines the possible annotations for the next rule element. Therefore, the first wildcard matches " Kluegl, ", because "ac" covers only the first name. One might instead expect the rule element to match the complete name, since it starts with "A", which refers to the complete name. As a result, the rule match currently covers "Peter Kluegl, Joern Kottmann, Marshall" (missing " Schor"). Is this behavior intelligible and reasonable to others?

        I can imagine use cases where what matters is not the match of the feature's annotation but the match of the annotation that contains the feature.

        One solution would be to introduce an operator that enables navigation in the feature structure for different parts of a rule element, but that does not seem straightforward.

        My favorite solution would be a simple extension: Allow deep feature checks as conditions.

        (A{A.ac.ci==1} # A{A.ac.ci==2} # A{A.ac.ci==3});
        

        Here, the wildcards would only match ", " (the comma and the following space). A.ac.ci==1 could be interpreted as an IS condition combined with a FEATURE condition.

        Are there any opinions about this problem? I should search for some real use cases with parse trees.
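To make the two interpretations concrete, here is a small Java sketch. It does not use the Ruta or UIMA API; the class and method names are hypothetical. It hard-codes the example text and models each A annotation together with the span of its "ac" feature value. The list order stands in for the ci values 1, 2, 3. It then computes what the first wildcard and the whole rule cover when the matched span comes from the feature value (A.ac.ci==n) versus from A itself (A{A.ac.ci==n}).

```java
import java.util.ArrayList;
import java.util.List;

class SequentialMatchSketch {
    static final String TEXT = "Peter Kluegl, Joern Kottmann, Marshall Schor";

    // Hypothetical span type: [begin, end) character offsets into TEXT.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // A full name annotation A together with its "ac" feature value (first name C).
    static final class FullName {
        final Span self;
        final Span ac;
        FullName(Span self, Span ac) { this.self = self; this.ac = ac; }
    }

    static Span find(String s) {
        int b = TEXT.indexOf(s);
        return new Span(b, b + s.length());
    }

    // List index i corresponds to ci == i + 1 in the example rules.
    static List<FullName> names() {
        List<FullName> names = new ArrayList<>();
        names.add(new FullName(find("Peter Kluegl"), find("Peter")));
        names.add(new FullName(find("Joern Kottmann"), find("Joern")));
        names.add(new FullName(find("Marshall Schor"), find("Marshall")));
        return names;
    }

    // true: "A.ac.ci==n" (the feature value is matched); false: "A{A.ac.ci==n}" (A is matched).
    static Span matched(FullName n, boolean useFeatureValue) {
        return useFeatureValue ? n.ac : n.self;
    }

    // Text covered by the first wildcard (#), between rule elements 1 and 2.
    static String firstWildcard(boolean useFeatureValue) {
        List<FullName> n = names();
        return TEXT.substring(matched(n.get(0), useFeatureValue).end,
                              matched(n.get(1), useFeatureValue).begin);
    }

    // Text covered by the whole rule match, from the first element's begin to the last element's end.
    static String ruleMatch(boolean useFeatureValue) {
        List<FullName> n = names();
        return TEXT.substring(matched(n.get(0), useFeatureValue).begin,
                              matched(n.get(2), useFeatureValue).end);
    }
}
```

With the feature value as the matched span, firstWildcard(true) is " Kluegl, " and ruleMatch(true) stops after "Marshall". With A as the matched span, firstWildcard(false) is ", " and ruleMatch(false) covers the complete input.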

        Peter Klügl added a comment -

        I actually prefer an ambiguous operator (I still have to think about a solution for the problem of equal type and feature names).
        Parsing already works, but the inference still has many bugs. The syntax right now is:

        PACKAGE uima.ruta.example;
        
        DECLARE Annotation A(STRING s);
        DECLARE Annotation C(A a);
        
        (W{ -> CREATE(A, "s" = "Name")} COMMA W PERIOD W PERIOD){-> CREATE(C, "a" = A)};
        
        ANY C.a{IS(A)} COMMA;
        C.a COMMA;
        C.a.s=="Name";
        
        Richard Eckart de Castilho added a comment - - edited

        It may be possible to mitigate the problem via the aliases. Since you can now import a type "my.namespace.Type" as "MyType", you could assume that MyType.feature always refers to a feature. You could also use a quoting mechanism to keep the "." operator unambiguous (which I think is important). For example, one could write

        'my.namespace.Type'.feature

        or

        {my.namespace.Type}.feature

        or something similar. It is probably not a good idea for the "." operator to be both a package separator and a "navigate to feature value" operator. Just imagine someone defining a type "my.namespace" with a feature "Type" and, at the same time, a type "my.namespace.Type".
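The ambiguity can be made concrete with a small Java sketch. The names are hypothetical and not part of Ruta. readings lists every way a dotted reference splits into a type name and a feature path, given a set of known type names. parseQuoted shows how a quoted type name, as in the proposed 'my.namespace.Type'.feature syntax, fixes the split from the syntax alone, without consulting the type system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class DottedReferenceSketch {
    // Every way to read "a.b.c" as <type name>.<feature path> for the given known type names.
    static List<String> readings(String ref, Set<String> knownTypes) {
        List<String> result = new ArrayList<>();
        String[] parts = ref.split("\\.");
        for (int i = 1; i <= parts.length; i++) {
            String type = String.join(".", Arrays.copyOfRange(parts, 0, i));
            if (knownTypes.contains(type)) {
                String features = String.join(".", Arrays.copyOfRange(parts, i, parts.length));
                result.add(features.isEmpty() ? "type " + type : "type " + type + ", features " + features);
            }
        }
        return result;
    }

    // Hypothetical quoted syntax: the type name is delimited explicitly, so no type system is needed.
    // Returns {typeName, featurePath}; featurePath is "" when there is none.
    static String[] parseQuoted(String ref) {
        int close = ref.startsWith("'") ? ref.indexOf('\'', 1) : -1;
        if (close < 0) {
            throw new IllegalArgumentException("expected a quoted type name: " + ref);
        }
        String type = ref.substring(1, close);
        String rest = ref.substring(close + 1);
        return new String[] {type, rest.startsWith(".") ? rest.substring(1) : rest};
    }
}
```

With the types "my.namespace" and "my.namespace.Type" both defined, readings("my.namespace.Type", ...) yields two readings: type my.namespace with feature Type, and type my.namespace.Type. parseQuoted("'my.namespace.Type'.feature") yields exactly one.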

        Richard Eckart de Castilho added a comment -

        Not in initialize(), but if you run a component based on CasAnnotator_ImplBase, there is at least the typeSystemInit() "event".

        Peter Klügl added a comment -

        A short update:

        Annoyingly, I see no way to distinguish type references with a complete namespace from feature references in the grammar without access to the list of available types. Please correct me if I am wrong, but I fear that the types of an AE cannot be computed in the initialize() method (where the script should normally be parsed once).

        I will now resolve the semantics of the matching conditions at runtime, something I was trying to avoid.
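A sketch of what runtime resolution could look like, assuming the set of type names is available when a rule is first evaluated. For a component based on CasAnnotator_ImplBase, typeSystemInit() would be one such point. The longest-prefix policy is an assumption made for illustration; the comment does not say how conflicts are settled. A real implementation would presumably query the UIMA TypeSystem (for example TypeSystem.getType(String), which returns null for unknown names) rather than a Set<String>. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

class RuntimeResolutionSketch {
    // Assumed policy: the longest dotted prefix that names a known type is the type;
    // the remaining segments form the feature path. Returns {typeName, featurePath},
    // or null if no prefix names a known type.
    static String[] resolve(String ref, Set<String> typeNames) {
        String[] parts = ref.split("\\.");
        for (int i = parts.length; i >= 1; i--) {
            String candidate = String.join(".", Arrays.copyOfRange(parts, 0, i));
            if (typeNames.contains(candidate)) {
                String features = String.join(".", Arrays.copyOfRange(parts, i, parts.length));
                return new String[] {candidate, features};
            }
        }
        return null;
    }
}
```

Because the lookup happens against the actual type system, the grammar can keep a single dotted-reference rule and defer the type-versus-feature decision to the first evaluation of the condition.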


          People

          • Assignee:
            Peter Klügl
            Reporter:
            Peter Klügl
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
