Commons Digester
  1. Commons Digester
  2. DIGESTER-101

Pass namespace URI and name args to the Rule methods

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      As an enhancement to the current Digester API, I propose to add new variants to
      the Rule methods begin(), body() and end(). Those new variants would accept the
      namespace URI and local/qualified name of the current element.

      As Rule is not an interface, but an abstract class, this change can be done
      without hurting backwards compatibility. It will loosen the coupling between
      Rule implementations and Digester, as the rule would get sufficient information
      about the current element, without needing to call
      Digester.getCurrentElementName(). In addition, it provides access to the current
      namespace URI, which is not possible with the current version of Digester.

      I'll be attaching a patch that implements these changes. This involves changes
      to Rule.java, Digester.java and BeanPropertySetterRule.java. The latter change
      is not totally necessary, but rather demonstrates the benefits of the proposed
      API. All other Rule implementations could be changed later to remove the plenty
      deprecation warnings.

      [I've posted this patch to the commons-dev list before, I'm just adding it here
      so it doesn't get lost]

        Activity

        Hide
        Christopher Lenz added a comment -

        Created an attachment (id=3235)
        Unified diff containing the changes described above

        Show
        Christopher Lenz added a comment - Created an attachment (id=3235) Unified diff containing the changes described above
        Hide
        Emmanuel Bourg added a comment -

        I don't think this is the best way to get info about the parsed element. With
        this patch a rule will retrieve its state from 2 differents sources :

        • the digester's parameter stack
        • the arguments of begin()/body()/end() with the tag name and the URI

        I believe it would be cleaner to widen the concept of parameter stack to a rule
        state stack. A rule state object could contain :

        • an org.w3c.dom.Element object providing the element name, its URI and
          attributes (we don't need a complete implementation of the interface)
        • a Map containing properties (the parameter array and the parent rule could fit
          here)
        Show
        Emmanuel Bourg added a comment - I don't think this is the best way to get info about the parsed element. With this patch a rule will retrieve its state from 2 differents sources : the digester's parameter stack the arguments of begin()/body()/end() with the tag name and the URI I believe it would be cleaner to widen the concept of parameter stack to a rule state stack. A rule state object could contain : an org.w3c.dom.Element object providing the element name, its URI and attributes (we don't need a complete implementation of the interface) a Map containing properties (the parameter array and the parent rule could fit here)
        Hide
        Christopher Lenz added a comment -

        First of all, thanks for looking at this

        However, I don't agree with your analysis.

        The parameter stack is (IMHO) a kind of hackish way of enabling collaboration
        between CallMethodRule and CallParamRule. It has nothing to do with the state of
        a rule in general (the state of a rule should always be stored in it's own
        instance variables). A better way to enable rule collaboration is out of the
        scope of this bug, however .

        Second, when we talk about rule state, we need to differentiate between state
        regarding the input (i.e. the context in the XML document being parsed) and the
        output (i.e. the object and parameter stacks). Originally, the rules got
        information about the input state through the method args, like in
        begin(Attributes) and body(bodyText). Later in the development of Digester the
        need has emerged to also access the current element's name, specifically in the
        BeanPropertySetterRule. At that time, changing the Rule class was avoided, and
        the immediate need was fulfilled by directly accessing the "match" instance
        variable of Digester and extracting the last segment. Digester does provide the
        method getCurrentElementName() that does the same thing, but this method is also
        a result of Rule not having the interface I propose here. Finally, as namespace
        support was added relatively late in the process, their is as of today no way
        for a Rule to get the current namespace URI.

        There are two options to achieve giving the rules more information about the
        input document context:
        1. Let Digester maintain the context using a stack (Element or a more
        lightweight class, whatever), and have the rule access that stack, or
        2. Pass information about the current document context into the begin(), body()
        and end() methods of the rule.

        I see a lot of advantages about approach number 2.

        • 90% of the current rules in Digester don't need information about the input
          context, so maintaining the Element stack would be a total waste of resources.
        • The Rule methods begin(), body() and end() themselves, and their current
          arguments (Attributes and bodyText), already inform the rule about the input
          context.
        • Rule implementations are more decoupled from Digester.
        • Arguably, a rule shouldn't have access to the full input context (i.e. the
          elements under the top element in the stack), it should only need to access
          information about the element it was matched on
        • As said before, the class Digester is already bloated

        Sorry for the long explanation, I hope it is understandable.

        Show
        Christopher Lenz added a comment - First of all, thanks for looking at this However, I don't agree with your analysis. The parameter stack is (IMHO) a kind of hackish way of enabling collaboration between CallMethodRule and CallParamRule. It has nothing to do with the state of a rule in general (the state of a rule should always be stored in it's own instance variables). A better way to enable rule collaboration is out of the scope of this bug, however . Second, when we talk about rule state, we need to differentiate between state regarding the input (i.e. the context in the XML document being parsed) and the output (i.e. the object and parameter stacks). Originally, the rules got information about the input state through the method args, like in begin(Attributes) and body(bodyText). Later in the development of Digester the need has emerged to also access the current element's name, specifically in the BeanPropertySetterRule. At that time, changing the Rule class was avoided, and the immediate need was fulfilled by directly accessing the "match" instance variable of Digester and extracting the last segment. Digester does provide the method getCurrentElementName() that does the same thing, but this method is also a result of Rule not having the interface I propose here. Finally, as namespace support was added relatively late in the process, their is as of today no way for a Rule to get the current namespace URI. There are two options to achieve giving the rules more information about the input document context: 1. Let Digester maintain the context using a stack (Element or a more lightweight class, whatever), and have the rule access that stack, or 2. Pass information about the current document context into the begin(), body() and end() methods of the rule. I see a lot of advantages about approach number 2. 90% of the current rules in Digester don't need information about the input context, so maintaining the Element stack would be a total waste of resources. The Rule methods begin(), body() and end() themselves, and their current arguments (Attributes and bodyText), already inform the rule about the input context. Rule implementations are more decoupled from Digester. Arguably, a rule shouldn't have access to the full input context (i.e. the elements under the top element in the stack), it should only need to access information about the element it was matched on As said before, the class Digester is already bloated Sorry for the long explanation, I hope it is understandable.
        Hide
        rdonkin@apache.org added a comment -

        thanks very much for the perseverance Christopher. i think that after all that discussion
        we've come to a very good solution.

        • robert
        Show
        rdonkin@apache.org added a comment - thanks very much for the perseverance Christopher. i think that after all that discussion we've come to a very good solution. robert

          People

          • Assignee:
            Unassigned
            Reporter:
            Christopher Lenz
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development