Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None

      Description

      Writing an EntityProcessor is deceptively complex. There are so many gotchas.

      I propose the following:

      1. Extract out the Transformer application logic from EntityProcessor and add it to DocBuilder. Then EntityProcessor do not need to call applyTransformer or know about rowIterator and getFromRowCache() methods.
      2. Change the meaning of EntityProcessor#destroy to be called on end of parent's row – Right now init is called once per parent row but destroy actually means the end of import. In fact, there is no correct way for an entity processor to do clean up right now. Most do clean up when returning null (end of data) but with the introduction of $skipDoc, a transformer can return $skipDoc and the entity processor will never get a chance to clean up for the current init.
      3. EntityProcessor will use the EventListener API to listen for import end. This should be used by EntityProcessor to do a final cleanup.
      1. SOLR-1120.patch
        2 kB
        Noble Paul
      2. SOLR-1120.patch
        1 kB
        Noble Paul
      3. SOLR-1120.patch
        9 kB
        Shalin Shekhar Mangar
      4. SOLR-1120.patch
        11 kB
        Noble Paul
      5. SOLR-1120.patch
        12 kB
        Noble Paul
      6. SOLR-1120.patch
        40 kB
        Shalin Shekhar Mangar
      7. SOLR-1120.patch
        38 kB
        Shalin Shekhar Mangar
      8. SOLR-1120.patch
        32 kB
        Noble Paul
      9. SOLR-1120.patch
        27 kB
        Noble Paul

        Activity

        Hide
        Fergus McMenemie added a comment -

        Good idea. Dont know if you are interested in the following.

        • Further to extracting out the Transformer application logic, I was wondering if every entity attribute read should automatically be processed by replaceTokens. Is there any ligitimate place where one would want to disallow replaceTokens? The following snippet of code is repeated far too many times; but is important if DIH is to provide simple predictable behaviour.
            s = context.getEntityAttribute(CHANGELIST_OMIT);
            if (s != null) s = resolver.replaceTokens(s);
        
        
        • The regexp transformer now has several combinations of mutually exclusive attributes. It would be nice to check the attributes for nonsensical combinations. However given that the transformer is invoked for every row such checking code could be a nasty overhead. I dont know how to sort this, but somehow we need to catch the first invocation of a fields transformer and allow far more detailed checking of the attributes
        Show
        Fergus McMenemie added a comment - Good idea. Dont know if you are interested in the following. Further to extracting out the Transformer application logic, I was wondering if every entity attribute read should automatically be processed by replaceTokens. Is there any ligitimate place where one would want to disallow replaceTokens? The following snippet of code is repeated far too many times; but is important if DIH is to provide simple predictable behaviour. s = context.getEntityAttribute(CHANGELIST_OMIT); if (s != null ) s = resolver.replaceTokens(s); The regexp transformer now has several combinations of mutually exclusive attributes. It would be nice to check the attributes for nonsensical combinations. However given that the transformer is invoked for every row such checking code could be a nasty overhead. I dont know how to sort this, but somehow we need to catch the first invocation of a fields transformer and allow far more detailed checking of the attributes
        Hide
        Noble Paul added a comment -

        Is there any ligitimate place where one would want to disallow replaceTokens?

        yes . the XPathEntityProcessor uses it directly just to know what are the variables in the url so that it can read them and store . probably we can add amethod getEntityAttributeResolved() to get the resolved value

        Show
        Noble Paul added a comment - Is there any ligitimate place where one would want to disallow replaceTokens? yes . the XPathEntityProcessor uses it directly just to know what are the variables in the url so that it can read them and store . probably we can add amethod getEntityAttributeResolved() to get the resolved value
        Hide
        Noble Paul added a comment -

        A new class added EntityProcessorWrapper which does all the transformer related actions. EntityProcessors are now agnostic of Transformers alltogether

        Show
        Noble Paul added a comment - A new class added EntityProcessorWrapper which does all the transformer related actions. EntityProcessors are now agnostic of Transformers alltogether
        Hide
        Shalin Shekhar Mangar added a comment -

        Thanks Noble.

        This patch has the following changes on top of Noble's patch:

        1. Added EntityProcessor#close method which will be called at the end of import. This is simpler than using EventListeners.
        2. Fixed a bug with calling destroy in the wrong place in DataConfig

        All tests pass. I'll commit this shortly.

        Show
        Shalin Shekhar Mangar added a comment - Thanks Noble. This patch has the following changes on top of Noble's patch: Added EntityProcessor#close method which will be called at the end of import. This is simpler than using EventListeners. Fixed a bug with calling destroy in the wrong place in DataConfig All tests pass. I'll commit this shortly.
        Hide
        Shalin Shekhar Mangar added a comment -

        Another patch which adds a new method to Context:

        1. Context#getResolvedEntityAttribute(String) which returns the resolved value of the entity attribute

        I'll commit shortly.

        Show
        Shalin Shekhar Mangar added a comment - Another patch which adds a new method to Context: Context#getResolvedEntityAttribute(String) which returns the resolved value of the entity attribute I'll commit shortly.
        Hide
        Shalin Shekhar Mangar added a comment -

        Committed revision 766608.

        Thanks Noble!

        Show
        Shalin Shekhar Mangar added a comment - Committed revision 766608. Thanks Noble!
        Hide
        Noble Paul added a comment -

        futher simplifying DocBuilder

        Show
        Noble Paul added a comment - futher simplifying DocBuilder
        Hide
        Shalin Shekhar Mangar added a comment -

        Same as Noble's patch but DocWrapper extends SolrInputDocument now.

        I'll commit this shortly.

        Show
        Shalin Shekhar Mangar added a comment - Same as Noble's patch but DocWrapper extends SolrInputDocument now. I'll commit this shortly.
        Hide
        Shalin Shekhar Mangar added a comment -

        Committed revision 769058.

        Thanks Noble!

        Show
        Shalin Shekhar Mangar added a comment - Committed revision 769058. Thanks Noble!
        Hide
        Shalin Shekhar Mangar added a comment -
        Show
        Shalin Shekhar Mangar added a comment - The initial commit for this issue broke the Debug functionality. Refer to http://www.lucidimagination.com/search/document/42c345a606820f9/npe_in_dataimport_debuglogger_peekstack_dih_development_console
        Hide
        Noble Paul added a comment -

        committed revision:781272

        thanks Steffen Baumgart

        Show
        Noble Paul added a comment - committed revision:781272 thanks Steffen Baumgart
        Hide
        Noble Paul added a comment -

        this has broken some features in XPathEntityProcessor

        if a field is added by a transformer then XPathEntityProcessor is unable to use it in common fields .Features like $nextUrl , $hasMore cannot work

        Show
        Noble Paul added a comment - this has broken some features in XPathEntityProcessor if a field is added by a transformer then XPathEntityProcessor is unable to use it in common fields .Features like $nextUrl , $hasMore cannot work
        Hide
        Noble Paul added a comment -

        a new method added to EntityProcessor (postTransform) . this can be used by the EntityProcessor implementations to get a callback after the transformations are done

        Show
        Noble Paul added a comment - a new method added to EntityProcessor (postTransform) . this can be used by the EntityProcessor implementations to get a callback after the transformations are done
        Hide
        Shalin Shekhar Mangar added a comment -

        Committed revision 783750.

        Thanks Noble!

        Show
        Shalin Shekhar Mangar added a comment - Committed revision 783750. Thanks Noble!
        Hide
        Grant Ingersoll added a comment -

        Bulk close Solr 1.4 issues

        Show
        Grant Ingersoll added a comment - Bulk close Solr 1.4 issues

          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Shalin Shekhar Mangar
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development