Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Labels:
      None

      Description

      Currently DIH (DataImportHandler) fails completely on any error. We need better control over error behavior.

      mail thread: http://markmail.org/message/xvfbfaskfmlj2pnm

      An entity can have an onError attribute whose value is abort, continue, or skip.

      abort is the default; it aborts the import. With continue or skip, the import does not fail and continues from that point. If there is an error in an XML file, skip skips all rows in that file (only if stream != true) and continues with the next XML file.

      1. SOLR-842.patch
        26 kB
        Shalin Shekhar Mangar
      2. SOLR-842.patch
        16 kB
        Shalin Shekhar Mangar
      3. SOLR-842.patch
        15 kB
        Noble Paul
      4. SOLR-842.patch
        4 kB
        Noble Paul

        Activity

        Noble Paul added a comment -

        Implemented for XPathEntityProcessor.

        Noble Paul added a comment -

        Parsing failures are now handled as well, and logging is improved.

        Shalin Shekhar Mangar added a comment -

        Changes

        1. Changes to skip behavior in case of non-streaming mode in XPathEntityProcessor

        Just to be clear, I'll define the meaning of the three constants – abort, skip and continue

        • abort – abort the import operation
        • skip – If an exception is raised, skip the current document and continue with the next one. Note that in case of a parsing error for XPathEntityProcessor, it is not possible to skip and proceed.
        • continue – Ignore the exception completely and proceed to create the current document as if nothing has happened.

        Need to add tests for this.
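
        The three policies defined above can be modeled with a small standalone sketch. This is not DIH's implementation: the class OnErrorSketch, the method importRows, and the failing step parseLength are hypothetical names invented for illustration, and a "document" is reduced to a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the abort/skip/continue policies; not DIH's actual code. */
class OnErrorSketch {
    enum OnError { ABORT, SKIP, CONTINUE }

    /** Builds one document per row; a per-row step may throw. */
    static List<String> importRows(List<String> rows, OnError policy) {
        List<String> docs = new ArrayList<>();
        for (String row : rows) {
            StringBuilder doc = new StringBuilder("id=" + row);
            try {
                doc.append(",len=").append(parseLength(row));
            } catch (RuntimeException e) {
                if (policy == OnError.ABORT) throw e;   // abort: stop the whole import
                if (policy == OnError.SKIP) continue;   // skip: drop this document, go to the next
                // continue: ignore the error and keep the document built so far
            }
            docs.add(doc.toString());
        }
        return docs;
    }

    /** Stand-in for an entity step that fails on bad input (assumption for the demo). */
    static int parseLength(String row) {
        if (row.startsWith("bad")) {
            throw new IllegalStateException("cannot process " + row);
        }
        return row.length();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "bad1", "ccc");
        System.out.println(importRows(rows, OnError.SKIP));
        System.out.println(importRows(rows, OnError.CONTINUE));
    }
}
```

        With skip, the failing row produces no document at all; with continue, it produces a document that simply lacks the field whose computation failed.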

        Shalin Shekhar Mangar added a comment -

        Adding a test case. I plan to commit shortly.

        Shalin Shekhar Mangar added a comment -

        Committed revision 713335.

        Thanks Noble!

        Lance Norskog added a comment -

        Wow!

        I just found another case for loop control: receiving no documents in a loop.

        My test case is that to fetch subsequent pages of results (first 40, next 40, etc.) from a search API I could not use any value returned in the last request. I had to make an XML file giving the "start 0, start 40, start 80" sequence. I drove an RSS feed input with this as an outer loop.

        Now, suppose I have 100 requests in the file but this particular search only has 20 results. The second time I do the search I get no documents: now I want to break out of my driving XML file loop. With the current DIH I will send another 98 search requests that will all fail.

        So, two features here:
        1) to skip when there are no documents.
        2) to end the next outer loop.

        "break to entity X" would be the most flexible - you could break out of three loops if you want. This is the same as a labeled break in Java (or a goto out of nested loops in C).

        Thanks for your time,

        Lance (the instigator)
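
        The requested "break to entity X" behavior can be illustrated with a Java labeled break. This sketch only models the proposal; it does not describe anything DIH does. The class BreakToEntitySketch, the method run, and the label driving are hypothetical, and the search API is simulated by arithmetic on a total result count.

```java
/** Hypothetical model of "break to entity X"; not an existing DIH feature. */
class BreakToEntitySketch {
    /** Returns {requests sent, documents indexed}. */
    static int[] run(int[] starts, int totalResults, int pageSize) {
        int requests = 0;
        int docs = 0;
        driving:                                   // label for the outer (driving) entity
        for (int start : starts) {
            requests++;                            // one search request per offset
            int onPage = Math.min(pageSize, Math.max(0, totalResults - start));
            for (int i = 0; ; i++) {               // inner entity: rows of this page
                if (i >= onPage) {
                    if (i == 0) break driving;     // empty page: leave the outer loop too
                    break;                         // page exhausted: move to the next offset
                }
                docs++;
            }
        }
        return new int[] {requests, docs};
    }

    public static void main(String[] args) {
        int[] starts = new int[100];               // the "start 0, start 40, start 80" sequence
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i * 40;
        }
        int[] result = run(starts, 20, 40);
        System.out.println(result[0] + " requests, " + result[1] + " documents");
    }
}
```

        For the scenario described above (100 offsets, 20 results, pages of 40), the loop stops after the second request instead of sending the remaining 98.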

        Noble Paul added a comment -

        Lance, could you paste a sample data-config and explain the use case?

        Shalin Shekhar Mangar added a comment -

        Lance, Transformers can add two special fields to a row, "$hasMore" and "$nextUrl", which tell the XPathEntityProcessor whether to stop now and, if not, which URL to fetch next. You can write a transformer that adds these special fields based on whether you have more results or not. Maybe that can be used here?
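
        A rough sketch of the row logic such a transformer might apply is shown here. It is standalone Java with no Solr dependency, so it shows only the idea, not how a transformer is registered with DIH. The class PagingTransformerSketch, the parameters baseUrl, start and totalResults, the "start" query parameter, and the string value "true" for "$hasMore" are all assumptions; only the two field names come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone sketch of paging logic; a real transformer would run inside DIH. */
class PagingTransformerSketch {
    static final int PAGE_SIZE = 40;   // matches the "first 40, next 40" example

    /**
     * Adds "$hasMore" and "$nextUrl" to the row while more results remain.
     * baseUrl, start and totalResults are hypothetical inputs.
     */
    static Map<String, Object> transformRow(Map<String, Object> row,
                                            String baseUrl, int start, int totalResults) {
        int nextStart = start + PAGE_SIZE;
        if (nextStart < totalResults) {
            row.put("$hasMore", "true");                           // assumed value format
            row.put("$nextUrl", baseUrl + "?start=" + nextStart);  // assumed URL scheme
        }
        // When no results remain, neither field is added and paging stops.
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        transformRow(row, "http://example.com/search", 0, 100);
        System.out.println(row.get("$hasMore") + " " + row.get("$nextUrl"));
    }
}
```

        This approach needs a value from each response (here, the total result count) to decide whether another page exists, so it fits only when the search API exposes one.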


          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Noble Paul
          • Votes:
            0
            Watchers:
            0
