Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      SOLR-5623 included some conversation about the dilemmas of exception management and reporting in the analysis chain.

      I've belatedly become educated about the infostream, and this situation is a job for it. The DocInverterPerField can note exceptions in the analysis chain, log out to the infostream, and then rethrow them as before. No wrapping, no muss, no fuss.

      There are comments on this JIRA from a more complex prior idea that readers might want to ignore.

      1. LUCENE-5405-4.x.patch
        4 kB
        Benson Margulies

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          This is a lot of api overhead, a special method 'throwXYZException' and a special exception.

          I don't think its worth it myself.

          Show
          Robert Muir added a comment - This is a lot of api overhead, a special method 'throwXYZException' and a special exception. I don't think its worth it myself.
          Hide
          Robert Muir added a comment -

          This also doesnt even solve your problem, where the exception came from within the analysis chain.

          Analysis stuff (e.g. OffsetAttributeImpl) just has a stream of tokens. it doesnt now about nor even have the concept of fields, and is used in cases outside of that context too (e.g. autosuggest).

          I do not think it buys anything to try to add such stuff to the analysis chain, as thats completely unrelated to the issue. If there is going to be any context added to exceptions, it should be done by the consumer of such streams.

          Show
          Robert Muir added a comment - This also doesnt even solve your problem, where the exception came from within the analysis chain . Analysis stuff (e.g. OffsetAttributeImpl) just has a stream of tokens. it doesnt now about nor even have the concept of fields, and is used in cases outside of that context too (e.g. autosuggest). I do not think it buys anything to try to add such stuff to the analysis chain, as thats completely unrelated to the issue. If there is going to be any context added to exceptions, it should be done by the consumer of such streams.
          Hide
          Benson Margulies added a comment -

          I've been frustrated for years by the coincidence that IOException lacks constructors for 'cause', and the Lucene API is full of 'throws IOException'. However, I only just now noticed that Java fixed this in 1.6.

          So, a weaker form of this would be a subclass of IOException that can carry a field name, and a place for TokenStream to hide a field name. Then something like the Solr error handler could instanceof to see if there's a field name to be had.

          Given the other API changes to token stream component construction for 5.0, one might argue that adding a ctor arg isn't so bad.

          Show
          Benson Margulies added a comment - I've been frustrated for years by the coincidence that IOException lacks constructors for 'cause', and the Lucene API is full of 'throws IOException'. However, I only just now noticed that Java fixed this in 1.6. So, a weaker form of this would be a subclass of IOException that can carry a field name, and a place for TokenStream to hide a field name. Then something like the Solr error handler could instanceof to see if there's a field name to be had. Given the other API changes to token stream component construction for 5.0, one might argue that adding a ctor arg isn't so bad.
          Hide
          Benson Margulies added a comment -

          Hmm, well, backing up. In the other discussion, you and others seemed very unhappy with schemes of the form:

          throw new SomeException("Some local explanation", someExceptionObject);

          Based on your most recent remark, I don't see any other way to get around this; my idea about storing field names is stupid, since the chains are reusable. So, either this sort of wrapping is tolerable or not. If tolerable, I'll rewrite this JIRA, else I'll close it.

          Show
          Benson Margulies added a comment - Hmm, well, backing up. In the other discussion, you and others seemed very unhappy with schemes of the form: throw new SomeException("Some local explanation", someExceptionObject); Based on your most recent remark, I don't see any other way to get around this; my idea about storing field names is stupid, since the chains are reusable. So, either this sort of wrapping is tolerable or not. If tolerable, I'll rewrite this JIRA, else I'll close it.
          Hide
          Robert Muir added a comment -

          I don't think we should change the design of the analysis stuff (adding field names to it, add new classes or new method signatures) to support the buggy case.

          To me the question is if there should be some change to DocInverterPerField or not: and to me thats the only place where we might change anything, because indexwriter "iterates" field names for you and its true you dont have an easy way (to my knowledge) to figure out which one it was "on" when it hit an issue.

          Otherwise, consumers such as queryparsers etc can deal with this themselves, if they want to add try/catch, so be it. I am sure that stuff e.g. solr is already wrapping the exceptions there in useless SolrExceptions today.

          Show
          Robert Muir added a comment - I don't think we should change the design of the analysis stuff (adding field names to it, add new classes or new method signatures) to support the buggy case. To me the question is if there should be some change to DocInverterPerField or not: and to me thats the only place where we might change anything, because indexwriter "iterates" field names for you and its true you dont have an easy way (to my knowledge) to figure out which one it was "on" when it hit an issue. Otherwise, consumers such as queryparsers etc can deal with this themselves, if they want to add try/catch, so be it. I am sure that stuff e.g. solr is already wrapping the exceptions there in useless SolrExceptions today.
          Hide
          Benson Margulies added a comment -

          Yes, we're now in the same place. Does a catch/throw in DocInverterPerField that does something like

          throw new LuceneAnalysisException("Error analyzing text", fieldName, originalException);

          make life better? I think it makes life better, as I don't see much evil in exception wrapping like this.

          Show
          Benson Margulies added a comment - Yes, we're now in the same place. Does a catch/throw in DocInverterPerField that does something like throw new LuceneAnalysisException("Error analyzing text", fieldName, originalException); make life better? I think it makes life better, as I don't see much evil in exception wrapping like this.
          Hide
          Robert Muir added a comment -

          I dont like it being LuceneAnalysisException really: I don't see the need for a custom class.

          I think it would just be throw new IllegalArgumentException("exception from analysis chain for field '" + field + "':", fieldName, originalException).

          I think it makes life better, as I don't see much evil in exception wrapping like this.

          There is a lot of evil still, it changes the class type of the exception (by wrapping it), by nesting, it makes exceptions harder to digest for the client (lots of lucene users use the same Analyzer for every field, so the field is just not interesting).

          Remember this is all about the buggy analyzer case. Its not like we are supporting a real use case here.

          So that's why i said, i'm not sure about this whole idea on SOLR-5623. I'm just not sure its the right tradeoffs.

          Besides, there are possibly even less invasive solutions. If we hit an exception from the analyzer, we could write that we hit an exception (not the exception text, just that we hit one) for field X to the infostream in the if (Unable to render embedded object: File (success) case. This means there is no catch block at all) not found. I would support such a change today. sure its not "perfectly user friendly" but it has absolutely no downsides at all, and only helps the user, and remember, this is all about the buggy analyzer case. We don't need to be user friendly for that, just safe.

          Show
          Robert Muir added a comment - I dont like it being LuceneAnalysisException really: I don't see the need for a custom class. I think it would just be throw new IllegalArgumentException("exception from analysis chain for field '" + field + "':", fieldName, originalException). I think it makes life better, as I don't see much evil in exception wrapping like this. There is a lot of evil still, it changes the class type of the exception (by wrapping it), by nesting, it makes exceptions harder to digest for the client (lots of lucene users use the same Analyzer for every field, so the field is just not interesting). Remember this is all about the buggy analyzer case . Its not like we are supporting a real use case here. So that's why i said, i'm not sure about this whole idea on SOLR-5623 . I'm just not sure its the right tradeoffs. Besides, there are possibly even less invasive solutions. If we hit an exception from the analyzer, we could write that we hit an exception (not the exception text, just that we hit one) for field X to the infostream in the if ( Unable to render embedded object: File (success) case. This means there is no catch block at all) not found. I would support such a change today. sure its not "perfectly user friendly" but it has absolutely no downsides at all, and only helps the user, and remember, this is all about the buggy analyzer case . We don't need to be user friendly for that, just safe.
          Hide
          Benson Margulies added a comment -

          https://github.com/apache/lucene-solr/pull/21 is a seemingly simple idea for how to code this.

          I'm off to write the test. In the mean time, I offer the PR just to show a concrete idea.

          Show
          Benson Margulies added a comment - https://github.com/apache/lucene-solr/pull/21 is a seemingly simple idea for how to code this. I'm off to write the test. In the mean time, I offer the PR just to show a concrete idea.
          Hide
          Benson Margulies added a comment -

          Test added. It passes. I'm sure I've missed something here.

          Show
          Benson Margulies added a comment - Test added. It passes. I'm sure I've missed something here.
          Hide
          Robert Muir added a comment -

          Benson: this patch looks great.

          I'll wait a bit for any feedback and test it out later today.

          Show
          Robert Muir added a comment - Benson: this patch looks great. I'll wait a bit for any feedback and test it out later today.
          Hide
          Michael McCandless added a comment -

          +1 to include field name in the infoStream logging ... anything to help in debugging analysis problems.

          Show
          Michael McCandless added a comment - +1 to include field name in the infoStream logging ... anything to help in debugging analysis problems.
          Hide
          Benson Margulies added a comment -

          Am I good to commit here?

          Show
          Benson Margulies added a comment - Am I good to commit here?
          Hide
          Robert Muir added a comment -

          +1

          Show
          Robert Muir added a comment - +1
          Hide
          ASF subversion and git services added a comment -

          Commit 1562657 from Benson Margulies in branch 'dev/trunk'
          [ https://svn.apache.org/r1562657 ]

          LUCENE-5405: when there's an exception thrown by an analysis component, note the field name
          on the info stream to aid in debugging.

          Show
          ASF subversion and git services added a comment - Commit 1562657 from Benson Margulies in branch 'dev/trunk' [ https://svn.apache.org/r1562657 ] LUCENE-5405 : when there's an exception thrown by an analysis component, note the field name on the info stream to aid in debugging.
          Hide
          Benson Margulies added a comment -

          Fixed in rev 1562657.

          Show
          Benson Margulies added a comment - Fixed in rev 1562657.
          Hide
          Michael McCandless added a comment -

          Shouldn't we port this to 4.x as well?

          Show
          Michael McCandless added a comment - Shouldn't we port this to 4.x as well?
          Hide
          Michael McCandless added a comment -

          Benson Margulies are you planning to backport to 4.x? I think we should?

          Show
          Michael McCandless added a comment - Benson Margulies are you planning to backport to 4.x? I think we should?
          Hide
          Benson Margulies added a comment -

          I can backport, Michael McCandless. Is there any doc on how the project manages branches? If not, I can add some to the web site to help guide patch-offerers.

          Show
          Benson Margulies added a comment - I can backport, Michael McCandless . Is there any doc on how the project manages branches? If not, I can add some to the web site to help guide patch-offerers.
          Hide
          Michael McCandless added a comment -

          Thanks Benson.

          I don't know of a guide for this. Typically I'll just do this:

          • cd ../4x
          • svn merge -c NNNNNN ../trunk
          • (resolve conflicts, pass tests)
          • commit

          You may need to move the CHANGES entry down under 4.x (and, fixup trunk's CHANGES too, if it wasn't under 4.x to begin with).

          Show
          Michael McCandless added a comment - Thanks Benson. I don't know of a guide for this. Typically I'll just do this: cd ../4x svn merge -c NNNNNN ../trunk (resolve conflicts, pass tests) commit You may need to move the CHANGES entry down under 4.x (and, fixup trunk's CHANGES too, if it wasn't under 4.x to begin with).
          Hide
          ASF subversion and git services added a comment -

          Commit 1563711 from Benson Margulies in branch 'dev/trunk'
          [ https://svn.apache.org/r1563711 ]

          LUCENE-5405: check in the unit test, that escaped the first time.

          closes #21.

          Show
          ASF subversion and git services added a comment - Commit 1563711 from Benson Margulies in branch 'dev/trunk' [ https://svn.apache.org/r1563711 ] LUCENE-5405 : check in the unit test, that escaped the first time. closes #21.
          Hide
          Benson Margulies added a comment -

          Somehow the unit test escaped the prior commit. 1563711 fills it in.

          Show
          Benson Margulies added a comment - Somehow the unit test escaped the prior commit. 1563711 fills it in.
          Hide
          Steve Rowe added a comment -

          Is there any doc on how the project manages branches? If not, I can add some to the web site to help guide patch-offerers.

          There is this: http://wiki.apache.org/lucene-java/SvnMerge

          Show
          Steve Rowe added a comment - Is there any doc on how the project manages branches? If not, I can add some to the web site to help guide patch-offerers. There is this: http://wiki.apache.org/lucene-java/SvnMerge
          Hide
          Benson Margulies added a comment -

          Well, svn merge did something I can't make heads or tails of, so I'm going to merge by hand.

          Show
          Benson Margulies added a comment - Well, svn merge did something I can't make heads or tails of, so I'm going to merge by hand.
          Hide
          ASF subversion and git services added a comment -

          Commit 1563741 from Benson Margulies in branch 'dev/trunk'
          [ https://svn.apache.org/r1563741 ]

          LUCENE-5405: this undoes 1563711, which was the result of a cascade of confusion.

          Show
          ASF subversion and git services added a comment - Commit 1563741 from Benson Margulies in branch 'dev/trunk' [ https://svn.apache.org/r1563741 ] LUCENE-5405 : this undoes 1563711, which was the result of a cascade of confusion.
          Hide
          Benson Margulies added a comment -

          Reviewable port.

          Show
          Benson Margulies added a comment - Reviewable port.
          Hide
          Benson Margulies added a comment -

          Michael McCandless and Robert Muir: The code in the 4.x branch is more complex. I think I've managed to carry the strategy across, but I'd be grateful for some skeptical eyeballs before I commit the attach patch that does the backport.

          Show
          Benson Margulies added a comment - Michael McCandless and Robert Muir : The code in the 4.x branch is more complex. I think I've managed to carry the strategy across, but I'd be grateful for some skeptical eyeballs before I commit the attach patch that does the backport.
          Hide
          Michael McCandless added a comment -

          Thanks Benson, I didn't realize this would get tricky!

          Since success2 seems to be equivalent to succeededInProcessingField, maybe just rename success2 instead of adding a new variable?

          Also, could you move the infoStream output to the end of that finally clause (so we're sure to call stream.close())? Paranoia ...

          Otherwise it looks great. Thanks!

          Show
          Michael McCandless added a comment - Thanks Benson, I didn't realize this would get tricky! Since success2 seems to be equivalent to succeededInProcessingField, maybe just rename success2 instead of adding a new variable? Also, could you move the infoStream output to the end of that finally clause (so we're sure to call stream.close())? Paranoia ... Otherwise it looks great. Thanks!
          Hide
          Benson Margulies added a comment -

          Will do. Thanks, this is exactly what sort of feedback I was looking for.

          Show
          Benson Margulies added a comment - Will do. Thanks, this is exactly what sort of feedback I was looking for.
          Hide
          ASF subversion and git services added a comment -

          Commit 1563850 from Benson Margulies in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1563850 ]

          LUCENE-5405: backport to 4.x.

          Show
          ASF subversion and git services added a comment - Commit 1563850 from Benson Margulies in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1563850 ] LUCENE-5405 : backport to 4.x.
          Hide
          Benson Margulies added a comment -

          rev 1563850 provides the backport.

          Show
          Benson Margulies added a comment - rev 1563850 provides the backport.
          Hide
          ASF subversion and git services added a comment -

          Commit 1563853 from Benson Margulies in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1563853 ]

          LUCENE-5405: CHANGES.txt

          Show
          ASF subversion and git services added a comment - Commit 1563853 from Benson Margulies in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1563853 ] LUCENE-5405 : CHANGES.txt
          Hide
          ASF subversion and git services added a comment -

          Commit 1563855 from Benson Margulies in branch 'dev/trunk'
          [ https://svn.apache.org/r1563855 ]

          LUCENE-5405: changes.txt; and fix a typo of Grant's for LUCENE-5406.

          Show
          ASF subversion and git services added a comment - Commit 1563855 from Benson Margulies in branch 'dev/trunk' [ https://svn.apache.org/r1563855 ] LUCENE-5405 : changes.txt; and fix a typo of Grant's for LUCENE-5406 .
          Hide
          ASF subversion and git services added a comment -

          Commit 1563857 from Benson Margulies in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1563857 ]

          LUCENE-5405: some leftover merge info.

          Show
          ASF subversion and git services added a comment - Commit 1563857 from Benson Margulies in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1563857 ] LUCENE-5405 : some leftover merge info.
          Hide
          Benson Margulies added a comment -

          backported, CHANGES.txt filled in. 'this time for sure'

          Show
          Benson Margulies added a comment - backported, CHANGES.txt filled in. 'this time for sure'
          Hide
          ASF subversion and git services added a comment -

          Commit 1563868 from Benson Margulies in branch 'dev/trunk'
          [ https://svn.apache.org/r1563868 ]

          LUCENE-5405, LUCENE-5406: move changes entries to 4.7.

          Show
          ASF subversion and git services added a comment - Commit 1563868 from Benson Margulies in branch 'dev/trunk' [ https://svn.apache.org/r1563868 ] LUCENE-5405 , LUCENE-5406 : move changes entries to 4.7.

            People

            • Assignee:
              Benson Margulies
              Reporter:
              Benson Margulies
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development