Lucene.Net / LUCENENET-54

ArgumentOutOfRangeException caused by SF.Snowball.Ext.DanishStemmer

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
    • Fix Version/s: Lucene.Net 3.0.3
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows XP SP2, lucene.net v2.0 004

      Description

      Exception Information
      System.SystemException: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
      Parameter name: length
      at System.String.Substring(Int32 startIndex, Int32 length)
      at System.Text.StringBuilder.ToString(Int32 startIndex, Int32 length)
      at SF.Snowball.SnowballProgram.slice_to(StringBuilder s)
      at SF.Snowball.Ext.DanishStemmer.r_undouble()
      at SF.Snowball.Ext.DanishStemmer.Stem()
      --- End of inner exception stack trace ---
      at System.Reflection.RuntimeMethodInfo.InternalInvoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean isBinderDefault, Assembly caller, Boolean verifyAccess)
      at System.Reflection.RuntimeMethodInfo.InternalInvoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean verifyAccess)
      at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
      at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
      at System.Reflection.MethodInfo.Invoke(Object obj, Object[] parameters)
      at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
      at Lucene.Net.Analysis.Snowball.SnowballFilter.Next()
      at Lucene.Net.Index.DocumentWriter.InvertDocument(Document doc)
      at Lucene.Net.Index.DocumentWriter.AddDocument(String segment, Document doc)
      at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)

          Activity

          George Aroush added a comment -

          Hi Torsten,

          Please first use the Lucene.Net mailing list to describe the nature of the problem and how to reproduce it. Once you have confirmation that the cause is a defect, then submit an issue.

          Your current submission of this issue doesn't give us any clue what you were doing to see this exception and what might be causing the exception.

          Thanks.

          – George

          Torsten Rendelmann added a comment -

          Hi George, sorry for posting only the call stack; I thought it would be enough to analyze the problem.
          OK, here is what we did: we parse XML feeds and index the item content. The feed's language (if provided) determines which stemmer (analyzer) we instantiate to index the content. For Danish feed content, the stemmer failed during indexing (as posted).

          Meanwhile we have the same issue with the FinnishStemmer too, so you may want to reopen this issue and track it with a unit test.
          The current "solution" in our project is to skip the language-specific indexing for "danish" and "finnish" and use the default analyzer instead.

          Jason Fitzharris added a comment -

          I encountered the same issue when using the Finnish stemmer. The problem is similar to LUCENENET-102: Java and .NET define substring differently. Java takes a start index and an end index,

          string.substring(firstIndex, lastIndex)

          whereas .NET takes a start index and a length:

          string.Substring(startIndex, length)

          The solution is to change line 467 in SnowballProgram.slice_to from

          s.Append(current.ToString(bra, ket));

          to

          s.Append(current.ToString(bra, len));

          len is an existing but unused variable which is declared as

          int len = ket - bra;
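
          As an illustration of the index-versus-length mismatch described above, here is a minimal standalone sketch. It is not the Lucene.Net source: the class SliceDemo and the helpers SliceJavaStyle and SliceBuggy are hypothetical names, and only StringBuilder.ToString(startIndex, length) is a real API. It suggests why the bug surfaces only for some words: the wrong call throws only when bra + ket exceeds the buffer length, and otherwise quietly returns too many characters.

```csharp
using System;
using System.Text;

public static class SliceDemo
{
    // Java-style slice: ket is an exclusive end index, so convert it to a length
    // before calling the .NET API, which expects (startIndex, length).
    public static string SliceJavaStyle(StringBuilder current, int bra, int ket)
    {
        int len = ket - bra;
        return current.ToString(bra, len);
    }

    // Pre-fix call shape (hypothetical reconstruction): the end index is passed
    // where .NET expects a length.
    public static string SliceBuggy(StringBuilder current, int bra, int ket)
    {
        return current.ToString(bra, ket);
    }

    public static void Main()
    {
        var current = new StringBuilder("terve");

        Console.WriteLine(SliceJavaStyle(current, 1, 4)); // prints "erv"
        Console.WriteLine(SliceBuggy(current, 1, 4));     // prints "erve": no exception, but one char too many

        try
        {
            // 2 + 5 > 5, so the (startIndex, length) contract is violated.
            SliceBuggy(current, 2, 5);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("ArgumentOutOfRangeException");
        }
    }
}
```

          With bra equal to 0 the two calls happen to agree, which may explain why many inputs stem without any error.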

          Prescott Nasser added a comment -

          Valid issue

          Prescott Nasser added a comment - - edited

          Simon Svensson:

          I've written a simple reproduction of LUCENENET-54
          (ArgumentOutOfRangeException in SnowballProgram). I'm not sure about the
          correct workflow to reopen this issue (it was closed as invalid in 2007
          due to missing information), so I'm throwing what I got into the
          developer mailing list and hope that someone else knows the correct
          approach. The problem originates in SnowballProgram.slice_to, where the
          second argument to StringBuilder.ToString(start, length) is passed an
          index instead of a length.

          Reproduction:
          using System.IO;
          using Lucene.Net.Analysis.Snowball;
          using Lucene.Net.Analysis.Tokenattributes;
          using NUnit.Framework;

          namespace ConsoleApplication {
              [TestFixture]
              public class LuceneRepo {
                  [Test(Description = "LUCENENET-54")]
                  public void Repro()
                  {
                      var analyzer = new SnowballAnalyzer("Finnish");
                      var input = new StringReader("terve");
                      var tokenStream = analyzer.TokenStream("fieldName", input);
                      var termAttr = (TermAttribute)tokenStream.AddAttribute(typeof (TermAttribute));

                      // Stemming the first token triggers the exception in slice_to.
                      Assert.That(tokenStream.IncrementToken(), Is.True);
                      Assert.That(termAttr.Term(), Is.EqualTo("terv"));
                  }
              }
          }

          Unexpected exception:
          System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
          Parameter name: length
          at System.Text.StringBuilder.ToString(Int32 startIndex, Int32 length)
          at SF.Snowball.SnowballProgram.slice_to(StringBuilder s) in C:\Dev\Third Party\Lucene.NET\src\contrib\Snowball\SF\Snowball\SnowballProgram.cs:line 467
          at SF.Snowball.Ext.FinnishStemmer.r_tidy() in C:\Dev\Third Party\Lucene.NET\src\contrib\Snowball\SF\Snowball\Ext\FinnishStemmer.cs:line 974
          at SF.Snowball.Ext.FinnishStemmer.Stem() in C:\Dev\Third Party\Lucene.NET\src\contrib\Snowball\SF\Snowball\Ext\FinnishStemmer.cs:line 1137

          Expected result:
          The unit test should pass.

          Christopher Currens added a comment -

          This affected DanishStemmer, FinnishStemmer and KpStemmer. "ket - bra" is used almost everywhere in that code except in that one place, which caused the index-versus-length issue. I just pushed a fix for this to the 3.0.3 branch.


            People

            • Assignee:
              Unassigned
              Reporter:
              Torsten Rendelmann
            • Votes:
              0
              Watchers:
              2
