Lucene.Net / LUCENENET-463

Would like to be able to use a SimpleSpanFragmenter for extracting whole sentences

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lucene.Net 3.0.3
    • Component/s: Lucene.Net Contrib
    • Labels:
      None

      Description

      This is described in the Java version, but it does not seem to be in the .NET port. Is there a reason for this? I would have imagined that lots of people doing document work would want it.

          Activity

          Digy added a comment -

          My guess:
          Either it is not as widely used as you think, or no one is willing to port it.

          DIGY

          Christopher Currens added a comment -

          This is, at least, on my list of things to include in 3.0.3, but only if I can find the time to do it. The entire Highlighter project has a lot of changes between what is in it now and what it needs to be to match Java's highlighter in 3.0.3. This isn't a top priority for 3.0.3. If you'd like to port it yourself, or parts of it, to help make sure it's in the next release, we'd welcome it and include it; otherwise, we'll try to get it in ourselves, time permitting.

          Prescott Nasser added a comment -

          I'll try to take a look at this over the weekend.

          Christopher Currens added a comment -

          I pushed some changes already, namely porting these files:

          /incubator/lucene.net/trunk/src/contrib/Highlighter/Contrib.Highlighter.csproj
          /incubator/lucene.net/trunk/src/contrib/Highlighter/DefaultEncoder.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/Encoder.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/Formatter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/Fragmenter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/GradientFormatter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/Highlighter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/IEncoder.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/InvalidTokenOffsetsException.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/NullFragmenter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/Package.html
          /incubator/lucene.net/trunk/src/contrib/Highlighter/QueryScorer.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/QueryTermExtractor.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/QueryTermScorer.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/Scorer.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/SimpleFragmenter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/SimpleHTMLEncoder.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/SimpleHTMLFormatter.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/WeightedSpanTerm.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/WeightedSpanTermExtractor.cs
          /incubator/lucene.net/trunk/src/contrib/Highlighter/WeightedTerm.cs
          

          It's all a moot point though, because the real problem is that Highlighter relies on another contrib project we've never ported: MemoryIndex. In theory it shouldn't be too bad; it's only one file (46KB, though... yikes!). I've actually had to comment out parts of the ported Highlighter because they rely on MemoryIndex and we don't have it. So until we port that, Highlighter can't really be finished.

          Christopher Currens added a comment -

          Ported this and the tests. It's actually quite fast.

          For some reason the FastVectorHighlighter (FVH) is considerably slower than this (even with the PhraseLimit). It might not be an issue with FVH, since we're using a far different version than Java is at 3.0.3, but we should probably look it over just in case.


            People

            • Assignee: Christopher Currens
            • Reporter: Steven