  1. Lucene - Core
  2. LUCENE-3071

PathHierarchyTokenizer adaptation for urls: splits reversed

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None

    Description

      PathHierarchyTokenizer should be usable to split URLs in a "reversed" way (useful for faceted search against URLs):
      www.site.com -> www.site.com, site.com, com
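
      The reversed split above can be sketched in plain Java. This is only an illustration of the requested behaviour, not the code from the attached patches: the class name ReversedHierarchySketch and the method reverseSplit are hypothetical, and the sketch does not use any Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "reversed" split; not part of Lucene.
class ReversedHierarchySketch {

    // Returns the whole input followed by every suffix that starts
    // right after an occurrence of the delimiter, longest first.
    static List<String> reverseSplit(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        tokens.add(input);
        int idx = input.indexOf(delimiter);
        while (idx >= 0) {
            tokens.add(input.substring(idx + 1));
            idx = input.indexOf(delimiter, idx + 1);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [www.site.com, site.com, com]
        System.out.println(reverseSplit("www.site.com", '.'));
    }
}
```

      Each emitted token is a suffix of the host name, so a facet on "site.com" would match every subdomain indexed this way.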

      Moreover, it should be able to skip a given number of first (or last, if reversed) tokens:
      /usr/share/doc/somesoftware/INTERESTING/PART
      With 4 tokens skipped, this should give:
      INTERESTING
      INTERESTING/PART
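
      The skip behaviour for the forward (non-reversed) case might look like the following plain Java sketch. It reproduces the example as written in this description and is not the committed tokenizer, which may treat leading delimiters differently. The names SkipHierarchySketch and splitWithSkip are hypothetical, and empty path components (such as the one before the leading "/") are assumed not to count as tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating token skipping; not part of Lucene.
class SkipHierarchySketch {

    // Drops the first `skip` non-empty components, then emits the
    // cumulative prefixes built from the remaining components.
    static List<String> splitWithSkip(String path, char delimiter, int skip) {
        List<String> parts = new ArrayList<>();
        for (String p : path.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!p.isEmpty()) {   // assumption: empty components are ignored
                parts.add(p);
            }
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            if (current.length() > 0) {
                current.append(delimiter);
            }
            current.append(parts.get(i));
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [INTERESTING, INTERESTING/PART]
        System.out.println(splitWithSkip(
                "/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4));
    }
}
```

      For the reversed case, the same idea would apply to the last components instead of the first ones.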

      Attachments

        1. ant.log.tar.bz2
          15 kB
          Olivier Favre
        2. LUCENE-3071.patch
          22 kB
          Ryan McKinley
        3. LUCENE-3071.patch
          19 kB
          Ryan McKinley
        4. LUCENE-3071.patch
          18 kB
          Olivier Favre
        5. LUCENE-3071.patch
          18 kB
          Olivier Favre

          People

            Assignee: Ryan McKinley (ryantxu)
            Reporter: Olivier Favre (ofavre)
            Votes: 1
            Watchers: 1

              Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified