Lucene - Core / LUCENE-3071

PathHierarchyTokenizer adaptation for urls: splits reversed

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None

      Description

      PathHierarchyTokenizer should be usable to split URLs in a "reversed" way (useful for faceted search against URLs):
      www.site.com -> www.site.com, site.com, com
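
      The reversed split requested above can be sketched in plain Java, without any Lucene dependency. This is only an illustration of the expected token sequence, not the code from the attached patch; the class name `ReversedHierarchySketch` and the method `reversedTokens` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Lucene or of the attached patch.
class ReversedHierarchySketch {

    // Emits the full input first, then every suffix that starts right after a delimiter.
    static List<String> reversedTokens(String input, char delimiter) {
        List<String> tokens = new ArrayList<>();
        if (input.isEmpty()) {
            return tokens;
        }
        tokens.add(input);
        // Stop one character early so a trailing delimiter does not yield an empty token.
        for (int i = 0; i < input.length() - 1; i++) {
            if (input.charAt(i) == delimiter) {
                tokens.add(input.substring(i + 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [www.site.com, site.com, com]
        System.out.println(reversedTokens("www.site.com", '.'));
    }
}
```

      Each emitted token is a suffix of the host name, which is what makes this form convenient for faceting by domain and parent domain.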

      Moreover, it should be able to skip a given number of leading tokens (or trailing tokens, if reversed). For example, the path
      /usr/share/doc/somesoftware/INTERESTING/PART
      should give, with 4 tokens skipped:
      INTERESTING
      INTERESTING/PART
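
      The skip behaviour can be sketched the same way for the forward (non-reversed) case. This is a minimal sketch that reproduces the output listed in this description, assuming that empty segments such as the one before the leading delimiter are not counted as tokens; the actual tokenizer may treat leading delimiters differently. The class name `SkipHierarchySketch` and the method `tokensAfterSkip` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of Lucene or of the attached patch.
class SkipHierarchySketch {

    // Drops the first `skip` path segments, then emits growing prefixes of what remains.
    static List<String> tokensAfterSkip(String path, char delimiter, int skip) {
        // Split on the literal delimiter; empty segments are ignored (assumption).
        List<String> parts = new ArrayList<>();
        for (String part : path.split(Pattern.quote(String.valueOf(delimiter)))) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = skip; i < parts.size(); i++) {
            if (current.length() > 0) {
                current.append(delimiter);
            }
            current.append(parts.get(i));
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [INTERESTING, INTERESTING/PART]
        System.out.println(tokensAfterSkip("/usr/share/doc/somesoftware/INTERESTING/PART", '/', 4));
    }
}
```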

        Attachments

        1. LUCENE-3071.patch
          18 kB
          Olivier Favre
        2. LUCENE-3071.patch
          18 kB
          Olivier Favre
        3. LUCENE-3071.patch
          19 kB
          Ryan McKinley
        4. LUCENE-3071.patch
          22 kB
          Ryan McKinley
        5. ant.log.tar.bz2
          15 kB
          Olivier Favre

            People

            • Assignee:
              ryantxu Ryan McKinley
              Reporter:
              ofavre Olivier Favre

                Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified