Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      In LUCENE-1793, there is an off-topic suggestion to provide compression of Unicode data. The motivation was a custom encoding in a Russian analyzer, which was originally supposed to produce a more compact index.

      This led to the comment that a different or compressed encoding would be a generally useful feature.

      BOCU-1 was suggested as a possibility. This is a patented algorithm by IBM with an implementation in ICU. If Lucene provides its own implementation, a freely available, royalty-free license would need to be obtained.

      SCSU is another Unicode compression algorithm that could be used.

      An advantage of these methods is that they work on the whole of Unicode. If that is not needed, an encoding such as ISO-8859-1 (or whatever covers the input) could be used.
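
      The trade-off between a whole-of-Unicode encoding and a narrower single-byte one can be illustrated with a minimal sketch that uses only the JDK's standard charsets. It is not Lucene code and does not implement BOCU-1 or SCSU; the class and method names (EncodingSizes, encodedLength, covers) are hypothetical and exist only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
public class EncodingSizes {

    /** Number of bytes needed to store the text in the given charset. */
    static int encodedLength(String text, Charset charset) {
        // Note: getBytes silently substitutes unmappable characters,
        // so check covers() first if losslessness matters.
        return text.getBytes(charset).length;
    }

    /** True if every character of the text is representable in the charset. */
    static boolean covers(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String latin = "na\u00efve caf\u00e9";                         // Latin-1 text
        String russian = "\u043f\u0440\u0438\u0432\u0435\u0442";      // Cyrillic text

        for (String text : new String[] {latin, russian}) {
            System.out.println("UTF-8 bytes:      " + encodedLength(text, StandardCharsets.UTF_8));
            System.out.println("ISO-8859-1 bytes: " + encodedLength(text, StandardCharsets.ISO_8859_1));
            System.out.println("ISO-8859-1 covers input: " + covers(text, StandardCharsets.ISO_8859_1));
        }
    }
}
```

      For the Latin-1 sample, the single-byte encoding is smaller than UTF-8 and lossless. For the Cyrillic sample, ISO-8859-1 does not cover the input at all, which is the case where a charset matching the input, or a general scheme such as BOCU-1 or SCSU, would be needed.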

      1. LUCENE-1799.patch (9 kB, Robert Muir)
      2. LUCENE-1799.patch (9 kB, Uwe Schindler)
      3. LUCENE-1799.patch (9 kB, Uwe Schindler)
      4. LUCENE-1799.patch (10 kB, Uwe Schindler)
      5. LUCENE-1799.patch (11 kB, Uwe Schindler)
      6. LUCENE-1799.patch (17 kB, Uwe Schindler)
      7. LUCENE-1799.patch (9 kB, Robert Muir)
      8. LUCENE-1799.patch (7 kB, Michael McCandless)
      9. LUCENE-1799.patch (7 kB, Michael McCandless)
      10. LUCENE-1799.patch (9 kB, Michael McCandless)
      11. LUCENE-1799.patch (9 kB, Robert Muir)
      12. LUCENE-1799.patch (9 kB, Robert Muir)
      13. LUCENE-1799.patch (9 kB, Robert Muir)
      14. LUCENE-1799.patch (13 kB, Robert Muir)
      15. LUCENE-1799_big.patch (355 kB, Robert Muir)
      16. LUCENE-1779.patch (21 kB, Michael McCandless)
      17. Benchmark.java (1 kB, Robert Muir)
      18. Benchmark.java (4 kB, Yonik Seeley)
      19. Benchmark.java (1 kB, Yonik Seeley)

          People

          • Assignee: Unassigned
          • Reporter: DM Smith
          • Votes: 2
          • Watchers: 4
