Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This codec holds all terms and postings in RAM, using an
      FST<BytesRef>. This is useful for a primary-key field: lookups
      never need to hit disk, which keeps NRT reopen time fast even
      under IO contention.
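A minimal sketch of the lookup path, assuming Java 9+. The real codec keeps terms in an FST<BytesRef>, which shares prefixes and suffixes between terms. This sketch swaps in a plain java.util.TreeMap keyed by UTF-8 bytes in unsigned byte order (the order BytesRef uses). It therefore shows only that a primary-key lookup is a pure in-heap operation, not how the FST compresses the dictionary. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for an in-RAM terms dictionary. A TreeMap replaces the
 * FST here, so there is no prefix/suffix sharing; only the lookup path is shown.
 */
public class InMemoryTermsSketch {
    // Keys are UTF-8 bytes compared as unsigned bytes, matching BytesRef order.
    private final TreeMap<byte[], byte[]> terms =
            new TreeMap<byte[], byte[]>(Arrays::compareUnsigned);

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Adds a term together with its already-encoded postings bytes. */
    public void add(String term, byte[] postings) {
        terms.put(utf8(term), postings);
    }

    /** Exact-match lookup (e.g. a primary key); returns null if the term is absent. */
    public byte[] lookup(String term) {
        return terms.get(utf8(term)); // heap-only: no disk seek per lookup
    }

    public static void main(String[] args) {
        InMemoryTermsSketch dict = new InMemoryTermsSketch();
        dict.add("id-0042", new byte[] {7});
        dict.add("id-0007", new byte[] {3});
        System.out.println(Arrays.toString(dict.lookup("id-0042"))); // prints [7]
        System.out.println(dict.lookup("id-9999"));                   // prints null
    }
}
```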

      1. LUCENE-3209.patch (36 kB, Michael McCandless)

          Activity

          Michael McCandless added a comment -

          Patch; I think it's working and ready to commit. All tests pass w/ it, and I disabled the same tests that already avoid the SimpleText codec.

          Michael McCandless added a comment -

          To clarify: this codec stores postings on disk, but then on read (for searching) it loads the full byte[] from disk into RAM.
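As a hedged illustration of that write-to-disk, load-into-RAM split, the sketch below uses java.nio.file.Files as a stand-in for Lucene's Directory and IndexInput. The class and method names are hypothetical. The blob is written like any other index file, then read back in one sequential pass when the reader opens. After that, lookups touch only the heap copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoadIntoRamSketch {

    /** Write side: the serialized terms+postings blob is an ordinary on-disk file. */
    public static void write(Path file, byte[] blob) {
        try {
            Files.write(file, blob);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read side: one sequential read pulls the whole file into a heap byte[] at open time. */
    public static byte[] loadFully(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a blob to a temp file, loads it back into RAM, then deletes the file. */
    public static byte[] roundTrip(byte[] blob) {
        try {
            Path file = Files.createTempFile("memcodec-sketch", ".bin");
            try {
                write(file, blob);
                return loadFully(file); // from here on, no further disk access is needed
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] inRam = roundTrip(new byte[] {1, 2, 3, 4});
        System.out.println(inRam.length); // prints 4
    }
}
```

The trade-off is heap usage proportional to the file size in exchange for lookups that never wait on the disk.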

          Simon Willnauer added a comment -

          This seems to be related to LUCENE-3069, right?

          Dawid Weiss added a comment -

          Looks like a related thing to me.

          Michael McCandless added a comment -

          Whoops! I forgot about LUCENE-3069, but yes, this is very similar.

          But I think one difference is that LUCENE-3069 aims to keep all terms memory-resident while the postings still reside in the Directory (I think?), whereas my patch here puts all terms and postings in RAM (in a single FST). The postings format is similar to what PulsingCodec does, i.e., doc + tf + pos + payload are all serialized into a single byte[] using delta vInts.

          So I think we should keep LUCENE-3069 open as an enhancement to this codec, making it separately controllable whether postings are also held in RAM?
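The delta-vInt postings encoding mentioned in the comment above can be sketched as follows. The vInt form (7 data bits per byte, low-order group first, high bit set when more bytes follow) matches Lucene's DataOutput.writeVInt. The field order and payload framing are assumptions for illustration, not the exact layout in the patch: per doc a doc gap and tf, then per position a position gap, a payload length, and the payload bytes. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeltaVIntPostings {

    /** Hypothetical in-memory posting: one doc, its positions, and one payload per position. */
    public static final class Posting {
        final int doc;
        final int[] positions;
        final byte[][] payloads;

        public Posting(int doc, int[] positions, byte[][] payloads) {
            this.doc = doc;
            this.positions = positions;
            this.payloads = payloads;
        }
    }

    /** vInt: 7 data bits per byte, low-order group first; a set high bit means more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static int readVInt(byte[] buf, int[] pos) {
        int value = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }

    /** Serializes all postings of one term into a single byte[]; docs must be sorted ascending. */
    public static byte[] encode(List<Posting> postings) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (Posting p : postings) {
            writeVInt(out, p.doc - lastDoc);            // doc gap instead of absolute docID
            lastDoc = p.doc;
            writeVInt(out, p.positions.length);         // tf = number of positions in this doc
            int lastPos = 0;
            for (int i = 0; i < p.positions.length; i++) {
                writeVInt(out, p.positions[i] - lastPos); // position gap within the doc
                lastPos = p.positions[i];
                byte[] payload = p.payloads[i];
                writeVInt(out, payload.length);         // payload length, then its raw bytes
                out.write(payload, 0, payload.length);
            }
        }
        return out.toByteArray();
    }

    /** Reads back only the docIDs, skipping positions and payloads. */
    public static List<Integer> decodeDocs(byte[] buf) {
        List<Integer> docs = new ArrayList<>();
        int[] pos = {0};
        int doc = 0;
        while (pos[0] < buf.length) {
            doc += readVInt(buf, pos);                  // undo the doc delta
            docs.add(doc);
            int tf = readVInt(buf, pos);
            for (int i = 0; i < tf; i++) {
                readVInt(buf, pos);                     // position gap (unused here)
                int payloadLen = readVInt(buf, pos);
                pos[0] += payloadLen;                   // skip the payload bytes
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        byte[] bytes = encode(List.of(
                new Posting(5, new int[] {3, 10}, new byte[][] {{}, {42}}),
                new Posting(300, new int[] {1}, new byte[][] {{}})));
        System.out.println(Arrays.toString(bytes)); // prints [5, 2, 3, 0, 7, 1, 42, -89, 2, 1, 1, 0]
        System.out.println(decodeDocs(bytes));      // prints [5, 300]
    }
}
```

Delta coding keeps the numbers small, so most values fit in a single vInt byte. In the example only the doc gap of 295 needs two bytes.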

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0
              Watchers:
              4
