Cassandra / CASSANDRA-8931

IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 4.x
    • Component/s: None
    • Labels:

      Description

      Since these files are likely to stick around a little longer, it is probably worth optimising them. A relatively simple change to Index and IndexSummary could significantly reduce the space required, reduce the CPU cost of lookups, and hopefully bound the space needed as key size grows. When writing, we always store the token before the key (if the token differs from the key); we then truncate the whole record to the minimum length necessary to answer an inequality search. Since the data file also contains the key, we can confirm we have the right key once we have looked it up. Since bloom filters (BFs) are already used to reduce unnecessary lookups, there is little to gain from ruling out false positives one step earlier.
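
      The truncation described above can be sketched roughly as follows. This is only an illustration under assumptions, not the proposed implementation: records are modelled as plain byte strings (token followed by key), the `token` function is a stand-in hash rather than Cassandra's Murmur3 partitioner, and `build_separators` and `lookup` are hypothetical names. Each record is cut to the shortest prefix that still sorts strictly after the previous record, which is enough to direct an inequality search.

```python
import bisect
import hashlib


def token(key: bytes) -> bytes:
    """Stand-in 8-byte token for illustration; NOT Cassandra's Murmur3."""
    return hashlib.blake2b(key, digest_size=8).digest()


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_separators(records):
    """Truncate each record to the shortest prefix that sorts after its predecessor.

    `records` must be sorted, distinct, non-empty byte strings.
    """
    seps = []
    prev = b""
    for rec in records:
        # One byte past the shared prefix is the first point of difference.
        seps.append(rec[: common_prefix_len(prev, rec) + 1])
        prev = rec
    return seps


def lookup(seps, query: bytes) -> int:
    """Position of the only record that could equal `query`, or -1 if none.

    The caller still has to compare against the full key in the data file.
    """
    return bisect.bisect_right(seps, query) - 1
```

      Because the token comes first and is well distributed, the truncated entries in this sketch usually end inside the token, so their length does not depend on the key length. A hit from `lookup` is only a candidate; the full key read from the data file decides whether it is a real match.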

      An improved follow-up version would use a trie of shortest length to answer inequality lookups, as this would also ensure that very long keys with common prefixes do not significantly increase the size of the index or summary. This would translate to a trie index for the summary, keying into a static trie page for the index.
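
      One possible reading of the trie idea, as a minimal in-memory sketch: the truncated entries are stored in a byte trie so that shared prefixes are kept once, and a lookup finds the largest entry that is less than or equal to the query. The names `Node`, `build_trie` and `floor_lookup` are hypothetical, and nothing here reflects an on-disk page layout.

```python
class Node:
    """Byte-trie node over truncated index entries (illustrative only)."""

    __slots__ = ("children", "index", "max_index")

    def __init__(self):
        self.children = {}   # byte value -> child Node
        self.index = -1      # position of the entry ending here, -1 if none
        self.max_index = -1  # largest position stored anywhere in this subtree


def build_trie(seps):
    """Build a trie from sorted, distinct byte strings."""
    root = Node()
    for i, sep in enumerate(seps):
        node = root
        node.max_index = max(node.max_index, i)
        for b in sep:
            child = node.children.get(b)
            if child is None:
                child = node.children[b] = Node()
            child.max_index = max(child.max_index, i)
            node = child
        node.index = i
    return root


def floor_lookup(root, query: bytes) -> int:
    """Position of the largest entry <= query, or -1 if there is none."""
    best = -1
    node = root
    for b in query:
        if node.index >= 0:
            best = node.index  # an entry that is a prefix of the query
        smaller = [c for c in node.children if c < b]
        if smaller:
            # Everything under the largest smaller branch sorts below the query.
            best = node.children[max(smaller)].max_index
        node = node.children.get(b)
        if node is None:
            return best
    if node.index >= 0:
        best = node.index  # exact match on the whole query
    return best
```

      For sorted input, `floor_lookup` is meant to return the same position a binary search over the flat list would, while long common prefixes are stored only once in the trie.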


          Activity

          jbellis Jonathan Ellis added a comment -

          "then we simply truncate the whole record to the minimum length necessary to answer an inequality search"

          Meaning, we only store enough to disambiguate from the records before and after?

          benedict Benedict added a comment -

          Right

          jbellis Jonathan Ellis added a comment -

          Good idea. This will save a lot of memory.

          snazy Robert Stupp added a comment -

          Just a raw idea, but maybe we do not need index-summaries at all with this ticket (assuming murmur3).

          michaelsembwever mck added a comment -

          Bumping to fix version 4.x, as 3.11.0 is a bug-fix only release.
            ref https://s.apache.org/EHBy


            People

            • Assignee:
              Unassigned
              Reporter:
              benedict Benedict
            • Votes:
              0
              Watchers:
              8
