Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Later
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      It seems to me that KeyCache can be improved in a number of ways, but first let's state its basic goal: to reduce the average query response time by providing an exact seek position in a file for a given key.

      As it stands, KeyCache is 100% accurate, but it requires a lot of overhead per entry.

      I propose to make KeyCache mostly accurate (say 99.9999%), by which I mean it will always fail accurately, but may rarely return an incorrect address. The end users of the cache would then verify that they seeked to the correct position in the file, and retry without the cache if they did not.
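      To make the retry contract concrete, here is a minimal sketch of the verify-and-retry read path. Every type and method name below is hypothetical, not an existing Cassandra API:

      {code:java}
      // Minimal sketch of the verify-and-retry read path; all names here
      // are hypothetical, not existing Cassandra APIs.
      final class CachedReadPath
      {
          interface KeyPositionCache
          {
              long position(long hashMsb, long hashLsb); // -1 if absent
          }

          interface DataFileReader
          {
              boolean keyAt(long position, byte[] key); // does this key live here?
              byte[] readAt(long position);
              byte[] readViaIndex(byte[] key);          // full, cache-free lookup
          }

          static byte[] read(KeyPositionCache cache, DataFileReader reader,
                             long hashMsb, long hashLsb, byte[] key)
          {
              long pos = cache.position(hashMsb, hashLsb);
              // A hit may (very rarely) point at a colliding key's position,
              // so confirm the key stored there before trusting it...
              if (pos >= 0 && reader.keyAt(pos, key))
                  return reader.readAt(pos);
              // ...and on a miss, or a mismatch, fall back to the index path.
              return reader.readViaIndex(key);
          }
      }
      {code}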

      The advantage of this is that we can both take the cache off-heap easily and pack many more items into it. If we permit collisions across files and simply use the (full 128-bit) murmur hash of the key + cfId + file generation, we should get enough uniqueness that erroneous collisions are rare, and we will be using only 20 bytes per key, instead of the current 100 + <key length> bytes. This should allow us to answer far more queries from the key cache than before, so the performance gained should outweigh the cost of the occasional retry.
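      As a sketch of the identity computation, assuming Cassandra's org.apache.cassandra.utils.MurmurHash utility (the concatenation order of the inputs is an arbitrary choice here):

      {code:java}
      import java.nio.ByteBuffer;
      import java.util.UUID;
      import org.apache.cassandra.utils.MurmurHash;

      final class KeyCacheIdentity
      {
          // One 128-bit Murmur3 hash over key + cfId + sstable generation,
          // so entries from different files can share a single cache without
          // storing the key itself. How the inputs are concatenated is an
          // assumption of this sketch.
          static long[] identityHash(ByteBuffer key, UUID cfId, int generation)
          {
              ByteBuffer buf = ByteBuffer.allocate(key.remaining() + 16 + 4);
              buf.put(key.duplicate());
              buf.putLong(cfId.getMostSignificantBits());
              buf.putLong(cfId.getLeastSignificantBits());
              buf.putInt(generation);
              buf.flip();
              return MurmurHash.hash3_x64_128(buf, 0, buf.remaining(), 0);
          }
      }
      {code}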

      For the structure I propose an associative cache, where a single contiguous address space is broken up into regions of, say, 8 entries plus one counter. The counter tracks the recency of access of each entry, so that on write the least recently accessed/written entry can be replaced. A linear probe within the region determines whether the entry we're looking for is present. This should be very quick, as the entire region should fit into one or two L1 cache lines.
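      A minimal on-heap sketch of that layout follows; the slot and tick encodings are assumptions, and the real structure would live off-heap:

      {code:java}
      // Sketch of the proposed associative cache: a flat array split into
      // regions of 8 entries plus one recency word. Each entry is
      // (hashMsb, hashLsb, position); the recency word packs eight 8-bit
      // access ticks, one per slot.
      final class AssociativeKeyCache
      {
          private static final int SLOTS = 8;                  // entries per region
          private static final int ENTRY = 3;                  // longs per entry
          private static final int REGION = SLOTS * ENTRY + 1; // + recency word

          private final long[] table;
          private final int regions;

          AssociativeKeyCache(int regionCount)
          {
              regions = regionCount;
              table = new long[regionCount * REGION];
          }

          // Returns the cached position, or -1 if absent. (For brevity this
          // sketch treats hash (0, 0) as "empty slot".)
          long get(long hashMsb, long hashLsb)
          {
              int base = regionBase(hashMsb);
              // Linear probe within the region: at 25 longs (200 bytes) it
              // spans only a few L1 cache lines.
              for (int slot = 0; slot < SLOTS; slot++)
              {
                  int e = base + slot * ENTRY;
                  if (table[e] == hashMsb && table[e + 1] == hashLsb)
                  {
                      touch(base, slot);
                      return table[e + 2];
                  }
              }
              return -1;
          }

          // Replaces the least recently touched entry in the region.
          void put(long hashMsb, long hashLsb, long position)
          {
              int base = regionBase(hashMsb);
              int slot = leastRecentSlot(base);
              int e = base + slot * ENTRY;
              table[e] = hashMsb;
              table[e + 1] = hashLsb;
              table[e + 2] = position;
              touch(base, slot);
          }

          private int regionBase(long hash)
          {
              // Ideally derived from a *different* hash of the key than the
              // identity, to spread entries independently (a later comment
              // in this thread expands on that).
              return (int) Long.remainderUnsigned(hash, regions) * REGION;
          }

          // Stamp the slot with a tick one above the region's current maximum,
          // aging all ticks when the 8-bit counter would overflow.
          private void touch(int base, int slot)
          {
              int ctr = base + SLOTS * ENTRY;
              long ticks = table[ctr];
              int max = 0;
              for (int s = 0; s < SLOTS; s++)
                  max = Math.max(max, tick(ticks, s));
              if (max == 0xFF)
              {
                  long aged = 0;
                  for (int s = 0; s < SLOTS; s++)
                      aged |= ((long) (tick(ticks, s) >>> 1)) << (8 * s);
                  ticks = aged;
                  max >>>= 1;
              }
              ticks &= ~(0xFFL << (8 * slot));
              ticks |= ((long) (max + 1)) << (8 * slot);
              table[ctr] = ticks;
          }

          private int leastRecentSlot(int base)
          {
              long ticks = table[base + SLOTS * ENTRY];
              int victim = 0;
              for (int s = 1; s < SLOTS; s++)
                  if (tick(ticks, s) < tick(ticks, victim))
                      victim = s;
              return victim;
          }

          private static int tick(long ticks, int slot)
          {
              return (int) ((ticks >>> (8 * slot)) & 0xFF);
          }
      }
      {code}

      (Note this padded sketch spends 25 bytes per entry, including its share of the recency word; hitting the 20-byte figure above presumably means not storing hash bits already implied by the region index.)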

      Advantage: we may see a 5x bump in cache hit rate, or even more for large keys.

          Activity

          benedict Benedict added a comment -

          Oh, and of course it will remove pressure from the GC. With mostly off-heap memtables we may then approach a position where VM space does not depend on the size of the dataset.

          benedict Benedict added a comment -

          One possible variant of this: if the key is itself <= 96 bits, we could hash the cfId + file generation, and use the top bit to indicate which type of hash we have. So either a 127-bit murmur3, or, say, 96 bits of key + a 31-bit cfId/generation hash.

          Since a lot of partition keys are shorter than that, it seems a sensible tradeoff. Also, if using the murmur hash for identity, we could use a different hash on the key to select the region we map to, to increase the spread and reduce the likelihood of collisions. With, say, a 2GB key cache we'd then have a 140-bit spread, or about 1.4E42 values. So roughly a 1 in 1E30 chance of collision, i.e. 99.999999999999999999999999999999% accurate.
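          A hedged sketch of that packing follows; only the top-bit flag is from the description above, the rest of the bit layout is an assumption:

          {code:java}
          final class ShortKeyIdentity
          {
              // Hypothetical packing for the short-key variant: bit 127 flags
              // the encoding, the next 31 bits hold a hash of cfId + generation,
              // and the low 96 bits hold the key verbatim. Keys longer than
              // 12 bytes would use the full 127-bit murmur3 with the flag clear.
              static long[] pack(byte[] key, int cfIdGenHash)
              {
                  assert key.length <= 12;                 // key fits in 96 bits
                  long msb = 1L << 63;                     // type flag: raw key
                  msb |= ((long) (cfIdGenHash & 0x7FFFFFFF)) << 32;
                  long lsb = 0;
                  int i = 0;
                  for (; i < key.length && i < 4; i++)     // first 4 bytes -> msb
                      msb |= (key[i] & 0xFFL) << (8 * (3 - i));
                  for (; i < key.length; i++)              // next 8 bytes -> lsb
                      lsb |= (key[i] & 0xFFL) << (8 * (11 - i));
                  return new long[]{ msb, lsb };
              }
          }
          {code}

          (A real encoding would also need to fold in the key length, since zero-padding alone cannot distinguish keys that differ only in trailing zero bytes.)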

          jbellis Jonathan Ellis added a comment -

          Good idea!

          benedict Benedict added a comment -

          So, I now think this is probably a bit more involved than I'd hoped, due to the IndexedEntry used for making wide-row slices faster. I think it could benefit from some file changes, so it's probably best to target 3.0.

          benedict Benedict added a comment -

          Closing: the new sstable format most likely makes this unnecessary by eliminating the need for a separate key cache. However, we may want to revisit this at some point, since a separate cache could still be beneficial by improving memory occupancy, so closing as Later instead of Duplicate.


            People

            • Assignee: Unassigned
            • Reporter: benedict Benedict
            • Votes: 1
            • Watchers: 7
