Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-9843

Augment or replace partition index with adaptive range filters

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Adaptive range filters are, in principle, bloom filters for range queries. They provide a space-efficient way to avoid scanning a partition when we can tell that we do not contain any data for the range requested. Like BF, they can return false positives but not false negatives.

      The implementation is of course totally different from BF. ARF is a tree where each leaf of the tree is a range of data and a bit, either on or off, denoting whether we have some data in that range.

      ARF are described here: http://www.vldb.org/pvldb/vol6/p1714-kossmann.pdf

        Issue Links

          Activity

          Hide
          jbellis Jonathan Ellis added a comment -

          Rather than just adding an ARF per partition (the way we used to have a BF – the difference is that BF is not useful for scans but this would be), we may be able to adapt this further by moving our index into the ARF. Instead of just a bit indicating yes or no we could have the offset for the start of each range [that we do have data for] in the leaf.

          (The "adaptive" in ARF means you can tune it to index hot parts of the data range in greater detail, without increasing the total memory used, at the cost of less detail for the cold ranges. We could do this in Cassandra as well, writing updated ARF to a new file. This could reduce the memory problems of pulling the indexes for very large partitions into memory. However, the paper describes very good results even without adaptation, so this is not required for proof of concept.)

          Show
          jbellis Jonathan Ellis added a comment - Rather than just adding an ARF per partition (the way we used to have a BF – the difference is that BF is not useful for scans but this would be), we may be able to adapt this further by moving our index into the ARF. Instead of just a bit indicating yes or no we could have the offset for the start of each range [that we do have data for] in the leaf. (The "adaptive" in ARF means you can tune it to index hot parts of the data range in greater detail, without increasing the total memory used, at the cost of less detail for the cold ranges. We could do this in Cassandra as well, writing updated ARF to a new file. This could reduce the memory problems of pulling the indexes for very large partitions into memory. However, the paper describes very good results even without adaptation, so this is not required for proof of concept.)
          Hide
          danchia Daniel Chia added a comment -

          T Jake Luciani Is there anything I can do to help with creating a PoC?

          Show
          danchia Daniel Chia added a comment - T Jake Luciani Is there anything I can do to help with creating a PoC?
          Hide
          jbellis Jonathan Ellis added a comment -

          Daniel Chia, the first thing we'd need is an ARF implementation that supports Cell.

          Show
          jbellis Jonathan Ellis added a comment - Daniel Chia , the first thing we'd need is an ARF implementation that supports Cell.

            People

            • Assignee:
              Unassigned
              Reporter:
              jbellis Jonathan Ellis
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:

                Development