Apache Cassandra / CASSANDRA-20100

Query construction is broken for SAI indexes on reversed types with fixed-length encodings


Details

    Description

      SAI indexes values in byte-comparable form, both in the in-memory trie that sits alongside the Memtable and in the on-disk SSTable-adjacent indexes. In most cases, this means using asComparableBytes() from the indexed column's type directly. A few types, however, use a custom byte-comparable form, namely inet, bigint, varint, and decimal, so that the numeric (balanced tree) index always deals with fixed-length values.
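As a rough illustration of why a fixed-length form suits a balanced tree index, the sketch below encodes a 64-bit value as exactly 8 bytes by flipping the sign bit and writing it big-endian, so that unsigned lexicographic byte order matches numeric order. This is a generic technique shown under that assumption; `FixedLengthLongEncoding` and `encode` are hypothetical names, and this is not claimed to be the exact encoding SAI uses for bigint.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FixedLengthLongEncoding {
    // Hypothetical helper: encode a signed 64-bit value as exactly 8 bytes whose
    // unsigned lexicographic order matches the numeric order of the longs.
    static byte[] encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
        // 0x00..00..0xFF..FF, so negative numbers sort before positive ones.
        // ByteBuffer writes big-endian by default (most significant byte first).
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    public static void main(String[] args) {
        long[] values = {-42L, -1L, 0L, 7L, Long.MAX_VALUE};
        for (int i = 0; i + 1 < values.length; i++) {
            byte[] a = encode(values[i]);
            byte[] b = encode(values[i + 1]);
            // Every encoding has the same length, and byte order agrees with numeric order.
            System.out.println(values[i] + " < " + values[i + 1] + ": "
                    + (Arrays.compareUnsigned(a, b) < 0) + " (length " + a.length + ")");
        }
    }
}
```

Because every term has the same width, a tree index can compare and store terms without length prefixes or terminators.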

      When one of these types is indexed as a reversed clustering column (for example, one declared DESC in CLUSTERING ORDER BY), however, its terms are not written as reversed comparable bytes. This breaks assumptions made during query construction and post-filtering, where we generally assume that asComparableBytes() has reversed terms before they are indexed. We can make a short-term fix without changing anything about the on-disk format by interpreting these special types as non-reversed (i.e. through the lens of their base types).
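A minimal model of the mismatch and the short-term fix, assuming (1) that the special types are indexed in base-type order even on a reversed column, as described above, and (2) that a reversed byte-comparable form can be approximated by complementing every byte, which inverts order for equal-length values. All names (`Kind`, `FIXED_LENGTH`, `buggyBound`, `fixedBound`, `rangeCount`) are hypothetical and do not correspond to Cassandra's SAI classes; only the bigint encoding is modeled.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ReversedFixedLengthSketch {
    // Hypothetical stand-in for the indexed column's base type.
    enum Kind { INET, BIGINT, VARINT, DECIMAL, TEXT }

    // Types that (per the issue) use a custom fixed-length byte-comparable form.
    static final Set<Kind> FIXED_LENGTH = EnumSet.of(Kind.INET, Kind.BIGINT, Kind.VARINT, Kind.DECIMAL);

    // Hypothetical fixed-length bigint encoding: sign bit flipped, big-endian.
    static byte[] encodeBigint(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Simplified model of a reversed byte-comparable form: complement every byte,
    // which inverts unsigned lexicographic order for equal-length inputs.
    static byte[] reverse(byte[] bytes) {
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) out[i] = (byte) ~bytes[i];
        return out;
    }

    // Buggy assumption: every reversed column's terms were reversed before indexing.
    static byte[] buggyBound(Kind kind, boolean reversedColumn, long value) {
        byte[] base = encodeBigint(value);
        return reversedColumn ? reverse(base) : base;
    }

    // Short-term fix: view the fixed-length types through their base type.
    static byte[] fixedBound(Kind kind, boolean reversedColumn, long value) {
        byte[] base = encodeBigint(value);
        boolean treatAsReversed = reversedColumn && !FIXED_LENGTH.contains(kind);
        return treatAsReversed ? reverse(base) : base;
    }

    // Counts indexed terms t with lower <= t <= upper in unsigned byte order.
    static int rangeCount(List<byte[]> terms, byte[] lower, byte[] upper) {
        int n = 0;
        for (byte[] t : terms) {
            if (Arrays.compareUnsigned(lower, t) <= 0 && Arrays.compareUnsigned(t, upper) <= 0) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The special types are indexed in base (non-reversed) form even on a reversed column.
        List<byte[]> indexed = new ArrayList<>();
        for (long v = 1; v <= 10; v++) indexed.add(encodeBigint(v));

        // Query: 3 <= ck <= 5 on a reversed (DESC) bigint clustering column.
        int buggy = rangeCount(indexed, buggyBound(Kind.BIGINT, true, 3), buggyBound(Kind.BIGINT, true, 5));
        int fixed = rangeCount(indexed, fixedBound(Kind.BIGINT, true, 3), fixedBound(Kind.BIGINT, true, 5));
        System.out.println("buggy=" + buggy + " fixed=" + fixed); // buggy=0 fixed=3
    }
}
```

In this model, bounds built as if the terms had been reversed end up inverted relative to the indexed bytes, so the range matches nothing. Viewing the type through its base type finds 3, 4 and 5.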

      In the longer term, it might make sense to standardize on indexing everything in non-reversed form, although this might push some complexity into post-filtering, which has to filter data coming out of the normal read path anyway.

      Attachments

        1. ci_summary.html (111 kB, Caleb Rackliffe)
        2. ci_summary-1.html (106 kB, Caleb Rackliffe)


          People

            maedhroz Caleb Rackliffe
            maedhroz Caleb Rackliffe
            Caleb Rackliffe
            David Capwell
            Votes: 0
            Watchers: 3


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h