  Apache Cassandra / CASSANDRA-20100

Query construction is broken for SAI indexes on reversed types with fixed-length encodings


Details

    Description

      SAI indexes values in byte-comparable form. This applies both to the in-memory trie that sits alongside the Memtable and to the on-disk SSTable-adjacent indexes. In most cases, this literally means using asComparableBytes() from the indexed column's type. A few types, however, use a custom byte-comparable form: inet, bigint, varint, and decimal. The custom form ensures that the numeric (balanced tree) index always deals with fixed-length values.
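      To make "fixed-length byte-comparable form" concrete, here is a minimal sketch for a 64-bit signed integer. It uses one common technique: write 8 big-endian bytes and flip the sign bit, so that unsigned lexicographic byte order matches signed numeric order. This is an illustrative assumption, not necessarily the exact encoding SAI uses for bigint. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FixedLengthLongEncoding {
    // Hypothetical encoder: 8 big-endian bytes with the sign bit flipped, so that
    // comparing the arrays as unsigned bytes gives the same order as Long.compare.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Inverse of encode(): read the 8 bytes back and flip the sign bit again.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] ascending = {Long.MIN_VALUE, -42L, -1L, 0L, 1L, 42L, Long.MAX_VALUE};
        for (int i = 1; i < ascending.length; i++) {
            // Arrays.compareUnsigned (Java 9+) compares byte arrays lexicographically as unsigned values.
            if (Arrays.compareUnsigned(encode(ascending[i - 1]), encode(ascending[i])) >= 0) {
                throw new AssertionError("byte order disagrees with numeric order at " + ascending[i]);
            }
        }
        System.out.println("every encoded value is " + encode(-1L).length + " bytes");
    }
}
```

      Every value encodes to the same width, which is the property the balanced tree index relies on.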

      If one of these types is indexed as a reversed clustering key, however, its terms are not written as reversed comparable bytes. This breaks assumptions made during query construction and post-filtering, where we generally assume that asComparableBytes() has already reversed terms before they are indexed. We can make a short-term fix here without changing anything about the on-disk format: we interpret these special types as non-reversed, i.e. through the lens of their base types.
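      The mismatch, and the short-term fix, can be modeled in a few lines. The sketch below makes two simplifying assumptions. First, it treats a reversed byte-comparable form as a bitwise inversion of every byte. Second, it uses hypothetical stand-ins (BaseType, IndexedType, termsAreReversed) rather than Cassandra's real AbstractType/ReversedType classes. Query construction that checks only "is the column reversed?" builds bounds that never match what the index wrote. Deriving reversal from the same rule as the write path, which unwraps the special types to their base type, makes them agree.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

public class ReversedTermSketch {
    // Hypothetical stand-in for a CQL column type (not Cassandra's AbstractType).
    enum BaseType { INT, TEXT, INET, BIGINT, VARINT, DECIMAL }

    // The types the description lists as using a custom fixed-length SAI encoding.
    static final Set<BaseType> FIXED_LENGTH_SAI_TYPES =
            EnumSet.of(BaseType.INET, BaseType.BIGINT, BaseType.VARINT, BaseType.DECIMAL);

    record IndexedType(BaseType base, boolean reversed) {
        // Short-term fix: the special types are viewed through their base (non-reversed) type.
        boolean termsAreReversed() {
            return reversed && !FIXED_LENGTH_SAI_TYPES.contains(base);
        }
    }

    // Simplified model of a reversed byte-comparable form: invert every byte.
    static byte[] invert(byte[] bytes) {
        byte[] out = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) out[i] = (byte) ~bytes[i];
        return out;
    }

    // Write path: the special types are stored without reversal.
    static byte[] indexTerm(IndexedType type, byte[] comparable) {
        return type.termsAreReversed() ? invert(comparable) : comparable;
    }

    // Buggy query construction: assumes terms were reversed whenever the column is reversed.
    static byte[] buggyQueryBound(IndexedType type, byte[] comparable) {
        return type.reversed() ? invert(comparable) : comparable;
    }

    // Fixed query construction: applies exactly the same rule as the write path.
    static byte[] fixedQueryBound(IndexedType type, byte[] comparable) {
        return indexTerm(type, comparable);
    }

    public static void main(String[] args) {
        IndexedType reversedBigint = new IndexedType(BaseType.BIGINT, true);
        byte[] value = {(byte) 0x80, 0, 0, 0, 0, 0, 0, 42};
        byte[] indexed = indexTerm(reversedBigint, value);
        System.out.println("buggy bound matches: " + Arrays.equals(indexed, buggyQueryBound(reversedBigint, value)));
        System.out.println("fixed bound matches: " + Arrays.equals(indexed, fixedQueryBound(reversedBigint, value)));
    }
}
```

      Running main prints "buggy bound matches: false" followed by "fixed bound matches: true". Because byte inversion changes every byte, the buggy bound can never equal the stored term.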

      In the longer term, it might make sense to standardize on indexing everything in non-reversed form, although this might push some complexity into post-filtering. Post-filtering already has to filter the data coming out of the normal read path regardless.
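      As a rough illustration of the longer-term idea, here is a sketch of an index that always keeps terms in the base type's natural order. It is shown with long values and a TreeMap, which are purely hypothetical choices and not how SAI stores terms. Query bounds are then never inverted. Post-filtering compares values in base order as well, so a reversed clustering key affects only the order in which results are returned, not which ones match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NonReversedIndexSketch {
    // Hypothetical index: terms are always kept in natural (base-type) order,
    // even when the indexed column is a reversed clustering key.
    private final NavigableMap<Long, List<String>> termsToKeys = new TreeMap<>();

    void index(long term, String primaryKey) {
        termsToKeys.computeIfAbsent(term, t -> new ArrayList<>()).add(primaryKey);
    }

    // Query construction never inverts bounds: lower <= term <= upper in base order.
    List<String> search(long lower, long upper) {
        List<String> keys = new ArrayList<>();
        termsToKeys.subMap(lower, true, upper, true).values().forEach(keys::addAll);
        return keys;
    }

    // Post-filtering of rows from the normal read path also compares in base order.
    static boolean postFilter(long value, long lower, long upper) {
        return value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        NonReversedIndexSketch index = new NonReversedIndexSketch();
        index.index(5L, "k5");
        index.index(-3L, "k-3");
        index.index(10L, "k10");
        System.out.println(index.search(-5L, 6L));         // [k-3, k5]
        System.out.println(postFilter(10L, -5L, 6L));      // false
    }
}
```

      The trade-off is the one the description anticipates. Matching becomes uniform, and any clustering-order concerns move to the code that consumes the results.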

      Attachments

        1. ci_summary.html (111 kB, Caleb Rackliffe)
        2. ci_summary-1.html (106 kB, Caleb Rackliffe)


            People

              Caleb Rackliffe
              David Capwell
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h