CASSANDRA-4478

Make index_interval be measured in kb (instead of number of keys)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Fix Version/s: 3.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently, index_interval is measured in number of keys: how many keys to skip before adding an entry to the index summary. After CASSANDRA-2319, each index entry also contains the column index for the row, so an index entry can be a bit bigger and entries differ in size. Measuring in number of keys is thus sub-optimal and difficult to tune, since you might want a different setting depending on whether your rows are big or small, but the setting is global.

      So we should move to measuring the interval in bytes.
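To make the difference concrete, here is a minimal sketch (not Cassandra code, and not the attached patch) contrasting the two sampling rules. The function names and the `(key, entry_size)` tuples are hypothetical; the sizes stand in for the on-disk size of each index entry.

```python
from typing import List, Tuple

# Each entry is (key, size_in_bytes_of_its_index_entry); both are illustrative.
Entry = Tuple[str, int]

def sample_by_key_count(entries: List[Entry], interval_keys: int) -> List[int]:
    """Positions sampled when one summary entry is taken every `interval_keys` keys."""
    return [i for i in range(len(entries)) if i % interval_keys == 0]

def sample_by_bytes(entries: List[Entry], interval_bytes: int) -> List[int]:
    """Positions sampled when a summary entry is taken once at least
    `interval_bytes` of index data have accumulated since the last sample."""
    samples = []
    bytes_since_sample = 0
    for i, (_key, size) in enumerate(entries):
        if i == 0 or bytes_since_sample >= interval_bytes:
            samples.append(i)
            bytes_since_sample = 0
        bytes_since_sample += size
    return samples

# One large entry ("c") among small ones.
entries = [("a", 100), ("b", 100), ("c", 1000), ("d", 100), ("e", 100)]
by_keys = sample_by_key_count(entries, 2)
by_bytes = sample_by_bytes(entries, 200)
```

With the byte-based rule, the number of keys between two samples is no longer constant: a large entry triggers a new sample right after it, while runs of small entries are sampled less often.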

      1. 4478-incomplete.txt
        9 kB
        Sylvain Lebresne

        Activity

        Sylvain Lebresne added a comment -

        I'll note that changing IndexSummary to consider a byte size instead of number of keys is relatively straightforward. I'm attaching an incomplete patch that does that part.

        However, one problem is that we currently use the index summary for various estimates of the number of keys in the sstable. In particular, we need to estimate the number of keys in a given range of tokens, which means that simply keeping the total number of keys in the sstable is not enough.

        The simplest/cheapest solution I can see for that problem would be to add to the IndexSummary a new int[] recording how many keys each sample covers (since it's not constant anymore). That does mean breaking the format of the serialized index summary, but that may in turn be fine if we get this into 1.2 (since index summaries aren't saved before that). If someone feels like completing the attached patch with that idea, feel free to (I can find other ways to entertain myself).
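The per-sample count idea can be sketched roughly as follows. This is an illustration only, not the attached patch: the class name, the use of plain integers as tokens, and the start-exclusive, end-inclusive range convention are all assumptions made for the example.

```python
from bisect import bisect_right
from typing import List

class IndexSummarySketch:
    """Hypothetical summary that stores, for each sample, how many keys it covers."""

    def __init__(self, sample_tokens: List[int], keys_per_sample: List[int]):
        if len(sample_tokens) != len(keys_per_sample):
            raise ValueError("one key count is required per sample")
        # sample_tokens must be sorted ascending; keys_per_sample[i] is the
        # number of keys from sample i up to (not including) sample i + 1.
        self.sample_tokens = sample_tokens
        self.keys_per_sample = keys_per_sample

    def estimated_keys(self) -> int:
        """Total number of keys covered by the summary."""
        return sum(self.keys_per_sample)

    def estimated_keys_for_range(self, left: int, right: int) -> int:
        """Estimate the keys in the token range (left, right] by summing the
        counts of the samples whose token falls inside the range."""
        lo = bisect_right(self.sample_tokens, left)
        hi = bisect_right(self.sample_tokens, right)
        return sum(self.keys_per_sample[lo:hi])

summary = IndexSummarySketch([10, 20, 30, 40], [5, 1, 8, 2])
in_range = summary.estimated_keys_for_range(10, 30)
```

The estimate is only as fine-grained as the samples: keys belonging to a sample that starts outside the range are not counted, which is the same kind of approximation a fixed key interval already implies.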

        Jonathan Ellis added a comment -

        What if instead we make index_interval be CQL3 rows instead of partitions?

        Sylvain Lebresne added a comment -

        What if instead we make index_interval be CQL3 rows instead of partitions?

        I'm not sure I see much benefit of that over measuring it in bytes. Namely:

        1. that doesn't make tuning easier. What the index_interval represents is how much of the index file you will need to read at most to find the indexed block you are looking for. So it does feel to me that having this size in bytes is ideal. In particular, even if CQL3 rows vary less in size than internal ones, their size is still not constant and depends on the table.
        2. it will be more complicated/less efficient to implement in practice with the current code, because the index summary is built from the index file, and the index file currently doesn't have enough information to count CQL3 rows.
        3. a CQL3 row count might be fairly meaningless for thrift users.
        4. currently we still have two nested levels of indexing: the internal rows and, inside that, the column index. They are in the same file now, but they are not merged together. In that situation, I'm not really sure counting CQL3 rows makes any sense (of course, we could merge the two levels of indexing together, but that's not a small/simple patch, whereas this ticket is more straightforward while still putting us in a situation that is probably good enough for a while).
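Point 1 can be illustrated with a little arithmetic. The sketch below is hypothetical: the entry sizes are made up, and 128 is used only as an example value for a key-based interval. It computes the largest stretch of index data that lies between two consecutive samples.

```python
from typing import List

def max_scan_bytes(entry_sizes: List[int], interval_keys: int) -> int:
    """Largest number of index bytes between two consecutive samples
    when one sample is taken every `interval_keys` entries."""
    return max(
        sum(entry_sizes[start:start + interval_keys])
        for start in range(0, len(entry_sizes), interval_keys)
    )

# Two hypothetical tables sharing the same global, key-based interval.
small_entries = [40] * 1024    # short index entries
large_entries = [4000] * 1024  # entries carrying a large column index

small_scan = max_scan_bytes(small_entries, 128)
large_scan = max_scan_bytes(large_entries, 128)
```

With the same key-based setting, the worst-case read differs by a factor of 100 between the two tables, whereas a byte-based interval would bound it directly.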

          People

          • Assignee:
            Unassigned
            Reporter:
            Sylvain Lebresne
          • Votes:
            0
            Watchers:
            2
