  1. Calcite
  2. CALCITE-1211

limit > 7 on the Cassandra Adapter switches to EnumerableLimit rule

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.8.0
    • Component/s: cassandra
    • Labels:
      None

      Description

      I copied the testProject query because I noticed an EnumerableLimit showing up in the explain output when the limit was higher than `1`. I narrowed the threshold down to 7: for any limit above 7, the plan looks like this:

      Plan after physical tweaks:
      EnumerableLimit(fetch=[8]): rowcount = 8.0, cumulative cost = {112.5 rows, 122.0 cpu, 0.0 io}, id = 2083
        CassandraToEnumerableConverter: rowcount = 15.0, cumulative cost = {104.5 rows, 114.0 cpu, 0.0 io}, id = 2081
          CassandraProject(tweet_id=[$2]): rowcount = 15.0, cumulative cost = {103.0 rows, 112.5 cpu, 0.0 io}, id = 2079
            CassandraFilter(condition=[=(CAST($0):CHAR(8) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", '!PUBLIC!')]): rowcount = 15.0, cumulative cost = {101.5 rows, 111.0 cpu, 0.0 io}, id = 2077
              CassandraTableScan(table=[[twissandra, userline]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1915
      

      For any limit of 7 or below, the plan looks like this:

      CassandraToEnumerableConverter: rowcount = 7.0, cumulative cost = {111.07282262603232 rows, 112.75 cpu, 0.0 io}, id = 2257
        CassandraProject(tweet_id=[$2]): rowcount = 7.0, cumulative cost = {110.37282262603232 rows, 112.05 cpu, 0.0 io}, id = 2255
          CassandraSort(fetch=[7]): rowcount = 7.0, cumulative cost = {109.67282262603231 rows, 111.35 cpu, 0.0 io}, id = 2253
            CassandraFilter(condition=[=(CAST($0):CHAR(8) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", '!PUBLIC!')]): rowcount = 15.0, cumulative cost = {101.5 rows, 111.0 cpu, 0.0 io}, id = 2251
              CassandraTableScan(table=[[twissandra, userline]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 2089
      

      If I wanted to change that, would I alter the cost of the sort rule?
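
One way to see where the two plans diverge is to subtract each operator's input cost from its own cumulative cost. The sketch below does this for the row component only, using just the numbers printed in the two plans above. The class and method names are hypothetical and are not part of Calcite.

```java
// Hypothetical helper, not part of Calcite: derives an operator's own "rows"
// cost from the cumulative costs printed in the two plans above.
final class PlanCostBreakdown {

    /** Self cost = the operator's cumulative cost minus its input's cumulative cost. */
    static double selfCost(double cumulative, double inputCumulative) {
        return cumulative - inputCumulative;
    }

    public static void main(String[] args) {
        // limit 8 plan: EnumerableLimit sits on top of CassandraToEnumerableConverter.
        double enumerableLimit = selfCost(112.5, 104.5);

        // limit 7 plan: CassandraSort sits on top of CassandraFilter.
        double cassandraSort = selfCost(109.67282262603231, 101.5);

        System.out.printf("EnumerableLimit(fetch=8) rows cost: %.4f%n", enumerableLimit);
        System.out.printf("CassandraSort(fetch=7) rows cost:   %.4f%n", cassandraSort);
    }
}
```

In both plans the limit step accounts for roughly 8 rows of cost, and the plan totals differ by less than 1.5 rows (about 111.07 versus 112.5), so a small change in the fetch value is plausibly enough to tip the planner from one alternative to the other.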

        Activity

        michaelmior Michael Mior added a comment -

        Can you give the full queries that you're using in each case? It's entirely possible that the costs need to be tweaked. At this point they're more of a heuristic than an actual estimate.

        dispalt Dan Di Spaltro added a comment -

        Sure, respectively

        select "tweet_id" from "userline" where "username" = '!PUBLIC!' limit 8
        

        and

        select "tweet_id" from "userline" where "username" = '!PUBLIC!' limit 7
        
        julianhyde Julian Hyde added a comment -

        It's worth looking at the various methods that override RelNode.computeSelfCost and call super.computeSelfCost().multiplyBy(). It is a useful pattern, but can be a blunt instrument sometimes.

        Also, consider re-visiting Sort.computeSelfCost in the case where there are 0 sort keys (i.e. it is a pure LIMIT or OFFSET-LIMIT query). The cost should not depend on the row count of the input.
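
To make the two suggestions above concrete, here is a minimal sketch in plain Java. It deliberately does not use Calcite's classes: `Cost`, `baseSortCost`, `scaledSortCost`, and `limitAwareSortCost` are hypothetical names, the n log n formula and the 0.05 factor are illustrative assumptions, and this is not the change made in the actual fix.

```java
// Hypothetical stand-ins for illustration; not Calcite's RelOptCost or Sort.
final class SortCostSketch {

    /** Minimal immutable cost triple with a multiplyBy helper. */
    static final class Cost {
        final double rows;
        final double cpu;
        final double io;

        Cost(double rows, double cpu, double io) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
        }

        Cost multiplyBy(double factor) {
            return new Cost(rows * factor, cpu * factor, io * factor);
        }
    }

    /** Assumed generic sort cost: n log n in the input row count. */
    static Cost baseSortCost(double inputRows) {
        double nLogN = inputRows < Math.E ? inputRows : inputRows * Math.log(inputRows);
        return new Cost(nLogN, inputRows, 0.0);
    }

    /** The "blunt instrument": scale whatever the base cost is by a constant. */
    static Cost scaledSortCost(double inputRows, double factor) {
        return baseSortCost(inputRows).multiplyBy(factor);
    }

    /**
     * With zero sort keys (a pure LIMIT or OFFSET-LIMIT), charge only for the
     * rows skipped and returned, so the cost ignores the input row count.
     */
    static Cost limitAwareSortCost(double inputRows, int sortKeyCount,
                                   double offset, double fetch) {
        if (sortKeyCount == 0) {
            double touched = offset + fetch;
            return new Cost(touched, touched, 0.0);
        }
        return baseSortCost(inputRows);
    }

    public static void main(String[] args) {
        Cost scaled = scaledSortCost(100, 0.05);
        Cost small = limitAwareSortCost(100, 0, 0, 7);
        Cost large = limitAwareSortCost(1_000_000, 0, 0, 7);
        System.out.println("scaled sort rows cost:     " + scaled.rows);
        System.out.println("pure limit, 100 rows in:   " + small.rows);
        System.out.println("pure limit, 1M rows in:    " + large.rows);
    }
}
```

In this sketch a constant multiplier only shifts the point at which one plan beats another, whereas special-casing zero sort keys removes the dependence on input size altogether.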

        michaelmior Michael Mior added a comment -

        Thanks for the suggestion, Julian Hyde. Fixed in 62f9aba.

        julianhyde Julian Hyde added a comment -

        Fixed in 1.8.0.


          People

          • Assignee:
            michaelmior Michael Mior
            Reporter:
            dispalt Dan Di Spaltro
          • Votes:
            0
            Watchers:
            3
