Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5190

TPC-H testing of the Calcite framework

    XMLWordPrintableJSON

Details

    • Wish
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.30.0
    • None
    • core, jdbc-adapter
    • None

    Description

      What our project is doing:
      1. Designed a time series database, made full use of the enumerable adapter, and made a horizontal comparison of benchmarks with other popular time databases
      2. Design catalog->database->table, and use the adapter function of jdbc to connect multiple databases such as Mysql/Es in the backend to solve the join problem of heterogeneous data (for example: select * from catalog1.database .table1 ,catalog2,database.table2 .. where t1.c1='xx' )

      But when we tested it, we found that, as far as Jdbc-Adapter is concerned, Calcite will disassemble the logical SQL into very fragmented query plan SQL, and hand it over to the underlying database instance to execute the specific physical SQL. Most of the SQL has no filter conditions to push down and the physical SQL looks very clumsy.

      So we did a version of the TPC-H test (an industry standard in the database field), taking SQL-19 as an example:

      SELECT sum(l_extendedprice* (1 - l_discount)) AS revenue
      FROM lineitem,
           part
      WHERE (p_partkey = l_partkey
             AND p_brand = 'Brand#52'
             AND p_container in ('SM CASE',
                                 'SM BOX',
                                 'SM PACK',
                                 'SM PKG')
             AND l_quantity >= 4
             AND l_quantity <= 4 + 10
             AND p_size BETWEEN 1 AND 5
             AND l_shipmode in ('AIR',
                                'AIR REG')
             AND l_shipinstruct = 'DELIVER IN PERSON')
        OR (p_partkey = l_partkey
            AND p_brand = 'Brand#11'
            AND p_container in ('MED BAG',
                                'MED BOX',
                                'MED PKG',
                                'MED PACK')
            AND l_quantity >= 18
            AND l_quantity <= 18 + 10
            AND p_size BETWEEN 1 AND 10
            AND l_shipmode in ('AIR',
                               'AIR REG')
            AND l_shipinstruct = 'DELIVER IN PERSON')
        OR (p_partkey = l_partkey
            AND p_brand = 'Brand#51'
            AND p_container in ('LG CASE',
                                'LG BOX',
                                'LG PACK',
                                'LG PKG')
            AND l_quantity >= 29
            AND l_quantity <= 29 + 10
            AND p_size BETWEEN 1 AND 15
            AND l_shipmode in ('AIR',
                               'AIR REG')
            AND l_shipinstruct = 'DELIVER IN PERSON');
      

      The physical execution plan is:

      [SELECT *
      FROM `tpch`.`lineitem`
      LIMIT 3]
      
      [SELECT `L_PARTKEY`, `L_QUANTITY`, `L_EXTENDEDPRICE`, `L_DISCOUNT`, `L_SHIPINSTRUCT`, `L_SHIPMODE`
      FROM `tpch`.`lineitem`]
      
      [SELECT `P_PARTKEY`, `P_BRAND`, `P_SIZE`, `P_CONTAINER`
      FROM `tpch`.`part`]
      

      this looks incredible...

       

      My question is: Has Calcite ever tested something like TPC-H.

      In theory, for a specific database instance, the query speed after using calcite will not drop much performance compared to the original database, so many users are happy to use Calcite to solve the problem of data islands. …

      Attachments

        Activity

          People

            Unassigned Unassigned
            itxiangkui itxiangkui
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: