Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4034

Implement a MySQL InnoDB adapter

    XMLWordPrintableJSON

Details

    Description

      Calcite’s InnoDB adapter allows you to query the data based on InnoDB data files directy, data files are also known as .ibd files, this adapter is different from JDBC adapter which requires a MySQL server to serve response. With .ibd files and the corresponding DDLs, InnoDB adapter is able to work like a simple "MySQL server", it accepts SQL query and attempts to compile the query based on InnoDB file accessing APIs provided by [innodb-java-reader|https://github.com/alibaba/innodb-java-reader] (I am the author of the library, which is well-tested and production-ready), it will exploit projecting, filtering and sorting directly in InnoDB data file where possible. What’s more, with DDLs, the adapter is "index aware", it leverages rules to choose the right index to scan, for example, using primary key or secondary keys to look up data, then it tries to push down some conditions into storage engine. Also, the adapter leaves option to provide hint as well, so that user can indicate the optimizer to force use one specific index.

      The InnoDB adapter can,

      1) Full scan a table.

      2) Scan by index such as primary key or secondary key, single column or composite keys.

      3) Push down filtering condition to storage engine, support both point query and range query.

      4) Push down projection.

      5) Push down ordering if it matches the natural collation of an index.

      6) Support almost all the commonly used data types.

      7) Work as a simple MySQL server being able to read .ibd data files directly through file system, this can be a tool to query or dump table data by offloading from MySQL process under some conditions.

      Before I describe the adapter and its design, I was wondering if it is appropriate to deliver the work by contributing the codebase back to Calcite, or if it would be better to stay in another project, probably being referred from Calcite adapter link. Looking forward to Calcite developer's advice.

       

      Below is the first version of the usage.

      ----------

      [MySQL](https://www.mysql.com/) is the most popular Open Source SQL database management system, is developed, distributed, and supported by Oracle Corporation. InnoDB is a general-purpose storage engine that balances high reliability and high performance in MySQL, since 5.6 InnoDB has become the default MySQL storage engine.

      Calcite’s InnoDB adapter allows you to query the data based on InnoDB data files directy, data files are also known as .ibd files. It leverages [innodb-java-reader](https://github.com/alibaba/innodb-java-reader). This adapter is different from JDBC adapter which maps a schema in a JDBC data source and requires a MySQL server to serve response. With .ibd files and the corresponding DDLs, InnoDB adapter is able to work like a simple "MySQL server", it accepts SQL query and attempts to compile the query based on InnoDB file accessing APIs provided by innodb-java-reader, it will exploit projecting, filtering and sorting directly in InnoDB data file where possible.

      What’s more, with DDLs, the adapter is "index aware", it leverages rules to choose the right index to scan, for example, using primary key or secondary keys to look up data, then it tries to push down some conditions into storage engine. Also, the adapter leaves option to provide hint as well, so that user can indicate the optimizer to force use one specific index.

      A basic example of a model file is given below, this schema reads from a MySQL "scott" database:

      {
        "version": "1.0",
        "defaultSchema": "scott",
        "schemas": [
          {
            "name": "scott",
            "type": "custom",
            "factory": "org.apache.calcite.adapter.innodb.InnodbSchemaFactory",
            "operand": {
              "sqlFilePath": [ "/path/scott.sql" ],
              "ibdDataFileBasePath": "/usr/local/mysql/data/scott"
            }
          }
        ]
      }
      

      `sqlFilePath` is a list of DDL files, you can generate table definitions by executing `mysqldump -d -u<username> -p<password> -h <hostname> <dbname>` in command-line.

      The file content of `/path/scott.sql` is given below:

      CREATE TABLE `DEPT`(
      `DEPTNO` TINYINT NOT NULL,
      `DNAME` VARCHAR(50) NOT NULL,
      `LOC` VARCHAR(20),
      UNIQUE KEY `DEPT_PK` (`DEPTNO`)
      )ENGINE=InnoDB DEFAULT CHARSET=latin1;
      
      CREATE TABLE `EMP`(
      `EMPNO` INT(11) NOT NULL,
      `ENAME` VARCHAR(100) NOT NULL,
      `JOB` VARCHAR(15) NOT NULL,
      `AGE` SMALLINT,
      `MGR` BIGINT,
      `HIREDATE` DATE,
      `SAL` DECIMAL(8,2) NOT NULL,
      `COMM` DECIMAL(6,2),
      `DEPTNO` TINYINT,
      `EMAIL` VARCHAR(100) DEFAULT NULL,
      `CREATE_DATETIME` DATETIME,
      `CREATE_TIME` TIME,
      `UPSERT_TIME` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`EMPNO`),
      KEY `ENAME_KEY` (`ENAME`),
      KEY `HIREDATE_KEY` (`HIREDATE`),
      KEY `CREATE_DATETIME_JOB_KEY` (`CREATE_DATETIME`, `JOB`),
      KEY `CREATE_TIME_KEY` (`CREATE_TIME`),
      KEY `UPSERT_TIME_KEY` (`UPSERT_TIME`),
      KEY `DEPTNO_JOB_KEY` (`DEPTNO`, `JOB`),
      KEY `DEPTNO_SAL_COMM_KEY` (`DEPTNO`, `SAL`, `COMM`),
      KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`),
      KEY `AGE_KEY` (`AGE`)
      )ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
      

      ibdDataFileBasePath is the parent file path of .ibd files.

      Assuming the model file is stored as `model.json`, you can connect to InnoDB data file to performa query via [sqlline](https://github.com/julianhyde/sqlline) as follows:

      sqlline> !connect jdbc:calcite:model=model.json admin admin

      We can issue a simple query to query all employees by writing standard SQL:

      sqlline> select empno,ename,job,age,mgr from "EMP";
      +-------+--------+-----------+-----+------+
      | EMPNO | ENAME  |    JOB    | AGE | MGR  |
      +-------+--------+-----------+-----+------+
      | 7369  | SMITH  | CLERK     | 30  | 7902 |
      | 7499  | ALLEN  | SALESMAN  | 24  | 7698 |
      | 7521  | WARD   | SALESMAN  | 41  | 7698 |
      | 7566  | JONES  | MANAGER   | 28  | 7839 |
      | 7654  | MARTIN | SALESMAN  | 27  | 7698 |
      | 7698  | BLAKE  | MANAGER   | 38  | 7839 |
      | 7782  | CLARK  | MANAGER   | 32  | 7839 |
      | 7788  | SCOTT  | ANALYST   | 45  | 7566 |
      | 7839  | KING   | PRESIDENT | 22  | null |
      | 7844  | TURNER | SALESMAN  | 54  | 7698 |
      | 7876  | ADAMS  | CLERK     | 35  | 7788 |
      | 7900  | JAMES  | CLERK     | 40  | 7698 |
      | 7902  | FORD   | ANALYST   | 28  | 7566 |
      | 7934  | MILLER | CLERK     | 32  | 7782 |
      +-------+--------+-----------+-----+------+
      

      While executing this query, the InnoDB adapter scans the InnoDB data file `EMP.ibd` using primary key, also known as clustering B+ tree index in MySQL, and is able to
      push down projection to underlying storage engine. Projection can reduce the size of data fetched from the storage engine.

      We can look up one employee by filtering. The InnoDB adapter retrieves all indexes through DDL file provided in `model.json`.

      sqlline> select empno,ename,job,age,mgr from "EMP" where empno = 7782;
      +-------+-------+---------+-----+------+
      | EMPNO | ENAME |   JOB   | AGE | MGR  |
      +-------+-------+---------+-----+------+
      | 7782  | CLARK | MANAGER | 32  | 7839 |
      +-------+-------+---------+-----+------+
      

      The InnoDB adapter is able to recognize that `empno` is the primary key and do a point query lookup by using clustering index instead of full table scan.

      We can do range query on primary key as well.

      sqlline> select empno,ename,job,age,mgr from "EMP" where empno > 7782 and empno < 7900;

      Note that such query with acceptable range is usually efficient in MySQL with InnoDB storage engine, because for clustering B+ tree index, records close in index are close in data file, which is good for scanning.

      We can look up employee by secondary key. For example, the filtering condition will be on a `VARCHAR` field `ename`.

      sqlline> select empno,ename,job,age,mgr from "EMP" where ename = 'smith';
      +-------+-------+-------+-----+------+
      | EMPNO | ENAME |  JOB  | AGE | MGR  |
      +-------+-------+-------+-----+------+
      | 7369  | SMITH | CLERK | 30  | 7902 |
      +-------+-------+-------+-----+------+
      

      The InnoDB adapter works well on almost all the commonly used data types in MySQL, for more information on supported data types, please refer to [innodb-java-reader](https://github.com/alibaba/innodb-java-reader#3-features).

      We can query by composite key. For example, given secondary index of `DEPTNO_MGR_KEY`.

      sqlline> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and mgr = 7566;
      +-------+-------+---------+-----+------+
      | EMPNO | ENAME |   JOB   | AGE | MGR  |
      +-------+-------+---------+-----+------+
      | 7788  | SCOTT | ANALYST | 45  | 7566 |
      | 7902  | FORD  | ANALYST | 28  | 7566 |
      +-------+-------+---------+-----+------+
      

      The InnoDB adapter will leverage the matched key `DEPTNO_MGR_KEY` to push down filtering condition of `deptno = 20 and mgr = 7566`.

      In some cases, only part of the conditions can be pushed down since there is a limitation in the underlying storage engine API, leaving unpushed remainder conditions in the rest of the plan. Given the below SQL, only `deptno = 20` is pushed down.

      select empno,ename,job,age,mgr from "EMP" where deptno = 20 and upsert_time > '2018-01-01 00:00:00';
      

      `innodb-java-reader` only supports range query with lower and upper bound using an index, not fully `Index Condition Pushdown (ICP)`. The storage engine returns a range of rows and Calcite will evaluates the rest of `WHERE` condition from the rows fetched.

      For the below SQL, there are multiple indexes satisfying the left-prefix index rule, the possible indexes are `DEPTNO_JOB_KEY`, `DEPTNO_SAL_COMM_KEY` and `DEPTNO_MGR_KEY`, the Innod adapter will choose one of them according to the ordinal defined in DDL, only `deptno = 20` condition is pushed down, leaving the rest of `WHERE` condition handled by Calcite built-in execution engine.

      sqlline> select empno,deptno,sal from "EMP" where deptno = 20 and sal > 2000;
      +-------+--------+---------+
      | EMPNO | DEPTNO |   SAL   |
      +-------+--------+---------+
      | 7788  | 20     | 3000.00 |
      | 7902  | 20     | 3000.00 |
      | 7566  | 20     | 2975.00 |
      +-------+--------+---------+
      

      Accessing rows through secondary key requires scanning by secondary index and retrieving records back to clustering index in InnoDB, for a "big" scan, that would introduce many random I/O operations, so performance is usually not good enough. Note that the query above can be more performant by using `EPTNO_SAL_COMM_KEY` index, because covering index does not need to retrieve back to clustering index. We can force using `DEPTNO_SAL_COMM_KEY` index by hint as below.

      sqlline> select empno,ename,job,age,sal from "emp"/*+ index(DEPTNO_SAL_COMM_KEY) */ where deptno = 20 and sal > 2000;
      

      Hint can be configured in `SqlToRelConverter`, to enable hint, you should register `index` HintStrategy on `TableScan` in `SqlToRelConverter.ConfigBuilder`. Index hint takes effect on the base `TableScan` relational node, if there are conditions matching the index, index condition can be pushed down as well. For the below SQL, although none of the indexes can be used, but by leveraging covering index, the performance is better than full table scan, we can force to use `DEPTNO_MGR_KEY` to scan in secondary index.

      sqlline> select empno,mgr from "EMP"/*+ index(DEPTNO_MGR_KEY) */ where mgr = 7839;
      

      Ordering can be pushed down if it matches the natural collation of the index used.

      sqlline> select deptno,ename,hiredate from "EMP" where hiredate < '2020-01-01' order by hiredate desc;
      +--------+--------+------------+
      | DEPTNO | ENAME  |  HIREDATE  |
      +--------+--------+------------+
      | 20     | ADAMS  | 1987-05-23 |
      | 20     | SCOTT  | 1987-04-19 |
      | 10     | MILLER | 1982-01-23 |
      | 20     | FORD   | 1981-12-03 |
      | 30     | JAMES  | 1981-12-03 |
      | 10     | KING   | 1981-11-17 |
      | 30     | MARTIN | 1981-09-28 |
      | 30     | TURNER | 1981-09-08 |
      | 10     | CLARK  | 1981-06-09 |
      | 30     | WARD   | 1981-02-22 |
      | 30     | ALLEN  | 1981-02-20 |
      | 20     | JONES  | 1981-02-04 |
      | 30     | BLAKE  | 1981-01-05 |
      | 20     | SMITH  | 1980-12-17 |
      +--------+--------+------------+
      

      Limitations

      `innodb-java-reader` has some prerequisites for `.ibd` files, please refer to [Prerequisites](https://github.com/alibaba/innodb-java-reader#2-prerequisites).

      You can think of the adapter as a simple MySQL server, with the ability to query, dump data by offloading from MySQL process under some conditions. If pages are not flushed from InnoDB Buffer Pool to disk, then the result may be inconsistent (the LSN in `.ibd` file might smaller than in-memory pages). InnoDB leverages write ahead log in terms of performance, so there is no command available to flush all dirty pages. Only internal mechanism manages when and where to persist pages to disk, like Page Cleaner thread, adaptive flushing, etc.

      Currently the InnoDB adapter does not aware row count and cardinality of a `.ibd` data file, so it will only rely on simple rules to perform optimization, once underlying storage engine could provide such metrics and metadata, this can be integrated in Calcite by leveraging cost based optimization in the future.

       

      Attachments

        Issue Links

          Activity

            People

              julianhyde Julian Hyde
              neoremind neoremind
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 2h 50m
                  2h 50m