Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1253

Create Elasticsearch adapter for Calcite

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: None
    • Labels:
      None

      Description

      This JIRA ticket is created to propose the addition of the new Elasticsearch adapter for Calcite.

      The adapter exposes a SQL interface to elasticsearch which includes projection, filtering, sort, fetch, offset queries.

      I have provided a couple of PRs, the first one is for the elasticsearch adapter & the second one is for enhancing the Vagrantfile for loading test data required for running the Junits.

      Currently, the support for aggregates is not provided. The work on it is in progress.

      Kindly evaluate my contribution & please consider it as an addition to the calcite project.

      Following PRs are provided:
      https://github.com/apache/calcite/pull/236
      https://github.com/vlsi/calcite-test-dataset/pull/10

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Thanks very much for this contribution – it looks very exciting! I took a quick look am glad to see you have all the right pieces, including tests, and I noticed you have extended Vladimir Sitnikov's test dataset. I haven't had time to review in detail, but I will. Michael Mior, do you also have time to review?

        Subhobrata Dey, As we make this a full supported part of Calcite, we will also need some documentation. Obviously it's not foremost in your mind right now, but it will help people find and use the new adapter. Michael Mior did a nice job when he contributed the Cassandra adapter: go to http://calcite.apache.org/docs/adapter.html and click on "Cassandra". I think we should aim for something similar for Elasticsearch.

        Show
        julianhyde Julian Hyde added a comment - Thanks very much for this contribution – it looks very exciting! I took a quick look am glad to see you have all the right pieces, including tests, and I noticed you have extended Vladimir Sitnikov 's test dataset . I haven't had time to review in detail, but I will. Michael Mior , do you also have time to review? Subhobrata Dey , As we make this a full supported part of Calcite, we will also need some documentation. Obviously it's not foremost in your mind right now, but it will help people find and use the new adapter. Michael Mior did a nice job when he contributed the Cassandra adapter: go to http://calcite.apache.org/docs/adapter.html and click on "Cassandra". I think we should aim for something similar for Elasticsearch.
        Hide
        kkrugler Ken Krugler added a comment -

        It would be interesting to review/compare with https://github.com/NLPchina/elasticsearch-sql, which seems like the current best alternative.

        Show
        kkrugler Ken Krugler added a comment - It would be interesting to review/compare with https://github.com/NLPchina/elasticsearch-sql , which seems like the current best alternative.
        Hide
        michaelmior Michael Mior added a comment -

        Checked out the PR for the VM. I think it would be nice to stick to Chef for provisioning, but what's there looks alright otherwise. Just testing building the VM now.

        Show
        michaelmior Michael Mior added a comment - Checked out the PR for the VM. I think it would be nice to stick to Chef for provisioning, but what's there looks alright otherwise. Just testing building the VM now.
        Hide
        michaelmior Michael Mior added a comment -

        Of course I meant Puppet instead of Chef. I'll attribute that to jet lag

        Show
        michaelmior Michael Mior added a comment - Of course I meant Puppet instead of Chef. I'll attribute that to jet lag
        Hide
        julianhyde Julian Hyde added a comment -

        Always imagine Puppet to be the Swedish Chef from the Muppets, in which case, they're the same thing.

        Show
        julianhyde Julian Hyde added a comment - Always imagine Puppet to be the Swedish Chef from the Muppets , in which case, they're the same thing.
        Hide
        julianhyde Julian Hyde added a comment -

        Indeed. Here's my quick assessment: NLPchina has tighter integration with ES (including being a plugin), the ability to query via HTTP, and a few extensions to SQL (matchQuery, GROUP BY range). Calcite probably has a fuller, more compliant implementation of SQL, a JDBC driver, the ability to combine ES with other data sources in the same query, and the ability to use different execution engines (e.g. Drill).

        I'd love to hear what other people think. We will definitely consider adding some of the SQL extensions to Calcite, especially those that relate to ES's unique features (e.g. search).

        Show
        julianhyde Julian Hyde added a comment - Indeed. Here's my quick assessment: NLPchina has tighter integration with ES (including being a plugin), the ability to query via HTTP, and a few extensions to SQL ( matchQuery , GROUP BY range ). Calcite probably has a fuller, more compliant implementation of SQL, a JDBC driver, the ability to combine ES with other data sources in the same query, and the ability to use different execution engines (e.g. Drill). I'd love to hear what other people think. We will definitely consider adding some of the SQL extensions to Calcite, especially those that relate to ES's unique features (e.g. search).
        Hide
        julianhyde Julian Hyde added a comment -

        I completed my review. I was pleased to see that tests passed, and javadoc generation was fine. Well done.

        Here are my review comments:

        1. Change type of ElasticsearchRel.Implementor.list from List<Pair<String, String>> to List<String>. The left field only exists because Mongo has two query modes: find and aggregate.
        2. Simplify ElasticsearchToEnumerableConverter.implement, removing unused variables
        3. Convert .json to standard json format (double-quoted identifiers and strings, see e.g. druid-wiki-model.json)
        4. Fix bad grammar resulting from search replace of "a Mongo" to "a Elasticsearch", e.g. in "Enumerator that reads from a Elasticsearch type" should be "an Elasticsearch" not "a Elasticsearch". Review all comments for grammar.
        5. There are not tests for each data type (e.g. integer not null, date, timestamp, boolean) so I expect there are bugs.
        6. In adapter spec, show how to write a query against a raw table (i.e. using the _map column). Your example pre-supposes that people have built views against the raw table, which is not valid.
        7. In elasticsearch_adapter.md, remove trailing spaces
        8. Fix indentation in java code, especially for long argument and parameter lists. "," should be at the end of a line, not the start.
        9. ElasticsearchRules.maybeQuote not used; remove other unused methods and code
        10. There are a few references to Bug.CALCITE_194_FIXED; makes no sense because this is bug on the Mongo adapter
        11. I may be wrong, but it looks as if code has been copied the Mongo adapter that is not needed for tests to pass. Remove the code, or better, add tests.
        12. I think there's a bug if I put a scalar expression into a project or filter that the adapter cannot handle during code generation, e.g. "select x + 1 from t", "select * from t where x + 1 > 10". For each of these, can you add tests if it works, or log a bug?
        13. Can you briefly list what should be the next work? What are the bugs, missing features? They don't need to be fixed now, but I would like to know where the holes are.

        I don't think any of these are major, in fact most are cosmetic and be fixed quickly. If these are done in the next day or two, and we are able to get the necessary changes committed to change calcite-test-dataset, I think we can get the adapter into 1.8.

        Show
        julianhyde Julian Hyde added a comment - I completed my review. I was pleased to see that tests passed, and javadoc generation was fine. Well done. Here are my review comments: Change type of ElasticsearchRel.Implementor.list from List<Pair<String, String>> to List<String> . The left field only exists because Mongo has two query modes: find and aggregate. Simplify ElasticsearchToEnumerableConverter.implement, removing unused variables Convert .json to standard json format (double-quoted identifiers and strings, see e.g. druid-wiki-model.json) Fix bad grammar resulting from search replace of "a Mongo" to "a Elasticsearch", e.g. in "Enumerator that reads from a Elasticsearch type" should be "an Elasticsearch" not "a Elasticsearch". Review all comments for grammar. There are not tests for each data type (e.g. integer not null, date, timestamp, boolean) so I expect there are bugs. In adapter spec, show how to write a query against a raw table (i.e. using the _map column). Your example pre-supposes that people have built views against the raw table, which is not valid. In elasticsearch_adapter.md, remove trailing spaces Fix indentation in java code, especially for long argument and parameter lists. "," should be at the end of a line, not the start. ElasticsearchRules.maybeQuote not used; remove other unused methods and code There are a few references to Bug.CALCITE_194_FIXED ; makes no sense because this is bug on the Mongo adapter I may be wrong, but it looks as if code has been copied the Mongo adapter that is not needed for tests to pass. Remove the code, or better, add tests. I think there's a bug if I put a scalar expression into a project or filter that the adapter cannot handle during code generation, e.g. "select x + 1 from t", "select * from t where x + 1 > 10". For each of these, can you add tests if it works, or log a bug? Can you briefly list what should be the next work? What are the bugs, missing features? They don't need to be fixed now, but I would like to know where the holes are. I don't think any of these are major, in fact most are cosmetic and be fixed quickly. If these are done in the next day or two, and we are able to get the necessary changes committed to change calcite-test-dataset, I think we can get the adapter into 1.8.
        Hide
        sbcd90 Subhobrata Dey added a comment -

        Hello Julian Hyde,

        I have fixed almost all the issues. Please review & accept the contribution.

        1. Change type of ElasticsearchRel.Implementor.list from List<Pair<String, String>> to List<String>. The left field only exists because Mongo has two query modes: find and aggregate.

        Issue fixed.

        2. Simplify ElasticsearchToEnumerableConverter.implement, removing unused variables

        Issue fixed.

        3. Convert .json to standard json format (double-quoted identifiers and strings, see e.g. druid-wiki-model.json)

        Issue fixed.

        4. Fix bad grammar resulting from search replace of "a Mongo" to "a Elasticsearch", e.g. in "Enumerator that reads from a Elasticsearch type" should be "an Elasticsearch" not "a Elasticsearch". Review all comments for grammar.

        Issue fixed.

        5. There are not tests for each data type (e.g. integer not null, date, timestamp, boolean) so I expect there are bugs.

        Kept it as part of next tasks for enhancing the tests with foodmart dataset.

        6. In adapter spec, show how to write a query against a raw table (i.e. using the _map column). Your example pre-supposes that people have built views against the raw table, which is not valid.

        Issue fixed.

        7. In elasticsearch_adapter.md, remove trailing spaces

        Issue fixed.

        8. Fix indentation in java code, especially for long argument and parameter lists. "," should be at the end of a line, not the start.

        Issue fixed.

        9. ElasticsearchRules.maybeQuote not used; remove other unused methods and code

        Issue fixed.

        10. There are a few references to Bug.CALCITE_194_FIXED; makes no sense because this is bug on the Mongo adapter

        Issue fixed.

        11. I may be wrong, but it looks as if code has been copied the Mongo adapter that is not needed for tests to pass. Remove the code, or better, add tests.

        Issue fixed by removing code.

        12. I think there's a bug if I put a scalar expression into a project or filter that the adapter cannot handle during code generation, e.g. "select x + 1 from t", "select * from t where x + 1 > 10". For each of these, can you add tests if it works, or log a bug?

        Will log a bug.

        13. Can you briefly list what should be the next work? What are the bugs, missing features? They don't need to be fixed now, but I would like to know where the holes are.

        • Enhance tests with foodmart dataset.
        • Support aggregation operations.
        Show
        sbcd90 Subhobrata Dey added a comment - Hello Julian Hyde , I have fixed almost all the issues. Please review & accept the contribution. 1. Change type of ElasticsearchRel.Implementor.list from List<Pair<String, String>> to List<String>. The left field only exists because Mongo has two query modes: find and aggregate. Issue fixed. 2. Simplify ElasticsearchToEnumerableConverter.implement, removing unused variables Issue fixed. 3. Convert .json to standard json format (double-quoted identifiers and strings, see e.g. druid-wiki-model.json) Issue fixed. 4. Fix bad grammar resulting from search replace of "a Mongo" to "a Elasticsearch", e.g. in "Enumerator that reads from a Elasticsearch type" should be "an Elasticsearch" not "a Elasticsearch". Review all comments for grammar. Issue fixed. 5. There are not tests for each data type (e.g. integer not null, date, timestamp, boolean) so I expect there are bugs. Kept it as part of next tasks for enhancing the tests with foodmart dataset. 6. In adapter spec, show how to write a query against a raw table (i.e. using the _map column). Your example pre-supposes that people have built views against the raw table, which is not valid. Issue fixed. 7. In elasticsearch_adapter.md, remove trailing spaces Issue fixed. 8. Fix indentation in java code, especially for long argument and parameter lists. "," should be at the end of a line, not the start. Issue fixed. 9. ElasticsearchRules.maybeQuote not used; remove other unused methods and code Issue fixed. 10. There are a few references to Bug.CALCITE_194_FIXED; makes no sense because this is bug on the Mongo adapter Issue fixed. 11. I may be wrong, but it looks as if code has been copied the Mongo adapter that is not needed for tests to pass. Remove the code, or better, add tests. Issue fixed by removing code. 12. I think there's a bug if I put a scalar expression into a project or filter that the adapter cannot handle during code generation, e.g. "select x + 1 from t", "select * from t where x + 1 > 10". For each of these, can you add tests if it works, or log a bug? Will log a bug. 13. Can you briefly list what should be the next work? What are the bugs, missing features? They don't need to be fixed now, but I would like to know where the holes are. Enhance tests with foodmart dataset. Support aggregation operations.
        Hide
        julianhyde Julian Hyde added a comment -

        I think this is ready to commit, so it will be in 1.8. The code and documentation look good, and the tests all pass. I propose to commit it now, even though the changes to calcite-test-dataset have not yet been accepted.

        Do you agree, Michael Mior and Vladimir Sitnikov? If so, I'll push.

        Show
        julianhyde Julian Hyde added a comment - I think this is ready to commit, so it will be in 1.8. The code and documentation look good, and the tests all pass. I propose to commit it now, even though the changes to calcite-test-dataset have not yet been accepted. Do you agree, Michael Mior and Vladimir Sitnikov ? If so, I'll push.
        Hide
        michaelmior Michael Mior added a comment -

        AFAIK, the PR for test dataset is ready to go. (Any outstanding issues Vladimir Sitnikov?) I think it would be nice to have that merged first if possible so integration tests are ready to be run but I'm +1 on committing either way.

        Show
        michaelmior Michael Mior added a comment - AFAIK, the PR for test dataset is ready to go. (Any outstanding issues Vladimir Sitnikov ?) I think it would be nice to have that merged first if possible so integration tests are ready to be run but I'm +1 on committing either way.
        Hide
        julianhyde Julian Hyde added a comment -

        I decided to pull the trigger. Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/f3caf13b. Thanks for the PR, Subhobrata Dey!

        Can you please log those 2 JIRA cases for follow-up work?

        Show
        julianhyde Julian Hyde added a comment - I decided to pull the trigger. Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/f3caf13b . Thanks for the PR, Subhobrata Dey ! Can you please log those 2 JIRA cases for follow-up work?
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.8.0 (2016-06-13).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.8.0 (2016-06-13).

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            sbcd90 Subhobrata Dey
          • Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development