Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1959

Reduce the amount of Metadata and table name calls in Druid

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0
    • Component/s: druid
    • Labels:
      None

      Description

      Currently in the Druid adapter, when a model definition file is not provided, query times can be quite slow due to excessive metadata and table names calls. Initial investigation reveals that a simple query like

      SELECT * from table
      

      produces at least 30 http calls for fetching the table names, and another 30 calls for meta data. This slows down the query considerably, even when the individual calls themselves are relatively quick.

      The source of the issue seems to be DruidSchema#getTableMap, as this method dispatches calls to both DruidConnectionImpl#metadata and DruidConnectionImpl#tableNames. In addition, DruidTable#create makes another call to DruidConnectionImpl#metadata even if it's given the populated fields as arguments.

      getTableMap is called fairly often indirectly from many places, including the validator, so as the query becomes larger, the problems gets a lot worse. Ideally, one would provide a model definition to prevent this, but doing so is not always possible/preferred for users.

        Activity

        Hide
        michaelmior Michael Mior added a comment -

        Resolved in release 1.14.0 (2017-10-01)

        Show
        michaelmior Michael Mior added a comment - Resolved in release 1.14.0 (2017-10-01)
        Hide
        zhumayun Zain Humayun added a comment -

        Jesus Camacho Rodriguez Thanks. You'll probably also want to close my PR.

        Show
        zhumayun Zain Humayun added a comment - Jesus Camacho Rodriguez Thanks. You'll probably also want to close my PR.
        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Fixup in http://git-wip-us.apache.org/repos/asf/calcite/commit/b0ae502 . Thanks Zain Humayun , and Julian Hyde for the feedback!
        Hide
        zhumayun Zain Humayun added a comment - - edited

        I pushed to my original PR branch.
        Edit: I created a follow up PR addressing your comments on assert and the java docs.
        https://github.com/apache/calcite/pull/529

        Show
        zhumayun Zain Humayun added a comment - - edited I pushed to my original PR branch. Edit: I created a follow up PR addressing your comments on assert and the java docs. https://github.com/apache/calcite/pull/529
        Hide
        julianhyde Julian Hyde added a comment -

        Just add a commit to the existing PR branch, and a comment in this case.

        Show
        julianhyde Julian Hyde added a comment - Just add a commit to the existing PR branch, and a comment in this case.
        Hide
        julianhyde Julian Hyde added a comment -

        Just add a commit to the existing PR branch, and a comment in this case.

        Show
        julianhyde Julian Hyde added a comment - Just add a commit to the existing PR branch, and a comment in this case.
        Hide
        zhumayun Zain Humayun added a comment -

        Yes, you're right. This was an oversight on my part. What is the process for changing minor things like this? Another PR/JIRA ticket? A diff file instead?

        Show
        zhumayun Zain Humayun added a comment - Yes, you're right. This was an oversight on my part. What is the process for changing minor things like this? Another PR/JIRA ticket? A diff file instead?
        Hide
        julianhyde Julian Hyde added a comment -

        I see now why you need two "create" methods. However they are now rather different beasts - one modifies its arguments, another doesn't - so changing the description of the methods and parameters would help.

        Show
        julianhyde Julian Hyde added a comment - I see now why you need two "create" methods. However they are now rather different beasts - one modifies its arguments, another doesn't - so changing the description of the methods and parameters would help.
        Hide
        julianhyde Julian Hyde added a comment -

        Use org.junit.Assert, typically the assertThat method.

        Show
        julianhyde Julian Hyde added a comment - Use org.junit.Assert, typically the assertThat method.
        Hide
        zhumayun Zain Humayun added a comment - - edited

        There was an edge case where the create method was given correctly populated fields, and a non-null connection which meant that it fetched the metadata twice in total, when it should only do it once. This is why I created another create method. That's why I changed the order of the code at the end of DruidTableFactory.

        Show
        zhumayun Zain Humayun added a comment - - edited There was an edge case where the create method was given correctly populated fields, and a non-null connection which meant that it fetched the metadata twice in total, when it should only do it once. This is why I created another create method. That's why I changed the order of the code at the end of DruidTableFactory .
        Hide
        zhumayun Zain Humayun added a comment - - edited

        I was under the impression that the Schema created was tied to a particular connection. After this change, connecting to Druid clusters without model definitions should be much faster. I was also reluctant of testing with an assert, however CalciteAssert is used to handle query-related test cases, where as this test case is not quite that. Do you recommend I add another method to CalciteAssert?

        Show
        zhumayun Zain Humayun added a comment - - edited I was under the impression that the Schema created was tied to a particular connection . After this change, connecting to Druid clusters without model definitions should be much faster. I was also reluctant of testing with an assert, however CalciteAssert is used to handle query-related test cases, where as this test case is not quite that. Do you recommend I add another method to CalciteAssert?
        Hide
        julianhyde Julian Hyde added a comment -

        Sorry, I see now that it's a Druid connection, not a JDBC connection. That should be OK.

        Do we need two DruidTable.create methods? What's wrong with always requiring a connection? The method is package-protected so there's no need to preserve compatibility.

        Also, spurious code re-formatting.

        Show
        julianhyde Julian Hyde added a comment - Sorry, I see now that it's a Druid connection, not a JDBC connection. That should be OK. Do we need two DruidTable.create methods? What's wrong with always requiring a connection? The method is package-protected so there's no need to preserve compatibility. Also, spurious code re-formatting.
        Hide
        julianhyde Julian Hyde added a comment -

        Plus, you shouldn't be using assert to write test cases.

        Show
        julianhyde Julian Hyde added a comment - Plus, you shouldn't be using assert to write test cases.
        Hide
        julianhyde Julian Hyde added a comment -

        You've made Table belong to a particular Connection. Usually Schema and Table have a longer life-span than Connection. What are the implications of this change?

        Show
        julianhyde Julian Hyde added a comment - You've made Table belong to a particular Connection. Usually Schema and Table have a longer life-span than Connection. What are the implications of this change?
        Hide
        zhumayun Zain Humayun added a comment -

        Awesome. Thank you very much

        Show
        zhumayun Zain Humayun added a comment - Awesome. Thank you very much
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -
        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/f6825f0 . Thanks for the contribution Zain Humayun !
        Hide
        zhumayun Zain Humayun added a comment -
        Show
        zhumayun Zain Humayun added a comment - Jesus Camacho Rodriguez can you please review? Thanks! https://github.com/apache/calcite/pull/524

          People

          • Assignee:
            zhumayun Zain Humayun
            Reporter:
            zhumayun Zain Humayun
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development