Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-5972

Slow performance for query on INFORMATION_SCHEMA.TABLE

    Details

      Description

      A query like the following on INFORMATION_SCHEMA takes a long time to execute.

      select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES` WHERE TABLE_NAME LIKE '%' AND ( TABLE_SCHEMA = 'hive.default' ) ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME;

      Reason being we fetch table information for all schemas instead of just 'hive.default' schema.

      If we change the predicate like this, it executes very fast.

      select TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE from INFORMATION_SCHEMA.`TABLES` WHERE ( TABLE_SCHEMA = 'hive.default' ) AND TABLE_NAME LIKE '%' ORDER BY TABLE_TYPE, TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME;

      The difference is in the order in which we evaluate the expressions in the predicate.
      In the first case, we first evaluate TABLE_NAME LIKE '%' and decide that it is inconclusive (since we do not know the schema). So, we go get all tables for all the schemas.

      In the second case, we first evaluate TABLE_SCHEMA = 'hive.default' and decide that we need to fetch only tables for that schema.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user ppadma opened a pull request:

          https://github.com/apache/drill/pull/1038

          DRILL-5972: Slow performance for query on INFORMATION_SCHEMA.TABLE

          Please see DRILL-5972.
          Problem is while evaluating "boolean and", we are returning as soon as expression result is not TRUE. We should go through all the expressions and if any of them is FALSE, return FALSE. If any of them is INCONCLUSIVE, return INCONCLUSIVE.
          Tested the fix with lot of hive tables and verified that it fixes the performance problem.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/ppadma/drill DRILL-5972

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/drill/pull/1038.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #1038


          commit 70f489098acc94d9e1af1aa05c35e2dfe01198c6
          Author: Padma Penumarthy <ppenumar97@yahoo.com>
          Date: 2017-11-16T04:31:23Z

          DRILL-5972: Slow performance for query on INFORMATION_SCHEMA.TABLE


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user ppadma opened a pull request: https://github.com/apache/drill/pull/1038 DRILL-5972 : Slow performance for query on INFORMATION_SCHEMA.TABLE Please see DRILL-5972 . Problem is while evaluating "boolean and", we are returning as soon as expression result is not TRUE. We should go through all the expressions and if any of them is FALSE, return FALSE. If any of them is INCONCLUSIVE, return INCONCLUSIVE. Tested the fix with lot of hive tables and verified that it fixes the performance problem. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ppadma/drill DRILL-5972 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1038.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1038 commit 70f489098acc94d9e1af1aa05c35e2dfe01198c6 Author: Padma Penumarthy <ppenumar97@yahoo.com> Date: 2017-11-16T04:31:23Z DRILL-5972 : Slow performance for query on INFORMATION_SCHEMA.TABLE
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user priteshm commented on the issue:

          https://github.com/apache/drill/pull/1038

          @parthchandra can you please review this?

          Show
          githubbot ASF GitHub Bot added a comment - Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1038 @parthchandra can you please review this?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user parthchandra commented on a diff in the pull request:

          https://github.com/apache/drill/pull/1038#discussion_r153658073

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java —
          @@ -202,14 +202,19 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function
          // If at least one arg returns FALSE, then the AND function value is FALSE
          // If at least one arg returns INCONCLUSIVE, then the AND function value is INCONCLUSIVE
          // If all args return TRUE, then the AND function value is TRUE
          + Result result = Result.TRUE;
          +
          for(ExprNode arg : exprNode.args) {
          Result exprResult = evaluateHelper(recordValues, arg);

          • if (exprResult != Result.TRUE) {
            + if (exprResult == Result.FALSE) { return exprResult; }

            + if (exprResult == Result.INCONCLUSIVE) {

              • End diff –

          Just to be clear. You want to return `Result.INCONCLUSIVE` if any one of the expressions is inconclusive and there is no expression that is false. Correct ?

          Show
          githubbot ASF GitHub Bot added a comment - Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/1038#discussion_r153658073 — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java — @@ -202,14 +202,19 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function // If at least one arg returns FALSE, then the AND function value is FALSE // If at least one arg returns INCONCLUSIVE, then the AND function value is INCONCLUSIVE // If all args return TRUE, then the AND function value is TRUE + Result result = Result.TRUE; + for(ExprNode arg : exprNode.args) { Result exprResult = evaluateHelper(recordValues, arg); if (exprResult != Result.TRUE) { + if (exprResult == Result.FALSE) { return exprResult; } + if (exprResult == Result.INCONCLUSIVE) { End diff – Just to be clear. You want to return `Result.INCONCLUSIVE` if any one of the expressions is inconclusive and there is no expression that is false. Correct ?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user ppadma commented on a diff in the pull request:

          https://github.com/apache/drill/pull/1038#discussion_r153661820

          — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java —
          @@ -202,14 +202,19 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function
          // If at least one arg returns FALSE, then the AND function value is FALSE
          // If at least one arg returns INCONCLUSIVE, then the AND function value is INCONCLUSIVE
          // If all args return TRUE, then the AND function value is TRUE
          + Result result = Result.TRUE;
          +
          for(ExprNode arg : exprNode.args) {
          Result exprResult = evaluateHelper(recordValues, arg);

          • if (exprResult != Result.TRUE) {
            + if (exprResult == Result.FALSE) { return exprResult; }

            + if (exprResult == Result.INCONCLUSIVE) {

              • End diff –

          @parthchandra yes, that is correct.

          Show
          githubbot ASF GitHub Bot added a comment - Github user ppadma commented on a diff in the pull request: https://github.com/apache/drill/pull/1038#discussion_r153661820 — Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java — @@ -202,14 +202,19 @@ private Result evaluateHelperFunction(Map<String, String> recordValues, Function // If at least one arg returns FALSE, then the AND function value is FALSE // If at least one arg returns INCONCLUSIVE, then the AND function value is INCONCLUSIVE // If all args return TRUE, then the AND function value is TRUE + Result result = Result.TRUE; + for(ExprNode arg : exprNode.args) { Result exprResult = evaluateHelper(recordValues, arg); if (exprResult != Result.TRUE) { + if (exprResult == Result.FALSE) { return exprResult; } + if (exprResult == Result.INCONCLUSIVE) { End diff – @parthchandra yes, that is correct.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user parthchandra commented on the issue:

          https://github.com/apache/drill/pull/1038

          +1. LGTM

          Show
          githubbot ASF GitHub Bot added a comment - Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1038 +1. LGTM
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/drill/pull/1038

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1038
          Hide
          arina Arina Ielchiieva added a comment -

          Merged into master with commit id 3d13b4f604852b959fe5ab0f21ec01d455553a2f

          Show
          arina Arina Ielchiieva added a comment - Merged into master with commit id 3d13b4f604852b959fe5ab0f21ec01d455553a2f

            People

            • Assignee:
              ppenumarthy Padma Penumarthy
              Reporter:
              ppenumarthy Padma Penumarthy
              Reviewer:
              Parth Chandra
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development