Hive
  1. Hive
  2. HIVE-1496

enhance CREATE INDEX to support immediate index build

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0, 0.8.0
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels:
      None

      Description

      Currently we only support WITH DEFERRED REBUILD.

      1. hive-1496.patch
        40 kB
        Ashutosh Chauhan
      2. hive-1496.1.patch.txt
        54 kB
        Siddhartha Gunda

        Issue Links

          Activity

          Hide
          Siddhartha Gunda added a comment -

          Phabricator URI : https://reviews.facebook.net/D3951

          Please review the code. Thanks!

          Show
          Siddhartha Gunda added a comment - Phabricator URI : https://reviews.facebook.net/D3951 Please review the code. Thanks!
          Hide
          Ashutosh Chauhan added a comment -

          @Siddhartha,

          Can you create a review request for this on Phabricator?

          Show
          Ashutosh Chauhan added a comment - @Siddhartha, Can you create a review request for this on Phabricator?
          Hide
          Siddhartha Gunda added a comment -

          This patch uses "create table as" instead of insert statement to build the index table. Create table as works only for non partitioned tables as of now. Hence this patch is for non partitioned tables.

          Show
          Siddhartha Gunda added a comment - This patch uses "create table as" instead of insert statement to build the index table. Create table as works only for non partitioned tables as of now. Hence this patch is for non partitioned tables.
          Hide
          John Sichi added a comment -

          Oof. That sounds too painful+brittle. What if we instead prepare a reentrant CREATE TABLE AS SELECT statement (rather than an INSERT statement), but then splice out the CREATE TABLE part (leaving just the load part). Still very hacky, but maybe not too much.

          Show
          John Sichi added a comment - Oof. That sounds too painful+brittle. What if we instead prepare a reentrant CREATE TABLE AS SELECT statement (rather than an INSERT statement), but then splice out the CREATE TABLE part (leaving just the load part). Still very hacky, but maybe not too much.
          Hide
          Syed S. Albiz added a comment -

          CREATE TABLE AS SELECT doesn't need to compile re-entrant HQL, so it resolves the table name for the INSERT as part of the same SemanticAnalyzer pass. However when we run the ALTER INDEX statement, the DDLSemanticAnalyzer tries to compile a re-entrant INSERT query as a dependent task. I guess one other approach would be to do some checking in the SemanticAnalyzer pass on the re-entrant SQL to populate the mapping of valid table names with tables that will be created by the time the task doing the INSERT runs?

          Show
          Syed S. Albiz added a comment - CREATE TABLE AS SELECT doesn't need to compile re-entrant HQL, so it resolves the table name for the INSERT as part of the same SemanticAnalyzer pass. However when we run the ALTER INDEX statement, the DDLSemanticAnalyzer tries to compile a re-entrant INSERT query as a dependent task. I guess one other approach would be to do some checking in the SemanticAnalyzer pass on the re-entrant SQL to populate the mapping of valid table names with tables that will be created by the time the task doing the INSERT runs?
          Hide
          John Sichi added a comment -

          @Syed: how does CREATE TABLE AS SELECT manage to do it? Doesn't it need to prepare the equivalent of an INSERT statement against a table which doesn't exist yet?

          Show
          John Sichi added a comment - @Syed: how does CREATE TABLE AS SELECT manage to do it? Doesn't it need to prepare the equivalent of an INSERT statement against a table which doesn't exist yet?
          Hide
          John Sichi added a comment -

          @Syed: how does CREATE TABLE AS SELECT manage to do it? Doesn't it need to prepare the equivalent of an INSERT statement against a table which doesn't exist yet?

          Show
          John Sichi added a comment - @Syed: how does CREATE TABLE AS SELECT manage to do it? Doesn't it need to prepare the equivalent of an INSERT statement against a table which doesn't exist yet?
          Hide
          Syed S. Albiz added a comment -

          I've been looking at this for a while, unfortunately, since the population job requires the index table to exist so that it can do an INSERT OVERWRITE (...) into it, the approach we discussed using a dummy object will not work as expected. I think the only way to make this work will be to do the ms transaction to create the index table immediately in the SemanticAnalyzer, with a special case if necessary for EXPLAINs such that the code is not executed if part of an EXPLAIN statement.

          Show
          Syed S. Albiz added a comment - I've been looking at this for a while, unfortunately, since the population job requires the index table to exist so that it can do an INSERT OVERWRITE (...) into it, the approach we discussed using a dummy object will not work as expected. I think the only way to make this work will be to do the ms transaction to create the index table immediately in the SemanticAnalyzer, with a special case if necessary for EXPLAINs such that the code is not executed if part of an EXPLAIN statement.
          Hide
          John Sichi added a comment -

          Discussion from IRC:

          ssalbiz: jsichi: I was looking at ashutosh 's patch for 1496, and I was wondering if the problem with it the lack of atomicity? It seems to chain the map-red tasks to populate the index with the DDLTask correctly if I'm reading the patch/testcase right
          [4:37pm] jsichi: ssalbiz:  you're right; I misread the patch--didn't notice the addIdxTasks part.  ashutosh, sorry about that.
          [4:40pm] jsichi: but I don't think we should be calling db.createIndex directly from DDLSemanticAnalyzer...should still be chaining in the DDLWork for that
          [4:41pm] ssalbiz: right, I agree
          [4:43pm] ashutosh: jsichi: I rememeber you commenting on the jira that atomicity will be an issue, but its ok to solve it seperately in a followup work
          [4:44pm] jsichi: ashutosh:  agreed.  But we should still be following the usual pattern for executing the metastore update from within a task (rather than analyzer)
          [4:45pm] jsichi: Another followup is to support a mode whereby updates to a table trigger a rebuild of the corresponding index partitions.
          [4:47pm] jsichi: I guess the reason you had to do it early (in the analyzer) is that the build-task generation requires the metastore to already be populated.
          [4:47pm] ashutosh: yeah.. correct thats the reason
          [4:47pm] jsichi: hmmm
          [4:48pm] ashutosh: build-task assumes all the data to be populated
          [4:55pm] jsichi: I guess the only way to resolve that would be to factor out the code that knows how to make up an Index object from a CreateIndexDesc, and then create a temp during analysis (then discard it), then later create the real one when the task executes.
          [4:56pm] ashutosh: yeah.. i think temp object approach may work
          [4:57pm] ashutosh: but probably will churn around lot of code
          [4:58pm] jsichi: having EXPLAIN able to show what's gonna be done without actually doing it seems like a valuable guarantee to preserve
          [4:58pm] jsichi: (see e.g. HIVE-2478)
          [5:01pm] ashutosh: ⁃actually temp obj approach may not work because createIndex task connects to metastore to get the metadata … so it must exist in metastore
          [5:02pm] jsichi: that's only for verifying that the index name does not conflict, rigth?
          [5:05pm] jsichi: woohoo, finally gonna get a clean trunk build on Jenkins since I'm about to commit HIVE-2493!
          [5:05pm] ashutosh: awesome
          [5:05pm] jsichi: already got a clean run on 0.8
          [5:07pm] ashutosh: for me HBase tests always fail
          [5:07pm] ashutosh: with exception NoRegionServerFound exception
          [5:07pm] ashutosh: whats the magic there ?
          [5:08pm] ssalbiz: looking at the TableBasedIndexHandler code, it does a bunch of checks to ensure that the partition specs of the table and index are in sync in the metastore. I think it would be possible to write a helper method in TableBasedIndexHandler that can be used to generate Index Map-Red Tasks without relying on the metastore at all if we can assume that the Index ms partition spec and the base table partition spec are going to be consistent when the da
          [5:09pm] ssalbiz: seems like less code churn than trying to feed the current method mock metastore/Index objects to make those checks pass
          [5:10pm] jsichi: ashutosh:  hmm, dunno...I'll bet there's a real exception buried somewhere deep in the logs...
          [5:10pm] jsichi: ssalbiz:  I don't think we want to change the index handler interface though
          [5:15pm] ssalbiz: hmmm, ok, in that case I guess we will have to feed temp objects to the existing generateIndexBuildTaskList method
          [5:18pm] ashutosh: one possibility to avoid explain problem is to not execute ms operation in semantic analyzer if its an explain query
          [5:19pm] jsichi: There's already precedent for the temp objects....e.g. Hive.createIndex already calls indexHandler.analyzeIndexDefinition with indexDesc and tt params which haven't actually been written to the metastore yet.
          [5:20pm] jsichi: ashutosh:  that wouldn't work, would it, since the explain is supposed to show the index build tasks too
          [5:22pm] ashutosh: hmm.. right.. it wont show it.. but will atleast prevent execution of unwanted ms operation in case of explain ...
          [5:22pm] ashutosh: i can take a look at temp object approach
          [5:23pm] jsichi: ssalbiz is actually going to work on indexing again as part of a school project, so if you're OK with it, we can assign back to him.
          [5:25pm] ashutosh: ya.. thats fine..
          [5:25pm] ashutosh: he can take it up
          [5:25pm] jsichi: OK cool.  I'll copy-paste this conversation into JIRA in case we forget anything later.
          
          Show
          John Sichi added a comment - Discussion from IRC: ssalbiz: jsichi: I was looking at ashutosh 's patch for 1496, and I was wondering if the problem with it the lack of atomicity? It seems to chain the map-red tasks to populate the index with the DDLTask correctly if I'm reading the patch/testcase right [4:37pm] jsichi: ssalbiz: you're right; I misread the patch--didn't notice the addIdxTasks part. ashutosh, sorry about that. [4:40pm] jsichi: but I don't think we should be calling db.createIndex directly from DDLSemanticAnalyzer...should still be chaining in the DDLWork for that [4:41pm] ssalbiz: right, I agree [4:43pm] ashutosh: jsichi: I rememeber you commenting on the jira that atomicity will be an issue, but its ok to solve it seperately in a followup work [4:44pm] jsichi: ashutosh: agreed. But we should still be following the usual pattern for executing the metastore update from within a task (rather than analyzer) [4:45pm] jsichi: Another followup is to support a mode whereby updates to a table trigger a rebuild of the corresponding index partitions. [4:47pm] jsichi: I guess the reason you had to do it early (in the analyzer) is that the build-task generation requires the metastore to already be populated. [4:47pm] ashutosh: yeah.. correct thats the reason [4:47pm] jsichi: hmmm [4:48pm] ashutosh: build-task assumes all the data to be populated [4:55pm] jsichi: I guess the only way to resolve that would be to factor out the code that knows how to make up an Index object from a CreateIndexDesc, and then create a temp during analysis (then discard it), then later create the real one when the task executes. [4:56pm] ashutosh: yeah.. i think temp object approach may work [4:57pm] ashutosh: but probably will churn around lot of code [4:58pm] jsichi: having EXPLAIN able to show what's gonna be done without actually doing it seems like a valuable guarantee to preserve [4:58pm] jsichi: (see e.g. HIVE-2478) [5:01pm] ashutosh: ⁃actually temp obj approach may not work because createIndex task connects to metastore to get the metadata … so it must exist in metastore [5:02pm] jsichi: that's only for verifying that the index name does not conflict, rigth? [5:05pm] jsichi: woohoo, finally gonna get a clean trunk build on Jenkins since I'm about to commit HIVE-2493! [5:05pm] ashutosh: awesome [5:05pm] jsichi: already got a clean run on 0.8 [5:07pm] ashutosh: for me HBase tests always fail [5:07pm] ashutosh: with exception NoRegionServerFound exception [5:07pm] ashutosh: whats the magic there ? [5:08pm] ssalbiz: looking at the TableBasedIndexHandler code, it does a bunch of checks to ensure that the partition specs of the table and index are in sync in the metastore. I think it would be possible to write a helper method in TableBasedIndexHandler that can be used to generate Index Map-Red Tasks without relying on the metastore at all if we can assume that the Index ms partition spec and the base table partition spec are going to be consistent when the da [5:09pm] ssalbiz: seems like less code churn than trying to feed the current method mock metastore/Index objects to make those checks pass [5:10pm] jsichi: ashutosh: hmm, dunno...I'll bet there's a real exception buried somewhere deep in the logs... [5:10pm] jsichi: ssalbiz: I don't think we want to change the index handler interface though [5:15pm] ssalbiz: hmmm, ok, in that case I guess we will have to feed temp objects to the existing generateIndexBuildTaskList method [5:18pm] ashutosh: one possibility to avoid explain problem is to not execute ms operation in semantic analyzer if its an explain query [5:19pm] jsichi: There's already precedent for the temp objects....e.g. Hive.createIndex already calls indexHandler.analyzeIndexDefinition with indexDesc and tt params which haven't actually been written to the metastore yet. [5:20pm] jsichi: ashutosh: that wouldn't work, would it, since the explain is supposed to show the index build tasks too [5:22pm] ashutosh: hmm.. right.. it wont show it.. but will atleast prevent execution of unwanted ms operation in case of explain ... [5:22pm] ashutosh: i can take a look at temp object approach [5:23pm] jsichi: ssalbiz is actually going to work on indexing again as part of a school project, so if you're OK with it, we can assign back to him. [5:25pm] ashutosh: ya.. thats fine.. [5:25pm] ashutosh: he can take it up [5:25pm] jsichi: OK cool. I'll copy-paste this conversation into JIRA in case we forget anything later.
          Hide
          John Sichi added a comment -

          Ignore my previous comment...I misread the patch.

          Show
          John Sichi added a comment - Ignore my previous comment...I misread the patch.
          Hide
          John Sichi added a comment -

          Ashutosh, the DEFERRED REBUILD refers to the data portion (not the metadata for the index definition).

          Show
          John Sichi added a comment - Ashutosh, the DEFERRED REBUILD refers to the data portion (not the metadata for the index definition).
          Hide
          Ashutosh Chauhan added a comment -

          Attaching patch with a testcase. Also, assigning to myself.

          Show
          Ashutosh Chauhan added a comment - Attaching patch with a testcase. Also, assigning to myself.
          Hide
          John Sichi added a comment -

          Syed's internship is over, so I don't think he's working on this one any more.

          Show
          John Sichi added a comment - Syed's internship is over, so I don't think he's working on this one any more.
          Hide
          Ashutosh Chauhan added a comment -

          @Syed,

          Are you working on this ?

          Show
          Ashutosh Chauhan added a comment - @Syed, Are you working on this ?
          Hide
          John Sichi added a comment -

          A related topic here is what happens when an INSERT is done to a table with an index on it. For WITH DEFERRED REBUILD, it's necessary for the user to explicitly run the REBUILD. For immediate rebuild, it should happen automatically for the partitions affected by the INSERT.

          We can either do this now or treat it as a followup.

          Show
          John Sichi added a comment - A related topic here is what happens when an INSERT is done to a table with an index on it. For WITH DEFERRED REBUILD, it's necessary for the user to explicitly run the REBUILD. For immediate rebuild, it should happen automatically for the partitions affected by the INSERT. We can either do this now or treat it as a followup.
          Hide
          John Sichi added a comment -

          The implementation for this will need to chain together a task to do the actual index building together with a task to do the metastore update. It should be similar to CREATE TABLE AS SELECT (which both creates the table definition in the metastore and does the equivalent of an INSERT to populate it with the SELECT results).

          Use "EXPLAIN CREATE TABLE p AS SELECT * FROM pokes;" to see the combined plan. And see the end of SemanticAnalyzer.genMapRedTasks for where it chains the tasks together.

              if (qb.isCTAS()) {
                // generate a DDL task and make it a dependent task of the leaf
                ...
          

          For immediate index build, we want to combine the existing CREATE INDEX with ALTER INDEX REBUILD. One hiccup may be that the rebuild already wants the index to be defined in the metastore, whereas for CREATE TABLE AS SELECT we do it in the opposite order (only populating the metastore after the data is successfully loaded). It may be acceptable to just make the CREATE INDEX non-atomic (i.e. populate the metastore first, and if the rebuild fails, we leave the index empty; the user can retry with ALTER INDEX REBUILD, same as if it had been deferred in the first place).

          Ning Zhang (nzhang at facebook dot com) did the CREATE TABLE AS SELECT implementation, so he may be able to provide help if you run into trouble with this one.

          Show
          John Sichi added a comment - The implementation for this will need to chain together a task to do the actual index building together with a task to do the metastore update. It should be similar to CREATE TABLE AS SELECT (which both creates the table definition in the metastore and does the equivalent of an INSERT to populate it with the SELECT results). Use "EXPLAIN CREATE TABLE p AS SELECT * FROM pokes;" to see the combined plan. And see the end of SemanticAnalyzer.genMapRedTasks for where it chains the tasks together. if (qb.isCTAS()) { // generate a DDL task and make it a dependent task of the leaf ... For immediate index build, we want to combine the existing CREATE INDEX with ALTER INDEX REBUILD. One hiccup may be that the rebuild already wants the index to be defined in the metastore, whereas for CREATE TABLE AS SELECT we do it in the opposite order (only populating the metastore after the data is successfully loaded). It may be acceptable to just make the CREATE INDEX non-atomic (i.e. populate the metastore first, and if the rebuild fails, we leave the index empty; the user can retry with ALTER INDEX REBUILD, same as if it had been deferred in the first place). Ning Zhang (nzhang at facebook dot com) did the CREATE TABLE AS SELECT implementation, so he may be able to provide help if you run into trouble with this one.
          Hide
          John Sichi added a comment -

          Russell, please reassign to the actual owner.

          Show
          John Sichi added a comment - Russell, please reassign to the actual owner.
          Hide
          John Sichi added a comment -

          Followup for HIVE-417.

          Show
          John Sichi added a comment - Followup for HIVE-417 .

            People

            • Assignee:
              Syed S. Albiz
              Reporter:
              John Sichi
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:

                Development