Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3.0, 0.4.0, 0.6.0
    • Fix Version/s: 0.7.0
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Index support requires metastore schema upgrade (TBD).

Description

Implement indexing on Hive so that lookup and range queries are efficient.

      1. indexing_with_ql_rewrites_trunk_953221.patch
        191 kB
        Prafulla T
      2. idx2.png
        168 kB
        John Sichi
      3. hive-indexing-8-thrift-metastore-remodel.patch
        1.28 MB
        He Yongqiang
      4. hive-indexing.5.thrift.patch
        195 kB
        He Yongqiang
      5. hive-indexing.3.patch
        149 kB
        He Yongqiang
      6. hive-417-2009-07-18.patch
        160 kB
        He Yongqiang
      7. hive-417.proto.patch
        39 kB
        Prasad Chakka
      8. hive.indexing.13.patch
        2.78 MB
        He Yongqiang
      9. hive.indexing.12.patch
        2.78 MB
        He Yongqiang
      10. hive.indexing.11.patch
        1.34 MB
        He Yongqiang

        Issue Links

          Activity

          dhruba borthakur added a comment -

          This sounds really awesome! Make hadoop-hive suitable for things other than brute force table-scans!

          Prasad Chakka added a comment -

           We already have code (based on an early version of Hive) that does the important work. But we need to integrate this with Hive QL and determine the candidate indexes for a query. Also, a way to create buckets (both hash and range) would really speed up the index processing. I hadn't realized it, but predicate pushdown makes things simpler.

          Ashish Thusoo added a comment -

          +1 to this...

          We should also look at Katta

          http://katta.wiki.sourceforge.net/

          to see if we can leverage it to be a really fast index. I think that integration would really rock..

          Thoughts?

          Prasad Chakka added a comment -

           Another way of doing it is to create a file format that contains the index along with the data... but I think that would take a lot more time.

          He Yongqiang added a comment -

           Great feature! Looking forward to seeing it.

          Prasad Chakka added a comment -

           HIVE-1230 has changed the interface for RecordReader and it no longer has a getPos() method. The older interfaces are deprecated. I used this method in the prototype to get the current position while creating the index and also while reading the actual data file. Even SequenceFileRecordReader does not have this method.

           Without getPos() and seek() methods on RecordReader it becomes tough to implement any kind of generic indexing.

          Prasad Chakka added a comment -

           Here is a very rough outline of how this can be done (the prototype code has the creation and execution parts but not the HiveQL-related stuff):

          hive indexing:
           The goal of Hive indexing is to speed up lookup queries on certain columns of the table. Currently, queries with predicates like 'WHERE tab1.col1 = 10' have to load the complete table/partition and process all the rows. If an index exists on col1, then only a small portion of the file needs to be loaded.

          command to create index:

          create index tab_idx1 on tab1 (col1, ...);

           If the base table is partitioned then the index is also partitioned. Indexes can be created on base tables whose file format supports getPos() (and possibly seek() or equivalent methods).

           format of index:
           The index is also a Hive table with the following columns:
           col_1...col_k – key columns; the base table columns on which this index is defined
           list<offset> – positions of the rows which contain these keys

           offset is a combination of the following:
           file_name – relative path of the file in which this row is contained (relative to the partition/table location)
           byte_offset – byte offset of the row in the file; the row can be found at this byte offset, or in the block starting at this byte offset for block-compressed SequenceFiles.
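
           As a rough sketch (the table name, column names, and types below are illustrative, not part of the proposal), the index table could be declared in Hive DDL along these lines:

           -- hypothetical layout for the index table described above
           create table tab1_idx1 (
             col1 string,                               -- key columns copied from the base table
             col2 string,
             offsets array<struct<file_name:string,     -- path relative to the partition/table location
                                  byte_offset:bigint>>  -- row/block offset within that file
           );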

          when to create index:
           Traditionally, databases try to update the index when the table is loaded. Hive doesn't process rows while loading tables using the 'LOAD DATA INPATH' command, and updating the index may also slow down the actual loading for 'INSERT ... SELECT ... FROM ...' type statements. So users should have an option for whether the index is initialized during 'INSERT ... SELECT ...' or initialized separately. Another command like 'update index tab_idx partition ..' can be provided.

          how to create index:
           The index can be created using the following Hive command, augmented with 'offset':
           'select col_1...col_k, offset from tab1'

           offset can be provided as a built-in function, derived in HiveInputRecordReader, which will in turn use the specific FileFormat's Reader getPos() method and 'map.input.file' for the file name (or get it from the tableDesc or partitionDesc).
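
           As a rough sketch of the build step (offset(), input_file(), and collect() below are placeholders for the proposed built-ins, not existing Hive functions, and the index-table name is the illustrative one from above), the index could be populated with something like:

           -- hypothetical index-build query; gathers all (file, offset) pairs per key
           insert overwrite table tab1_idx1
           select col1, col2,
                  collect(struct(input_file(), offset()))
           from tab1
           group by col1, col2;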

          Algorithm For using index:
           1) Hive QL needs to determine whether a particular query can use any existing indexes. This can be determined by examining the predicate tree. After predicate pushdown, all the predicates which can use an index are in the child operator of a TableScanOperator. This predicate tree needs to be examined: if it contains any subset of the columns of an index then that index can be used. Until stats are available, it is not possible to guess whether using the index is beneficial. This needs to be fleshed out more to check both 'AND' and 'OR' predicates.

          2) For each of the qualified indexes, a map/reduce job can be created using the predicates determined in step 1. The output of this job should have the following information
          file_name – fully qualified file name that contains the data
          byte_offset – position of row

           3) If there is more than one qualified index then the outputs of step 2 need to be combined, depending on whether the predicates on these indexes have 'AND' or 'OR' between them.

           4) Modify the original plan to use only those FileSplits that appear in the output of step 3. This reduces the number of mappers spawned by the JobTracker.

           5) Modify the original plan to use HiveIndexRecordReader instead of the regular record reader. The output of step 3 (which is sorted) is available to the HiveIndexRecordReader, which can skip to these locations instead of reading every record in the input of the Mapper. (See the sketch below.)
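
           To make this concrete, for a query such as select * from tab1 where col1 = 10, steps 1-2 would conceptually run an index-scan job roughly equivalent to the following; the index-table name is the illustrative one from above, and the flattening of the offsets list into one row per offset is glossed over:

           -- hypothetical index-scan job for step 2
           select file_name, byte_offset
           from tab1_idx1
           where col1 = 10;

           The sorted (file_name, byte_offset) output then drives steps 4-5: only the matching FileSplits are scheduled, and HiveIndexRecordReader seeks directly to those offsets.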

          Prasad Chakka added a comment -

          Thanks Yongqiang!

          He Yongqiang added a comment -

           Thanks Prasad for the detailed description of the index design.
          Several questions:
          (1)
           Is the index based on sorting? A single-column index based on sorting can be very useful for queries involving this column, both point queries and range queries. But a sort-based multi-column index cannot be utilized for queries that do not contain the column used as the primary sort order in the index. For example, suppose we create a sort-based index on table1(col1,col2,col3). The index uses col1 as the primary sort order, col2 as the secondary sort order, and col3...
          We can use this index to accelerate queries like:
          1) select * from table1 where col1>2 and col2<34
          2) select * from table1 where col1<34 and col3 >45
           3) select * from table1 where col1>23
          but, we can not use it for queries like:
          4) select * from table1 where col2>34 and col3<3
          5) select * from table1 where col2 =34
          6) select * from table1 where col3 <45

          (2)
           Should we consider using indexes to accelerate queries involving joins of several tables? For example, we have two tables:
          user(userid,name,address, age,title,company);
          click(userid,url,datetime);
          And now we have a query like:
           select url from user, click where user.userid=click.userid and user.name="user_name" and datetime between last month; that is, select the list of URLs the specified user visited in the last month.
          If we have an index: create index user_url on table user(name), click(datetime) where user.userid=click.userid, then the above query can be accelerated.

          (3)
           Indexes can also be used in group-by aggregation queries. Should we also consider them?
          (4)
           Another feature is to integrate a Lucene index with Hive. Ashish suggested integrating Katta. I took a look at Katta, and I think it may not be necessary to include it. If we included it, Hive users would have to deploy Katta and ZooKeeper in their cluster. I think we can integrate Lucene internally without touching Katta.

          Prasad Chakka added a comment -

          1)
           The question you raised applies only to B+Tree-style indexes. The index that I defined above is not really a traditional database index but a kind of summary table (or view), and any lookup/range query on the table requires reading the whole index. So you can apply all predicates as long as the columns referenced in the predicates exist in the index. So we should be able to use the index on (col1, col2, col3) for all the queries above. Sort order has no impact here since the whole index is read into memory anyway.

           Since this index can be created in sorted order, we can create a sparse index (similar to the non-leaf nodes of a B+-Tree) if the index itself is too big (i.e., index sizes are an order of magnitude larger than the HDFS block size). But this can be done as a later optimization. (See the sketch after this comment.)

          2)
           With the design above, indexes on joins will come for free since predicate pushdown will push the 'user.name="user_name"' predicate above the join, and only index-filtered rows participate in the join.

           But creating indexes on the joined output may increase the index size enough to decrease the overall effectiveness. With sparse indexes this problem might be mitigated, so we can support this kind of join index along with support for sparse indexes.

          3)
          Yes, for some aggregation queries it may make sense to read the index (since it is a summary table as well). Aggregations or any queries that involve only columns from the index can operate only on the index and not the main table.

          4)
           I also looked at it and am not sure how it fits into Hive. Katta is more like a distributed index server.
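
           On the sparse-index point in 1) above, a minimal sketch of what such a second-level table could look like (names are illustrative and assume the dense index files are sorted by key):

           -- hypothetical sparse index over the sorted dense index
           create table tab1_idx1_sparse (
             first_col1  string,   -- first key value of a run of dense-index entries
             idx_file    string,   -- dense-index file containing that run
             idx_offset  bigint    -- byte offset of the run inside that file
           );

           A lookup would read this small table first and then seek into the dense index, instead of scanning the dense index completely.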

          He Yongqiang added a comment -

           Prasad, the index you designed is a kind of hash index? And it requires scanning the whole index table for either lookups or range queries.
           A sort-based index would only need to scan part of the index table, but it also has the problems I posted in the previous comment.

          Since this index can be created in sorted order, we can create sparse index

           How? I think a sort-based index cannot be a sparse index; it can only be a dense index.

          I also looked at it and not sure how it fits into Hive. Katta is more like an distributed index server.

           Yes, we do not need to fit Katta into Hive. But we can integrate a Lucene index into Hive. Right now, though, I think we can put this in the last part of our schedule.

          He Yongqiang added a comment -

           Prasad, I think the hash-based index is more like a projection, because the number of rows with the same hash value of the indexed columns will be very low.
           And in most cases, the block/pos list's size will only be 1.

          Prasad Chakka added a comment -

           The above index is not a hash index since you can't do range queries on a hash index and lookups are constant time. I'm not sure what to call this, except that it is a view (simple projection) of the base table with offsets into the base table.

           On the sparse index, I meant you can create a sparse index on top of the index I described above. But this can be done later.

           > And in most cases, the block/pos list's size will only be 1

           That is not the case if the index is on a non-primary-key column, and I think that is mostly the case where indexes will be used in data warehouses.

          He Yongqiang added a comment -

          >that is not the case if the index is on a non-primary key column. and i think, mostly this is the case where indexes will be used in data warehouses.
           Yes. If the index is built on one column, the block/pos list's size will be large. But if it is built on many columns, I think the block/pos list's size will be small.
           Anyway, we can build this index as the first step.
           And after this is finished, we can try other kinds of indexes, like:
           1) sort-based index
           2) Lucene index
           3) block-scope B+Tree, R-tree or other advanced index data structures.

          Prasad, you said you already wrote some code, would you please attach it?

          He Yongqiang added a comment -

           Forgot one: Bitmap. I did some tests on bitmap indexes, and their performance is excellent.

          Prasad Chakka added a comment -

           Yes, mostly the block/pos list size will be small, but I don't think we can assume that, since there will be enough cases where it will not be true.

           We can explore the other approaches later on; different indexes will be useful in different scenarios.

           I will try to post some code this week.

          Prasad Chakka added a comment -

           This patch is built from the Hive prototype code. It compiles with the Hive code but needs a lot of work before it can fit nicely with Hive.

           Yongqiang, the code is hard to understand in the context of the open-source Hive code but makes sense in the context of the prototype code. Let me know if you have any questions. You might want to use this as a guide and start the hive2 indexing code from scratch.

          He Yongqiang added a comment -

           Thanks a lot, Prasad. I will put questions on the JIRA, and will start working on it after we settle the design. Looking forward to working on it.

          He Yongqiang added a comment -

           Checked how MySQL does with indexes and found that MySQL likewise cannot use an index to handle the situations in my earlier post:

          but, we can not use it for queries like:
          4) select * from table1 where col2>34 and col3<3
          5) select * from table1 where col2 =34
          6) select * from table1 where col3 <45

           And now a basic idea for our index design, just as Prasad commented in a previous post:
           1) index structure
           Use an MR job to create the index. The input is a file with all columns, and the mapper outputs kv pairs where the key is <indexed col1, indexed col2,...> and the value is the offset.
           We define a comparator for <indexed col1, indexed col2,...> to let the shuffle phase sort all the mappers' output. In the reducer, we combine the kv pairs into
           <indexed col1, indexed col2,...> list_of_offsets
           This is a dense sorted index; we then create a sparse index on the dense index. We also collect column data distribution information (histograms) while doing this (a sketch of a possible statistics-table layout follows this comment).
          2)
           We consider using an index for a query only when the query involves the columns in the leftmost part of the index.
           We also need to consider index merging when a query involves two indexes, and a cost estimation to decide whether using the index will decrease query time (this is the work to be done in the optimizer).

           But as a first step, we can first finish part 1 and the Hive QL part, then consider part 2 (the optimizer part). After part 1 is finished, I will examine part 2 in more detail.
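
           One possible shape for the column-distribution statistics mentioned in 1) - the table name and layout here are only an assumption for illustration - is a small side table with one row per histogram bucket:

           -- hypothetical histogram table collected while building the index
           create table tab1_idx1_stats (
             col_name     string,   -- indexed column the bucket belongs to
             bucket_low   string,   -- lower bound of the histogram bucket
             bucket_high  string,   -- upper bound of the histogram bucket
             row_count    bigint    -- number of rows falling in the bucket
           );

           The optimizer work in 2) could consult such a table when estimating whether an index scan beats a full scan.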

          Prasad Chakka added a comment -

           The plan looks good. I am not sure we need to create a sparse index on the dense index in phase 1. In most cases the size of the dense index will be small enough that an additional MR job for processing the sparse index would be unnecessary. If the sparse index is not necessary, then there is no need for the dense index to be sorted.

           Since the dense index is scanned completely while processing the query, we can use the index if any predicate column exists in the index definition.

          Seymour Zhang added a comment -

          Hello Prasad and Yongqiang, Thank you very much for this great effort.

           One of my suggestions would be: since we've done the indexing with MapReduce, for some queries based on the generated indexes can we just omit the time-consuming MapReduce phase during querying? As we've already got all of the files/offsets, we can go to these specific file offsets directly to get the relevant rows of the table. This would greatly expedite the query process.

           This would be helpful for the following case in one of my usages of Hive. With Hive, I've already sharded (by date) and bucketed (by column hashing) my log data into hierarchical files, and I've sorted each file by the hashing columns. As I may have many rows with the same column values but different timestamps, to minimize index size I'd like to treat these rows with the same column values as a block and only use a single index entry for this block. This will greatly reduce the index size of my data while still being very useful for my queries on those columns.

          He Yongqiang added a comment -

          Thanks for the suggestions, Seymour.
           I have also thought about what you said: directly fetching the data instead of initializing a new MR job. I will try to include this, but it may be done in the second phase (the optimization phase).

          >>I'd like to treat these rows of same col values as a block and only use a single index entry for this block
           In the design, we indeed only use one index entry. And not only for contiguous values - we use the same index entry for all rows with the same column value.

          Joydeep Sen Sarma added a comment -
          • are we going to have one index file per hdfs file? (or one per partition?)

           A related question is how this is going to interact with sampling (I think currently the sampling predicate is optimized out for bucketed tables, although I'm not terribly sure).

          i would love to see the api to invoke the index.

           • ideally we would like to be able to plug in different indexing schemes - the same goes for map-side joins: the hashmap storing the smaller table can be seen as an index on that table. It would seem that one should be able to replace a map-side join based on tables loaded into jdbm with tables using the indices proposed here (and thereby do joins based on indices almost trivially).
          • we should enable people to be able to plug in their own indices (since it's quite likely that over time there will be multiple indexing efforts on hadoop files).
          He Yongqiang added a comment -

           Joydeep, thanks for the questions.
          >>are we going to have one index file per hdfs file?
          yeah.
          >>i would love to see the api to invoke the index
           currently it is not settled yet. I will try to provide it next week.
          >>enable people to be able to plug in their own indices
          I think if we have a well-designed adaptable api, then this can be addressed.
          >>we would like to plug in different indexing schemes
           Yes. I have proposed several schemes in previous posts. Can you give me some schemes, so I can compare and make a better design?

           BTW, I will try to write a proposal next week. I have an important English exam this weekend. Sorry for the delay.

          Seymour Zhang added a comment -

          >>are we going to have one index file per hdfs file?

           Can we also support exporting these index files as a table to some other storage system like HBase or Tokyo Cabinet? I.e., can these separate index files for each HDFS file be expressed as a single table in Hive?

          Schubert Zhang added a comment -

          Prasad,

           About your comments at 17/May/09 08:39 AM.
          >> ... So we should be able use index on (col1, col2, col3) for all the queries above. Sorting order has no impact here since the whole index is read into memory anyways.

           If the size of the index exceeds the memory size, how can the whole index be read into memory?

          Prasad Chakka added a comment -

          Schubert,

           We can run another map-reduce job that scans the index and builds a results file sorted by the index key. This file can be read sequentially to determine which input-table HDFS blocks should be fed to the actual job for the query.

           Another way is to build a sparse index on the index. But if the table itself is sorted, we can build the sparse index (a la MapFile) directly and use it. At Facebook, the use case we have doesn't have this sorting property, but I can envision this being useful for primary indexes where the index sort order and the table sort order are the same.

           Can you think of any other ways? Of course, we can process index files using HBase or Tokyo Cabinet, but that requires another system to be set up and administered, and both systems need to be available for index processing. In some cases these solutions also work. The indexing scheme described above should play well with HBase and Tokyo Cabinet, since the index is a file with rows containing a key and position parameters. In Hadoop we can store that in a SequenceFile or maybe a TFile, but if the index files have to be stored in external systems, we can plug in a custom SerDe and change the default location of these to a location where the external systems can access the files.

          Schubert Zhang added a comment -

          Prasad,

          Thanks for your comments. Now, I understand your comments.

           Yes, in one of our projects we sorted the data table and built a sparse index which records the block keys and file offsets. Then we load the index files into HBase to serve queries. It works fine.

          He Yongqiang added a comment -

           Had a talk with Prasad and Zheng about the index design.
           I wrote a summary of the meeting: http://docs.google.com/View?id=dc9jpfdr_6dkrt82c7
           Any suggestions are appreciated!
           BTW, in this meeting and the doc we did not include any other index types. I think we should first implement one index type; after that, the others will be clearer.

          Prasad Chakka added a comment -

           One thing that isn't mentioned is that in the List<bucketname, List<offset>> column, the offsets are sorted.

           Another thing missing is that, when an index on a partition is built, a new partition will be created for that index table (similar to creating a partition for a regular table).

          We can distinguish index tables and regular tables by having a table parameter.

           We can skip partition-specific indexes in the first phase if it reduces the amount of work, and assume indexes defined on a table are created on all partitions.

          He Yongqiang added a comment -

           A draft version of building (create/update) an index.
           Still needed:
           1) the reducer sorts the set of positions for one key in memory, which may cause an out-of-memory error when an index key occurs too many times
           2) the index output format can only use IgnoreKeyTextOutputFormat right now.

          Prasad Chakka added a comment -

           1) Are you worried about the sort phase of the reducer or the IndexBuilder's reducer code? I don't think the former issue will be a problem. The latter issue can be avoided by writing multiple rows for a key if the number of offsets exceeds a certain limit. So the reducer can flush the offsets periodically to disk, thus avoiding OutOfMemory exceptions in the reducer.

          2) What are the other options for the index output format?

          He Yongqiang added a comment -

           1) For a given key, we are using a sorted set for each bucket to store positions at the reducer. I am worried that "one sorted set for each bucket" may cause an out-of-memory problem,
           as you commented earlier: "List<bucketname, List<offset>> column, offsets are sorted".
           Think about one extreme situation: one file contains a single value a million times. Then at the reducer we are storing a million positions in a sorted set.

           >>So reducer can flush the offsets periodically to disk thus avoiding OutOfMemory exceptions in reducer.
           If we do this, how can we guarantee they are sorted? I mean, that offsets after this flush are greater than offsets in the previous flush.

           2) What are the other options for the index output format?
           I think there are no other options. We need to discard the key part, and I think in Hive only IgnoreKeyTextOutputFormat does that. Of course, all of Hive's custom HiveOutputFormats can discard the key part, but they cannot be specified in the map-reduce jobconf, since they do not extend OutputFormat.

          Prasad Chakka added a comment -

           Well, the number of offsets can't exceed the number of SequenceFile blocks, since we can only index the SequenceFile block offsets, so the problem is not as dire as it could be. And if that many rows (i.e., more than 10% of rows in a traditional RDBMS, though maybe more in the Hadoop case) have the same key, then the index may not be efficient after all, since it is better to read the whole table anyway.

          Prasad Chakka added a comment -

           What I am trying to say is that for such frequent keys indexing may not be of much help, so maybe we can relax the 'sort' property? I don't think there is another easy way out other than doing a disk-based sort. Check whether you can reuse any of the Hadoop sorting code. Or can we piggyback this sorting on top of the Hadoop reduce sort phase somehow?

          Prasad Chakka added a comment -

           I am not able to apply the patch to the latest trunk.

          He Yongqiang added a comment -

           I am about to update it, and am making some modifications according to our offline talk.
           I also need to add some test cases and do more tests.
           So I will submit a new patch by this weekend, and will notify you of any updates.

          He Yongqiang added a comment -

           This JIRA page is a bit too long to follow.
           After discussions with Prasad and Zheng, I will open several sub-tasks for index building, generalizing the index interface to allow plugging in different index strategies, optimizing queries with indexes, etc.

           Let's keep discussions about overall indexing here on this JIRA page, and move other discussions to separate JIRA pages.

          He Yongqiang added a comment -

           Created HIVE-678 to add support for building indexes.
          see https://issues.apache.org/jira/browse/HIVE-678

          Jeff Hammerbacher added a comment -

          Another type of index worth knowing about: the "negative index"/"storage index" from Exadata, described at http://blogs.oracle.com/datawarehousing/2009/09/500gbsec_and_database_machine.html.

          We get some "negative indexing" for free with partitions, but this may be useful for more "distinctive" scans over columns for which we have not partitioned.

          Prasad Chakka added a comment -

           @jeff, I think this is more suitable for storing along with the data, where blocks of data can be skipped while scanning rows. I think columnar storage might already be doing this.

          Jeff Hammerbacher added a comment -

          Yeah, I think so as well. Did my comment make it seem like I thought otherwise?

          Prasad Chakka added a comment -

           There can be a summary index here as well (every SequenceFile block would have min & max column values in the index). I thought you were hinting at that.
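
           A minimal sketch of such a summary index, assuming one row per SequenceFile block and purely illustrative names:

           -- hypothetical per-block min/max summary for one indexed column
           create table tab1_block_summary (
             file_name  string,   -- data file containing the block
             block_off  bigint,   -- byte offset of the block within the file
             col1_min   string,   -- smallest col1 value in the block
             col1_max   string    -- largest col1 value in the block
           );

           A scan could then skip any block whose [col1_min, col1_max] range cannot satisfy the predicate.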

          Joydeep Sen Sarma added a comment -

          are there any references on this technique?

           someone had earlier suggested this (apparently from reading Netezza documentation) - but I don't understand when it would work. Why would a (fairly large) sequencefile block contain only a limited range of values (assuming the metadata stores a min-max range)? Most cases I can imagine in our dataset would either have low-cardinality columns (so most values would be present) or, for high-cardinality ones, the distribution would be random (relative to the primary sort key) - and the range would seem ineffective.

           unless there are columns that are closely related to how the data is sorted/partitioned (perhaps some product ids are limited to a specific range of time - but the partitioning is on time and not product id - and even that sounds dubious).

           a bloom filter would seem much more plausible at allowing good filtering. Even then, I don't understand why this sort of metadata should be kept along with the block and not separately (much more flexible - it can be added on demand), as this jira is headed towards.

          Prasad Chakka added a comment -

           I don't think it makes much sense unless there is some clustering or sorting property. If there is clustering and sorting and the selectivity of a query is much higher than 10%, then storing this metadata along with the data makes sense instead of in a separate block. The 10% threshold may be larger for Hive, but the point still stands. In the OLAP case data changes seldom, and the size of this kind of metadata is much smaller than the data itself, so the overhead of storing it is negligible.

           Something similar to this is done in DB2 Multi-Dimensional Clustering, where whole blocks (disk blocks) are skipped if the key value doesn't fit the query.

          Joydeep Sen Sarma added a comment -

          MDC also maintains metadata separately - at least based on their paper (http://www.research.ibm.com/compsci/project_spotlight/datamgmt/SIGMOD2003.pdf)

          Prasad Chakka added a comment -

yes they do, but they don't use it for table scans, which are done if the query selectivity is greater than 10% (or some such). they use the index for index scans and in joins. I wrote the table scan code.

          Schubert Zhang added a comment -

>> Prasad Chakka added a comment - 15/Apr/09 11:25 AM
>> Another way of doing it is to create a file format that contains index along with data... but i think that would take lot more time.

We are trying to store data in sorted and block-indexed files (such as HFile or TFile). Then we know the startKey and lastKey of each file and each block. This block index (block summary) is just for the primary key.
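A toy Java sketch of the file-level pruning such startKey/lastKey summaries allow once files are sorted (illustration only, not from any patch here; all names are hypothetical):

import java.util.ArrayList;
import java.util.List;

/**
 * Prune sorted, block-indexed files by their key range: a file whose
 * [startKey, lastKey] interval does not overlap the predicate's range can
 * be skipped entirely. Hypothetical names, for illustration only.
 */
class KeyRangePruner {
  static class FileSummary {
    final String path;
    final long startKey;
    final long lastKey;

    FileSummary(String path, long startKey, long lastKey) {
      this.path = path;
      this.startKey = startKey;
      this.lastKey = lastKey;
    }
  }

  /** Keep only files whose key range overlaps [lo, hi]. */
  static List<FileSummary> prune(List<FileSummary> files, long lo, long hi) {
    List<FileSummary> candidates = new ArrayList<FileSummary>();
    for (FileSummary f : files) {
      boolean overlaps = f.lastKey >= lo && f.startKey <= hi;
      if (overlaps) {
        candidates.add(f);
      }
    }
    return candidates;
  }
}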

          Edward Capriolo added a comment -

I am currently benchmarking an 11-node Hive cluster against a 16 TB MySQL 5.1 system (4x quad core, 32 GB RAM) with partitioning.

          Hive destroys mysql with any query like:

          (date_id is my partition column.)

          set mapred.map.tasks=34;
          set mapred.reduce.tasks=11;
          FROM pageviews
          insert overwrite directory '/user/ecapriolo/hivetest4'
          select sitename_id, user_id, count(user_id)   WHERE date_id=20091250 group by sitename_id,user_id
          12098855 Rows loaded to /user/ecapriolo/hivetest4
          OK
          Time taken: 185.528 seconds
          

The same query can take over 3000 seconds on MySQL because these large summary queries are always written to a temp table, and then the writes bottleneck your read queries.

          However, if mysql has an index (and if the index is in memory, which is hard in a warehouse) on some other value in the where clause like:

          select sitename_id, user_id, count(user_id)   WHERE date_id=20091250 and sitename_id=400 group by sitename_id,user_id 
          

          MySQL gets a relative performance speed-up, while hive ends up scanning the entire table.

          I agree with dhruba,
          >>This sounds really awesome! Make hadoop-hive suitable for things other than brute force table-scans!

          If we had indexes helping stop some brute force scans, that would just open up other doors to what hive could do.

          He Yongqiang added a comment -

Talked with Prasad about this issue today.
I may not be able to finish this in the coming one or two months; I am now spending most of my time on some other issues. I am sorry about that.
If anyone wants this feature, please feel free to take over from me, and I will provide all the help I can. If no one picks it up, I can finish it after finishing the issues at hand.
          Thanks.

          Prafulla T added a comment -

He Yongqiang,
Have you started working on this one?
If not, I am interested in taking a look at it.
The patch link hive-417-2009-07-18.patch is not working; can you share the latest patch here?

          He Yongqiang added a comment -

Cool. Yes, I do have a newer patch for this jira. I will clean it up and post it.

          He Yongqiang added a comment -

With this patch, the index works, but it is not yet intelligent.

          This is how this patch works:

          === how to create the index table and generate index data ===
          set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

          drop table src_rc_index;

          //create an index table on table src_rc, and the index col is key.
          //And the index table's data is stored using textfile (also work with seq, rcfile)
          create index src_rc_index type compact on table src_rc(key) stored as textfile;

          hive> show table extended like src_rc_index;
          tableName:src_rc_index
          owner:heyongqiang
          location:file:/user/hive/warehouse/src_rc_index
          inputformat:org.apache.hadoop.mapred.TextInputFormat
          outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
columns:struct columns { i32 key, string _bucketname, list<string> _offsets}

About the index table's schema: besides the index columns from the base table, the index table has two more columns (_bucketname string, _offsets array<string>).

//generate the actual index table's data (partitioned tables are also supported here)
          update index src_rc_index;

          ====How to use the index table====

          //find the offset for 'key=0' in the index table, and put the bucketname and offset list in a temp directory
          insert overwrite directory "/tmp/index_result" select `_bucketname` , `_offsets` from src_rc_index where key=0;

          set hive.exec.index_file=/tmp/index_result;

          //use a new index file format to prune inputsplit based on the offset list
          //stored in "hive.exec.index_file" which is populated in previous command
          set hive.input.format=org.apache.hadoop.hive.ql.index.io.HiveIndexInputFormat;

          //this query will not scan the whole base data
          select key, value from src_rc where key=0;

          Things done in the patch:
          1) hql command for creating index table
          2) hql command and map-reduce job for updating index (generating the index table's data).
3) a HiveIndexInputFormat to leverage the offsets obtained from the index table to reduce the number of blocks/map-tasks

Things that need to be done:
1) right now the index table is manually specified in queries. we need to make this more intelligent by automatically generating a plan that uses the index.
2) The HiveIndexInputFormat needs a new RecordReader to seek to a given offset instead of scanning the whole block (a rough sketch follows below).
3) right now we use a map-reduce job to scan the whole index table to find the hit offsets. But since the index table is sorted, we can leverage the sort property to avoid the map-reduce job in many cases (the easiest way is to do a binary search on the client).

          The first todo is the most important part. I think the third may need much more work (maybe not true).

(Note: although this patch has been tested on a production cluster, it could still have bugs. We would really appreciate it if you report any bugs you find here.)
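As a rough illustration of the second TODO above (not the patch's actual code; the class and method names below are hypothetical), such a reader could wrap SequenceFile.Reader and use its sync() to jump near an offset taken from the index instead of scanning the whole split:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;

/**
 * Hypothetical reader that starts at an offset recorded in the index rather
 * than at the beginning of the split. Illustration only.
 */
class OffsetSeekingReader {
  private final SequenceFile.Reader reader;
  private final long endOffset;

  OffsetSeekingReader(Configuration conf, Path file, long indexedOffset, long endOffset)
      throws IOException {
    FileSystem fs = file.getFileSystem(conf);
    this.reader = new SequenceFile.Reader(fs, file, conf);
    this.endOffset = endOffset;
    // position at the first sync marker at or after the offset from the index
    reader.sync(indexedOffset);
  }

  /** Read the next record, stopping once we have passed the indexed range. */
  boolean next(Writable key, Writable value) throws IOException {
    if (reader.getPosition() > endOffset) {
      return false;
    }
    return reader.next(key, value);
  }

  void close() throws IOException {
    reader.close();
  }
}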

          He Yongqiang added a comment -

          I forgot to add this line "set hive.exec.compress.output=false;" in the above snippet before selecting from the index table.

          Prafulla T added a comment -

I was thinking of adding something called a query rewrite module.
It would be a rule-based query rewrite system that rewrites the
query into a semantically equivalent query which is more optimized
and/or uses indexes (not just for scans, but for other query
operators, e.g. GroupBy etc.)

          Eg.

          select distinct c1
          from t1;

If we have a dense index ('compact summary index' in this hive
indexing patch) on c1, this query can be replaced with a query on
the index table itself.

          select idx_key
          from t1_cmpct_sum_idx;

          Similar query transformation can happen for other queries.

The module will be placed just before the optimizer and will help it.
The module structure looks like below.

          [Query parser]
          [Query rewrites] --> new phase
          [Query optimization]
          [Query execution planner]
          [Query execution engine]

The rewrite module is 'generic', not just for the above indexing case
but for other cases too, e.g. OR predicates to union (for efficiency?), outer join
to a union of anti & semi joins, moving 'order by' out of a union
subquery, etc.

The aim is to implement very simple, light-weight rewrite support,
implement the indexing-related rewrites (the above rewrite does not
even need a new run-time map-red operator), and integrate indexing
support quickly and cleanly. As noted above, this rewrite phase
is rule-based (not cost-based), a sort of early optimization.
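A minimal Java sketch of what such a rule-based hook might look like (illustration only; these interfaces and names are hypothetical and are not the classes in the eventual patch):

/** Placeholder for whatever the analyzer produces (e.g. Hive's QB). */
class QueryBlock { }

/**
 * Hypothetical shape of a rule-based rewrite phase: each rule inspects the
 * analyzed query block and, if it applies, returns a semantically equivalent
 * replacement. Names are illustrative only.
 */
interface RewriteRule {
  /** True if this rule can rewrite the given query block. */
  boolean canApply(QueryBlock qb);

  /** Return an equivalent, cheaper query block (e.g. DISTINCT -> index table scan). */
  QueryBlock apply(QueryBlock qb);
}

class RewriteEngine {
  private final java.util.List<RewriteRule> rules = new java.util.ArrayList<RewriteRule>();

  void addRule(RewriteRule rule) {
    rules.add(rule);
  }

  /** Apply each enabled rule once, in registration order. */
  QueryBlock rewrite(QueryBlock qb) {
    for (RewriteRule rule : rules) {
      if (rule.canApply(qb)) {
        qb = rule.apply(qb);
      }
    }
    return qb;
  }
}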

Let me know what you think. I'll start by reading your patch.
This would cover most of TODO 1;
TODOs 2 and 3 will have to be looked into.

          He Yongqiang added a comment -

          Plan sounds perfectly good to me!

          Ashish Thusoo added a comment -

          A couple of comments on this:

A complication with doing a rewrite just after parsing is that you lose the ability to report back errors that correspond to the original query. Also, the
metadata that you need to do the rewrite is only available after phase 1 of semantic analysis. So in my opinion the rewrite should be done after semantic analysis but before plan generation. Is that what you had in mind...

          so something like...

          [Query parser]
          [Query semantic analysis]
          [Query optimization]
          ...

          Prafulla T added a comment -

Yes Ashish,
That's what I had in mind.

The rewrite system would need metadata, and hence it should be invoked
after the semantic analysis phase, which makes the metadata available.

          Prafulla T added a comment -

Hi All,
Here is the first set of changes for the query rewrite module in Hive and
an initial rewrite rule to transform group-by and distinct queries so that
they make use of indexes.
Note that the patch contains the createIndex*-related changes from Yongqiang.
Here are the files I have touched for the rewrite-related changes:
          ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
          – hasIndex and getIndex related APIs
          ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
          ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
          ql/src/java/org/apache/hadoop/hive/ql/rewrite/HiveRewriteEngine.java
          – Rewrite engine infrastructure
          ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/GbToCompactSumIdxRewrite.java
          – Actual rewrite rule which does above mentioned optimization
          ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/HiveRwRule.java
          ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/HiveRwRuleContext.java
          ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
          – Unit test including various queries.
          ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out

          I've tested some queries end-to-end with this GbToIdx rewrite rule on.
          It is working well.

Rewrites can be disabled with the hive.ql.rewrite boolean flag,
and this particular rewrite can be enabled/disabled with the
hive.ql.rewrite.gbtoidx boolean flag.
The default value for hive.ql.rewrite is true
and that for hive.ql.rewrite.gbtoidx is false.

There are currently the following limitations (which are being worked on):
1. The rewrite engine does not invoke rewrites recursively.
It currently just rewrites the top-level QB.
2. The GbToIdx rewrite is disabled for all queries having a where clause; we
can support certain where clauses in which all column references are index key
columns.
3. We need some API to know whether indexes are up to date or not. We should
do this rewrite only when the index has up-to-date data.

Let me know your review comments.
I am using a fork of the apache git repository for development on github.
All these changes are also available at http://github.com/prafullat/hive

          Namit Jain added a comment -

Looking at the patch (not yet in detail), it seems to suggest the following:

          1. The index file can only be a text file.
          2. PROJECTION index is not used - I mean, to start, can we just get the basic COMPACT+SUMMARY and only support that.

          Namit Jain added a comment -

DDLSemanticAnalyzer.java

if (outputFormat == null) { outputFormat = RCFileOutputFormat.class; }

use the default - don't hardcode.
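A minimal sketch of that suggestion, assuming the existing hive.default.fileformat setting is the right source of the default (the helper and the string-to-class mapping below are illustrative assumptions, not the patch code):

import org.apache.hadoop.conf.Configuration;

/**
 * Illustration only: fall back to the configured default file format
 * instead of hardcoding RCFileOutputFormat.
 */
class IndexTableFormat {
  static String resolveOutputFormatClass(String requested, Configuration conf) {
    if (requested != null) {
      return requested;  // the CREATE INDEX statement specified STORED AS ...
    }
    String deflt = conf.get("hive.default.fileformat", "TextFile");
    if ("RCFile".equalsIgnoreCase(deflt)) {
      return "org.apache.hadoop.hive.ql.io.RCFileOutputFormat";
    }
    if ("SequenceFile".equalsIgnoreCase(deflt)) {
      return "org.apache.hadoop.mapred.SequenceFileOutputFormat";
    }
    return "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
  }
}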

          Namit Jain added a comment -

          Few higher level comments:

          1. Populate the index at create index.
2. Instead of proposing a new syntax, why don't we use 'alter index <INDEX_NAME> ON <TABLE_NAME> REBUILD;'?
          3. Since the code is in a prototype stage, can we move the index code to contrib ?

          He Yongqiang added a comment -

>> 1. Populate the index at create index.
Let's do this in a followup jira.

>> 3. Since the code is in a prototype stage, can we move the index code to contrib?
Moving to contrib is not good. In the future we need to plug the index into queries automatically.

          Jeff Hammerbacher added a comment -

          3. Since the code is in a prototype stage, can we move the index code to contrib ?

          It's been the experience of other Hadoop-related projects that contrib gets messy. It has proven effective to either keep experimental features in mainline trunk or to put them up on github.

          Namit Jain added a comment -

if (work.getReducer() != null) {
  work.getReducer().jobClose(job, success, feedBack);
}
if (IndexBuilderBaseReducer.class.isAssignableFrom(this.getReducerClass())) {
  this.closeIndexBuilder(job, success);
}

          Instead of the above code in ExecDriver, IndexBuilderBaseReducer/CompactSumReducer should have a jobClose - no code
          change needed in ExecDriver.

I would still vote for the index code to be in contrib; it will take some time to clean it up - then it should be moved to the mainline.
Till then, it is usable, but in a prototype state.

What we should aim for is minimal changes in ql/, putting all changes in contrib for now. As they become stable, we can pull them
in - even the DDLSemanticAnalyzer changes should be factored into contrib.

          Ashish Thusoo added a comment -

          Looked at the code and have some questions...

Can you explain how the metastore object model is laid out? It seems that the index's table name is stored in the key/value properties of the table that the index is created on. Is that correct? Would it be better to put a key reference from the index table to the base table instead (similar to what is done for partitions)?

          Also, how would this be used to query the table? Can you give an example?

Is the idea here to select from the index and then pass the offsets to another query to look up the table? An example or a test which shows the query on the base table would be useful.

          John Sichi added a comment -

          Had a chat with Ashish and Yongqiang offline, and came up with three alternatives.

          1) "Shortest path to checkin": Treat current code as prototype and move it into contrib, providing a utility for creating/updating the index, and keeping changes to core classes to a minimum. As Yongqiang pointed out, this makes it harder to follow up with automatic use of the index due to the lack of metadata. If we do this, we should create a new JIRA issue for its limited scope.

          2) "Full-fledged index support": change the JDO metamodel to add support for indexes as first class objects, and come up with a pluggable index creation+access design framework which can encompass a variety of index types likely to be needed in the future. Code from this patch would become the first such index implementation provided. If we do this, we should continue on in this truly epic JIRA issue.

          3) "Rework as materialized view": keep the JDO metamodel as is (adding a new table type for MATERIALIZED_VIEW) but change the DDL to CREATE MATERIALIZED VIEW AS SELECT ... and then come up with the system functions needed (e.g. for accessing file offsets) in order to be able to express the index construction as SQL. We would then execute view materialization in a fashion similar to CREATE TABLE AS SELECT. This approach best reflects the way the current code models an index as an ordinary table, but requires some other changes (e.g. CTAS + dynamic partitioning, something we want anyway). If we do this, we should create a new JIRA issue since it's a different feature from the user POV.

          We're aiming to reach a decision next week; input is welcome on whether these alternatives make sense (and on others we should consider).

          Since this JIRA issue is already so overloaded, we would also like to treat the following two items as separate followup JIRA issues rather than trying to address it all at once:

          • rewrite framework
          • automatic usage of index or materialized view by optimizer
          Prafulla T added a comment -

          Hi Yongqiang,
I am facing some problems creating SUMMARY indexes.
This index is not built by the update index command.
The COMPACT SUMMARY index works fine. Is there any problem with
the creation of the SUMMARY index table?

          He Yongqiang added a comment -

I think the SUMMARY index's mapper code is commented out in the uploaded patch.

          John Sichi added a comment -

          Based on discussion with Yongqiang, we've decided to go for "Full-fledged index support".

          Jeff Hammerbacher added a comment -

          Hey,

          Any chance you guys could post a more detailed design document for "full-fledged index support"? I'm quite curious to read up on it.

          Thanks,
          Jeff

          John Sichi added a comment -

          @Jeff: Yes, we'll put it up on the wiki, similar to how we did for storage handler + HBase.

          He Yongqiang added a comment -

Attached a new patch for review. Added metastore remodeling based on offline discussion with John and Paul.

          Also addressed namit's suggestions: "Instead of proposing a new syntax, why dont we use 'alter index <INDEX_NAME> ON <TABLE_NAME> REBUILD;"

          @Ashish,
          >>Also, how would this be used to query the table? Can you give an example?

Added four test files in clientpositive. You can now refer to those files for examples.

          An example from one qfile:
          CREATE INDEX srcpart_index_proj TYPE COMPACT ON TABLE srcpart(key);
          ALTER INDEX srcpart_index_proj ON srcpart REBUILD;

          SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
INSERT OVERWRITE DIRECTORY "/tmp/index_test_index_result" SELECT `_bucketname`, `_offsets` FROM default__srcpart_srcpart_index_proj__ x WHERE x.key=100 AND x.ds = '2008-04-08';
          SET hive.exec.index_file=/tmp/index_test_index_result;
          SET hive.input.format=org.apache.hadoop.hive.ql.index.io.HiveIndexInputFormat;
          SELECT key, value FROM srcpart WHERE key=100 AND ds = '2008-04-08';

Will update the patch and open follow-up jiras after getting more comments.

          Thanks
          Yongqiang

          John Sichi added a comment -

          Attaching metastore data model diagram reverse-engineered from Yongqiang's latest patch (using Power Architect); can be used in a design doc. Cosmetic: I renamed INDEXS to IDXS for consistency with TBLS, likewise for IDX_ID and other attributes.

          I marked one of the relationships between index and table as optional since we want to allow for indexes which don't store their data structure in tabular form. I'm not sure whether there's something we need to do in package.jdo to make sure the corresponding key is nullable.

          John Sichi added a comment -

          Whoops, relationships connecting TBLS/SDS and IDXS/SDS got lost; will attach another diagram which fixes that.

          John Sichi added a comment -

          idx2.png

          John Sichi added a comment -

          Here are some preliminary comments on the metastore work. We can move on to the plugin design next week and start getting all of this into a doc.

          • We should support a property on the index which controls the name of the index table, and only generate an index table name automatically in the case where the user doesn't supply the property. For this, we'll need to add property key/values to the grammar (IDXPROPERTIES like TBLPROPERTIES and SERDEPROPERTIES?).
          • The grammar supports control over the tableFileFormat for the index table; what about other attributes such as row format, location, and TBLPROPERTIES? Some of these may be dictated by the index implementation, but it may be useful to override in some cases (same as tableFileFormat).
          • Is the partitioning for the index independent of the partitioning for the table? Don't we need to allow control over this in the grammar?
          • I think we should track the status of the index (when was the last time it was rebuilt, if ever) so that we know whether it is fresh with respect to the base table data. How should we model this in such a way that it takes per-partition indexing into account?
          • Some metastore followups to be logged separately: COMMENT clause on index definition; DESCRIBE INDEX; SHOW INDEXES; dealing with base table columns being dropped/renamed out from under the index
          • For generating the index table structure, we'll need to move that to plugin (rather than in Hive.java), since each index will need a different table structure (or no table structure at all).
          • Test queries: remember to add ORDER BY for determinism. Also, I'm not sure whether it is safe to use /tmp in the local file system (it may not exist, e.g. on Windows). I used it in hbase_bulk.m, but that uses a mini HDFS cluster (not the local file system).
          • Dropping a table with an index on it currently gives the exception below (in Derby; I didn't test MySQL yet). Same for attempting to drop an index table directly (instead of dropping the index). The second case should either fail with a meaningful exception, or implicitly drop the index definition as a trigger from dropping the table.

          hive> create table t1(i int);
          OK
          hive> create index q type compact on table t1;
          OK
          hive> drop table t1;
          FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
          NestedThrowables:
          java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'INDEXS_FK3' for key (12). The statement has been rolled back.
          FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

          hive> create table t5(i int);
          OK
          hive> create index r type compact on table t5;
          OK
hive> drop table default__t5_r__;
          FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore
          NestedThrowables:
          java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of foreign key constraint 'INDEXS_FK2' for key (17). The statement has been rolled back.
          FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

          He Yongqiang added a comment -

          THANKS FOR THE DETAILED COMMENTS.

          >>We should support a property on the index which controls the name of the index table, and only generate an index table name automatically in the case where the user doesn't supply the property.
          will add this in the following patch.

          >>For this, we'll need to add property key/values to the grammar (IDXPROPERTIES like TBLPROPERTIES and SERDEPROPERTIES?).
          Let's do it in a followup jira.

          >>The grammar supports control over the tableFileFormat for the index table; what about other attributes such as row format, location, and TBLPROPERTIES? Some of these may be dictated by the index implementation, but it may be useful to override in some cases (same as tableFileFormat).
          We can add this when we see the requirement. For now we can leave this out.

          >>I think we should track the status of the index (when was the last time it was rebuilt, if ever) so that we know whether it is fresh with respect to the base table data. How should we model this in such a way that it takes per-partition indexing into account?
I think it's the same as the key/value property one, no?

          >>Test queries: remember to add ORDER BY for determinism.
          will add this in the following patch.

          >>Also, I'm not sure whether it is safe to use /tmp in the local file system (it may not exist, e.g. on Windows). I used it in hbase_bulk.m, but that uses a mini HDFS cluster (not the local file system).
I think it should be ok because it's not the local /tmp; it's the mini HDFS /tmp.

          >>Dropping a table with an index on it currently gives the exception below (in Derby; I didn't test MySQL yet). Same for attempting to drop an index table directly (instead of dropping the index). The second case should either fail with a meaningful exception, or implicitly drop the index definition as a trigger from dropping the table.
Actually this was reported by Prafulla offline. Will fix it in the following patch. For the second case, I am planning to report an error.

          John Sichi added a comment -

          Preliminary draft of design doc is here

          http://wiki.apache.org/hadoop/Hive/IndexDev

          Yongqiang and I are still working out some of the details.

          Ning Zhang added a comment -

Based on some internal discussions, below are some comments about the design doc:

          1) the staleness (inconsistency) between the index and the base table should be addressed more precisely.
Since the current implementation allows the user to query the index table directly, we should guarantee that the index is consistent with the base table at query time. This means that at the query START time, the index was built completely from the data stored in the base table. The current design does not satisfy this criterion in that it only records the last_modification_time (LMT) of the base table and the index table, and checks whether the latter is larger than the former. That lets the following example break:

          timestamp0: last update of partition P1
          timestamp1: start create index on partition P1
          timestamp2: start insert overwrite P1
          timestamp3: finish insert overwrite P1
          timestamp4: finish index creation on P1
timestamp5: query on P1

The LMTs of the index and the base table are timestamp4 and timestamp3 respectively, so the optimizer will conclude the index is consistent with the base table. However, the index was built from data that is stale at timestamp5, so the index should not be used.

Instead of recording the LMT of the index table, we should probably record the LMT of the base table in the index metadata at the beginning of the index creation. In the above example, the timestamp recorded in the index metadata would be timestamp0, meaning the index was built from the base table as of timestamp0. At query time, we should check timestamp0 against timestamp3, which correctly concludes that the index is stale.

BTW, all the timestamps should come from some centralized clock such as the DFS directory update time (from the namenode).

2) The above consistency problem is not only present in the case of "DEFERRED REBUILD". Even if the index rebuild starts right away after an INSERT OVERWRITE, there is still a time window during which the index is stale (before the index creation completes). So we need the same mechanism to detect stale indexes.

3) I think lock-based concurrency may not be the best choice either. If the index creation takes a long time, it delays the availability of the base table. If we have the optimizer, we should always query against the base tables and let the optimizer figure out whether an index is available and fresh. So if an index creation is not finished, we just use the base table; otherwise we can use the index if it is cheaper.

4) Another case: the index creation finished and a query is using the index, and then a DML on the base table starts and finishes before the query finishes. Here we only guarantee snapshot consistency (results consistent with the data at the beginning of the query, not after it).

          5) If we have the mechanism to check consistency of the index, then the "index rebuild" command could just return if the index is consistent. We can also allow a "force" option in case we need to compensate for bad metadata.
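To make point 1 concrete, a toy Java sketch of the proposed check (illustration only; the names are hypothetical, and a real check would live in the metastore/optimizer): record the base partition's modification time when the rebuild starts, and compare it against the base partition's current modification time when the optimizer considers using the index.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Toy freshness check: the index metadata stores the base partition's
 * modification time as observed when the rebuild started; the index is
 * usable only if the base data has not been modified since.
 */
class IndexFreshness {
  /** Modification time of the base partition directory, per the namenode. */
  static long baseModificationTime(Configuration conf, Path basePartitionDir) throws IOException {
    FileSystem fs = basePartitionDir.getFileSystem(conf);
    return fs.getFileStatus(basePartitionDir).getModificationTime();
  }

  /**
   * @param baseLmtAtRebuildStart recorded in the index metadata when the rebuild started (timestamp0)
   * @param baseLmtNow            current modification time of the base partition
   */
  static boolean isFresh(long baseLmtAtRebuildStart, long baseLmtNow) {
    return baseLmtNow <= baseLmtAtRebuildStart;
  }
}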

          He Yongqiang added a comment -

Attached a new diff for review.

          Update:
          Based on some offline discussions, this patch takes a virtual-column approach to building the index.
          This patch adds 2 virtual columns: one is the input file name, and the other is the block offset within the file.
          An example of querying the virtual columns:
          select INPUT__FILE__NAME, key, BLOCK__OFFSET__INSIDE__FILE from src;
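
          For illustration only (this query is not from the patch): once the virtual columns exist, they can be projected and filtered like ordinary columns, which is what lets an index record a row's physical location and later seek back to it. The file path below is a made-up placeholder.

          -- Hypothetical example: locate rows by physical position via the virtual columns.
          -- The path literal is a placeholder, not a real warehouse file.
          SELECT key, value, BLOCK__OFFSET__INSIDE__FILE
          FROM src
          WHERE INPUT__FILE__NAME = 'hdfs://namenode:8020/user/hive/warehouse/src/kv1.txt'
            AND BLOCK__OFFSET__INSIDE__FILE BETWEEN 0 AND 4096;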

          John Sichi added a comment -

          Yongqiang, I looked at hive.indexing.10.patch, but I don't see the virtual columns in there?

          John Sichi added a comment -

          Metastore upgrade scripts will need to be added in HIVE-1427.

          He Yongqiang added a comment -

          Sorry, hive.indexing.10.patch was created from a wrong workspace.
          Attached a new diff.

          John Sichi added a comment -

          First pass of review comments on latest patch (I'll probably have more tomorrow).

          • INDEX_NAME precision in the metastore should be 128 characters (not 767), following convention for other identifiers
          • I don't think we need INDEX_TABLE_NAME at all in the metastore; it should only be used during CREATE INDEX and then forgotten
          • Move HiveIndexInputFormat and HiveIndexResult to package org.apache.hadoop.hive.ql.index.compact, and add Compact in their names (I'd still prefer to move this entire package out to a new subproj, but I guess we can skip that part now since most of the code went away with the virtual column approach); rename property hive.exec.index_file to hive.index.compact.file
          • Support WITH DEFERRED REBUILD, and require this to be specified for now to avoid confusion (per discussion in design meeting); see the syntax sketch after this list
          • when generating reentrant INSERT, need to quote identifiers such as table/column names (use HiveUtils.unparseIdentifier), and may need extra escaping for special characters in getPartKVPairStringArray (I'm not sure--check with Paul)
          • thread_local should be private (and named threadLocal); go through public IOContext.get() instead; likewise use public getter/setter methods on IOContext instead of accessing its data members directly
          • need ORDER BY in virtual_column.q
          • remove extra semicolon in other ORDER BY's, and make sure they cover a unique key in all cases
          • don't need TYPE and UPDATE as keywords in grammar
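
          A minimal sketch of the DDL behavior asked for above, assuming the compact index handler class named elsewhere in this patch; the exact grammar is defined by the patch itself, so treat this as illustrative rather than authoritative.

          -- Illustrative only: create a compact index whose build is deferred,
          -- then populate it explicitly with ALTER INDEX ... REBUILD.
          CREATE INDEX src_key_idx ON TABLE src (key)
          AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
          WITH DEFERRED REBUILD;

          ALTER INDEX src_key_idx ON src REBUILD;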
          John Sichi added a comment -

          Whoops, forgot two leftovers from a private diff review:

          • metastore/if/hive_metastore.thrift:102 instead of including the full indexTable structure inside the Index structure, can we omit it but then pass it as an additional parameter to add_index?
          • ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java:86 Move generic partition analysis out into Hive, since it will be the same for all plugins.

          We can talk more about these tomorrow if it's not clear.

          John Sichi added a comment -

          Regarding the mkset function: can we rename this to collect_array to hint that it is a UDAF? The @Description should also make this clear.

          Collect is the standard SQL name for this aggregate function, but the standard version returns a multiset rather than an array, so let's call it collect_array to be specific.

          Also, it will need its own independent unit tests (open a followup JIRA issue for this).
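
          To make the aggregate's role concrete, here is a hypothetical version of the reentrant index-build query using the collect_array name proposed above; the real INSERT is generated by the index handler, and the index table name below is invented.

          -- Illustrative only: one index entry per (key, file) pair, with the block
          -- offsets of matching rows collected into an array. The table name is made up.
          INSERT OVERWRITE TABLE src_key_idx_table
          SELECT key, INPUT__FILE__NAME, collect_array(BLOCK__OFFSET__INSIDE__FILE)
          FROM src
          GROUP BY key, INPUT__FILE__NAME;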

          Ashish Thusoo added a comment -

          Started looking at this. One initial question I had - why is the VirtualColumn class in the serde2 package?

          John Sichi added a comment -

          Another followup needed: REBUILD should be propagating lineage and read/write info from the reentrant INSERT statement up to the top-level statement so that hooks get called with the right information.

          Ashish Thusoo added a comment -

          Also, how is the file name populated? That is not done through the IOContext?

          He Yongqiang added a comment -

          @Ashish
          >> why is the VirtualColumn class in the serde2 package?
          Will move it to the ql.io package. I put it in the serde2 package just because I thought it might be needed by the serde layer. Since the code is almost done and it is not accessed by the serde, it makes sense to move it to ql.
          >> how is the file name populated
          The file name and block offset are both populated by the record reader. The file name is populated by looking at the split path when we construct the record reader; the offset is generated at runtime by the record reader.

          He Yongqiang added a comment -

          IOContext is just a container; HiveContextAwareRecordReader is responsible for filling it with the actual values.

          He Yongqiang added a comment -

          A new patch integrates comments from John and Ashish. Thanks!

          John Sichi added a comment -

          Thanks Yongqiang. Looking at it now.

          John Sichi added a comment -

          +1. Will commit when tests pass. I noticed a number of trivial issues (like Javadoc mismatches) which I'll put in a followup.

          Joydeep Sen Sarma added a comment -

          I am waiting for a commit on HIVE-1408. That's probably going to collide.

          John Sichi added a comment -

          Thanks Joydeep. Yeah, this one has tons of plan diffs due to the virtual columns.

          John Sichi added a comment -

          Yongqiang, I passed tests on Hadoop 0.20, but Ning has committed HIVE-1408, which conflicts, so you'll need to rebase against that and then I'll try again.

          John Sichi added a comment -

          Since another patch is needed, here are the review comments I mentioned above.

          • Javadoc for Hive.createIndex needs parameters fixed
          • Javadoc for HiveIndexHandler.analyzeIndexDefinition: remove storageDesc]
          • In HiveUtils.getIndexHandler: the message should be "Error in loading index handler" rather than "Error in loading storage handler"
          • GenericUDAFCollectSet @Description : "with no duplication elements" should be "with duplicate elements eliminated"
          • DDLSemanticAnalyzer.analyzeCreateIndex: hanlder is misspelled
          • Property AbstractIndexHandler.INDEX_COLS_KEY is never used; get rid of it?
          • For HiveIndex.INDEX_TABLE_CREATETIME property name, spell out "lastModifiedTime" instead of "lmt"
          He Yongqiang added a comment -

          A new patch against trunk.

          John Sichi added a comment -

          OK, testing lucky patch 13...

          John Sichi added a comment -

          Committed. Thanks Yongqiang!

          He Yongqiang added a comment -

          For the MySQL metastore upgrade, please refer to
          http://wiki.apache.org/hadoop/Hive/IndexDev#Metastore_Upgrades
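
          As a rough illustration of what such an upgrade involves (the authoritative statements are on the wiki page above; the table and column definitions here are simplified guesses, not the shipped script):

          -- Hypothetical shape of the new index metadata table for a MySQL metastore;
          -- see the wiki page above for the actual upgrade statements.
          CREATE TABLE IDXS (
            INDEX_ID BIGINT NOT NULL,
            INDEX_NAME VARCHAR(128),
            INDEX_HANDLER_CLASS VARCHAR(4000),
            ORIG_TBL_ID BIGINT,
            INDEX_TBL_ID BIGINT,
            SD_ID BIGINT,
            DEFERRED_REBUILD BIT(1) NOT NULL,
            CREATE_TIME INT NOT NULL,
            LAST_ACCESS_TIME INT NOT NULL,
            PRIMARY KEY (INDEX_ID)
          ) ENGINE=InnoDB;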


            People

            • Assignee:
              He Yongqiang
            • Reporter:
              Prasad Chakka
            • Votes:
              0
            • Watchers:
              30
