HIVE-2128

Automatic Indexing with multiple tables

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: Indexing
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Make automatic indexing work with jobs which access multiple tables. We'll probably need to modify the way that the index input format works in order to associate index formats/files with specific tables.

      1. HIVE-2128.8.patch (99 kB, Syed S. Albiz)
      2. HIVE-2128.7.patch (103 kB, Syed S. Albiz)
      3. HIVE-2128.6.patch (102 kB, Syed S. Albiz)
      4. HIVE-2128.5.patch (93 kB, Syed S. Albiz)
      5. HIVE-2128.4.patch (95 kB, Syed S. Albiz)
      6. HIVE-2128.2.patch (41 kB, Syed S. Albiz)
      7. HIVE-2128.1.patch (10 kB, Syed S. Albiz)
      8. HIVE-2128.1.patch (29 kB, Syed S. Albiz)

          Activity

          John Sichi added a comment -

          HiveInputFormat already keeps track of the mapping from path to input format. So the idea here is that instead of setting HiveIndexedInputFormat globally for the entire job, we need to be associating it only with the paths that are supposed to have index filtering applied.
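
A minimal sketch of the per-path idea, in plain Java with no Hive or Hadoop dependencies. The class name, the method names, and the directory-matching rule are assumptions made for illustration; only the two format names are taken from the comment above, and nothing here is the actual HiveInputFormat API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only: this is not Hive code.
class PathFormatSketch {
    static final String DEFAULT_FORMAT = "HiveInputFormat";
    static final String INDEXED_FORMAT = "HiveIndexedInputFormat";

    // Directories whose files should be read through the index filter.
    private final Set<String> indexedDirs = new LinkedHashSet<>();

    // Register one table directory instead of switching the whole job.
    void useIndexFor(String dir) {
        indexedDirs.add(dir);
    }

    // Choose a format per path; unregistered paths keep the default.
    String formatFor(String path) {
        for (String dir : indexedDirs) {
            // Match on a directory boundary so "/warehouse/src" does not
            // accidentally capture "/warehouse/srcpart".
            if (path.equals(dir) || path.startsWith(dir + "/")) {
                return INDEXED_FORMAT;
            }
        }
        return DEFAULT_FORMAT;
    }

    public static void main(String[] args) {
        PathFormatSketch sketch = new PathFormatSketch();
        sketch.useIndexFor("/warehouse/src");
        System.out.println(sketch.formatFor("/warehouse/src/000000_0"));
        System.out.println(sketch.formatFor("/warehouse/srcpart/ds=1/000000_0"));
    }
}
```

In this sketch only files under the registered directory are routed to the indexed format, so a second table in the same job is still read normally.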

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1010/
          -----------------------------------------------------------

          Review request for hive and John Sichi.

          Summary
          -------

          Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

          This addresses bug HIVE-2128.
          https://issues.apache.org/jira/browse/HIVE-2128

          Diffs


          ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 090ecfc
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
          ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
          ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
          ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6

          Diff: https://reviews.apache.org/r/1010/diff

          Testing
          -------

          added new testcase index_auto_mult_tables.q

          Thanks,

          Syed

          John Sichi added a comment -

          I don't see the new testcase in either the patch or the reviewboard entry?

          Syed S. Albiz added a comment -

          updated to include testcase

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1010/
          -----------------------------------------------------------

          (Updated 2011-07-06 00:03:20.513755)

          Review request for hive and John Sichi.

          Changes
          -------

          updated patch to include testcase

          Summary
          -------

          Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

          This addresses bug HIVE-2128.
          https://issues.apache.org/jira/browse/HIVE-2128

          Diffs (updated)


          ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 090ecfc
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
          ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
          ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
          ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
          ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/1010/diff

          Testing
          -------

          added new testcase index_auto_mult_tables.q

          Thanks,

          Syed

          John Sichi added a comment -

          I don't think this approach works...what if both tables in the query have an applicable filter?

          In that case, don't we need a separate blockfilter file per table alias?

          Syed S. Albiz added a comment -

          Hmm, you're right, we would need multiple blockfilter files in that case. I'm not sure that is possible in the framework we have, though, given that we choose a single index query to generate at each stage. When multiple indexes are involved in a stage, the index handler is responsible for combining the index inputs into a single output file. For example, I have attached a case that seems like it might cause problems (both src and srcpart have indexes built and filters applied); however, since the index queries are generated separately for separate table scans, there seems to be no collision between blockfilter files. Does this address the issue? It seems like there might still be a possibility of collision between table and blockfilter file, so I wonder if you had a different case in mind?

          John Sichi added a comment -

          I was thinking of the case of compact indexes (one on each table).

          Your test case is similar, but for bitmap indexes. We certainly should not be trying to combine the indexes in this case since they are on different tables! The plan looks strange already because it is applying the srcpart predicate twice, and the src index not at all. (It's hard to tell what's going on since the same predicate is applied on both tables; use a different predicate to see if it's two copies of the same vs one of each.)

          Regardless of index type, I think we should be able to use indexes on different tables at once in the same query.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1010/
          -----------------------------------------------------------

          (Updated 2011-07-13 00:29:56.738368)

          Review request for hive and John Sichi.

          Changes
          -------

          Revamped approach. We already uniquely assign filenames to each index query result, so instead of throwing those away, keep them in the indexIntermediateFile variable, and take the union of those input paths to generate the next set of input splits.
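
The change described above can be sketched roughly as follows. This is a plain-Java illustration, not code from the patch: the class and method names are hypothetical, and the comma-separated encoding of the file list is an assumption about how several paths might be carried in a single string-valued setting.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: this is not code from the patch.
class IndexFileUnionSketch {
    // One uniquely named result file per index query, kept instead of discarded.
    private final Set<String> intermediateFiles = new LinkedHashSet<>();

    void addIndexQueryResult(String file) {
        intermediateFiles.add(file); // set semantics: duplicates collapse
    }

    // Assumed encoding: all result files joined into one comma-separated value.
    String joined() {
        return String.join(",", intermediateFiles);
    }

    // Reverse of joined(): recover the individual files to read filters from.
    static List<String> split(String joined) {
        List<String> files = new ArrayList<>();
        for (String part : joined.split(",")) {
            if (!part.isEmpty()) {
                files.add(part);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        IndexFileUnionSketch union = new IndexFileUnionSketch();
        union.addIndexQueryResult("/tmp/index_result_src");
        union.addIndexQueryResult("/tmp/index_result_srcpart");
        System.out.println(union.joined());
        System.out.println(split(union.joined()));
    }
}
```

Keeping one entry per index query is what lets two indexed tables in the same query each contribute their own filter, rather than the last one overwriting the others.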

          Summary
          -------

          Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

          This addresses bug HIVE-2128.
          https://issues.apache.org/jira/browse/HIVE-2128

          Diffs (updated)


          ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
          ql/src/java/org/apache/hadoop/hive/ql/Driver.java b278ffe
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
          ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 02ab78c
          ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
          ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
          ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
          ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
          ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
          ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/1010/diff

          Testing
          -------

          added new testcase index_auto_mult_tables.q

          Thanks,

          Syed

          John Sichi added a comment -

          Could you make sure the latest patch is uploaded here and matching Review Board, and then click Submit Patch? Also make sure all spurious changes (like extra imports) are gone; I'm seeing some of those in Review Board.

          Syed S. Albiz added a comment -

          removed unnecessary imports

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1010/
          -----------------------------------------------------------

          (Updated 2011-07-19 03:15:17.006396)

          Review request for hive and John Sichi.

          Changes
          -------

          removed unnecessary imports from patch

          Summary
          -------

          Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

          This addresses bug HIVE-2128.
          https://issues.apache.org/jira/browse/HIVE-2128

          Diffs (updated)


          ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION
          ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
          ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
          ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
          ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
          ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
          ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java 617723e

          Diff: https://reviews.apache.org/r/1010/diff

          Testing
          -------

          added new testcase index_auto_mult_tables.q

          Thanks,

          Syed

          John Sichi added a comment -

          Comments on Review Board. After that it looks good to go!

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1010/#review1112
          -----------------------------------------------------------

          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexQueryContext.java
          <https://reviews.apache.org/r/1010/#comment2271>

          Why was this comment truncated?

          ql/src/test/queries/clientpositive/index_auto_mult_tables.q
          <https://reviews.apache.org/r/1010/#comment2273>

          All of these SELECT statements need ORDER BY for determinism.

          - John

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1010/
          -----------------------------------------------------------

          (Updated 2011-07-21 23:52:23.929900)

          Review request for hive and John Sichi.

          Changes
          -------

          Added order by to testcases. This revealed an existing bug where we would walk the entire operator tree for each task in the task tree in IndexWhereTaskDispatcher. I amended this to only walk the subset of the operator tree in the current task.
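
To illustrate the scoping fix described above, here is a small plain-Java sketch. The `Op` type, the operator names, and the `walk` helper are all hypothetical stand-ins; the real IndexWhereTaskDispatcher is not reproduced here, and the sketch assumes each task exposes its own root operators.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only: this is not the Hive dispatcher.
class TaskScopedWalkSketch {
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }

        Op child(Op c) {
            children.add(c);
            return this;
        }
    }

    // Depth-first walk that starts only at the roots it is given.
    static List<String> walk(List<Op> roots) {
        List<String> visited = new ArrayList<>();
        Deque<Op> stack = new ArrayDeque<>();
        for (int i = roots.size() - 1; i >= 0; i--) {
            stack.push(roots.get(i));
        }
        while (!stack.isEmpty()) {
            Op op = stack.pop();
            visited.add(op.name);
            // Push children in reverse so they are visited in declared order.
            for (int i = op.children.size() - 1; i >= 0; i--) {
                stack.push(op.children.get(i));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Op srcScan = new Op("TS_src").child(new Op("FIL_src"));
        Op srcpartScan = new Op("TS_srcpart").child(new Op("FIL_srcpart"));

        // Walking every root for every task processes each scan repeatedly.
        System.out.println(walk(Arrays.asList(srcScan, srcpartScan)));
        // Walking only the current task's roots touches just its own operators.
        System.out.println(walk(Arrays.asList(srcScan)));
    }
}
```

Under these assumptions, a task that owns only the src scan never visits the srcpart filter, so each predicate is processed once, in the task it belongs to.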

          Summary
          -------

          Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust.

          This addresses bug HIVE-2128.
          https://issues.apache.org/jira/browse/HIVE-2128

          Diffs (updated)


          ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 4c9efd1
          ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e
          ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d
          ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5
          ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java da084f6
          ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6
          ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION
          ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION
          ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION
          ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION

          Diff: https://reviews.apache.org/r/1010/diff

          Testing
          -------

          added new testcase index_auto_mult_tables.q

          Thanks,

          Syed

          John Sichi added a comment -

          +1. Will commit when tests pass.

          John Sichi added a comment -

          I got failures in the new tests:

          testCliDriver_index_auto_mult_tables
          testCliDriver_index_auto_mult_tables_compact
          testCliDriver_index_auto_self_join

          Plus this existing test:

          testCliDriver_index_bitmap_auto_partitioned

          Maybe you forgot to update the logs?

          Syed S. Albiz added a comment -

          Sorry, I forgot to regenerate the testcase outputs. Fixed in this patch.

          John Sichi added a comment -

          Reran with latest patch and still got failures in these three:

          testCliDriver_index_auto_mult_tables
          testCliDriver_index_auto_mult_tables_compact
          testCliDriver_index_auto_self_join

          Syed S. Albiz added a comment -

          Ah, sorry, I forgot to git rebase before regenerating the patch. Some of the recently landed patches introduced changes to the testcase output.

          John Sichi added a comment -

          Committed. Thanks Syed!

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #848 (See https://builds.apache.org/job/Hive-trunk-h0.21/848/)
          HIVE-2128. Automatic Indexing with multiple tables.
          (Syed Albiz via jvs)

          jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1150962
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java
          • /hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out
          • /hive/trunk/ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
          • /hive/trunk/ql/src/test/queries/clientpositive/index_auto_self_join.q
          • /hive/trunk/ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java
          • /hive/trunk/ql/src/test/results/clientpositive/index_auto_self_join.q.out
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java
          • /hive/trunk/ql/src/test/results/clientpositive/index_auto_mult_tables.q.out
          • /hive/trunk/ql/src/test/queries/clientpositive/index_auto_mult_tables.q

            People

            • Assignee:
              Syed S. Albiz
              Reporter:
              Russell Melick
            • Votes:
              0
              Watchers:
              1
