HIVE-2232

Any query to a partition column should access the metastore and not the data

    Details

      Description

      The metastore records the possible values of every partition column (including subpartitions). So any query that only reads or uses data from partition columns should be answered from the metastore and avoid table scans.

      For example:

      CREATE TABLE t1 (value1 STRING) PARTITIONED BY (ds STRING, key STRING);
      CREATE TABLE t2 (key STRING, value2 STRING) PARTITIONED BY (ds STRING);

      ...

      SELECT t2.key, t1.value1, t2.value2 FROM t1 JOIN t2 ON t1.key=t2.key AND t1.ds='2010-01-01' AND t2.ds='2010-01-01';

      ...ideally, the JOIN in this case would run very quickly without scanning every row of t1, because key is a partition column and so every value of t1.key is already in the metastore. This is just one example. Partition pruning is another example, and it currently works well.
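
      As a rough illustration of what the metastore already holds, Hive's existing SHOW PARTITIONS statement lists partition values from metadata without reading table data. The sketch below assumes t1 as defined above. The SELECT DISTINCT query is a hypothetical example, not taken from the report, of a query that touches only partition columns and so could in principle be answered the same way.

      ```sql
      -- Metadata-only: lists the partitions of t1 for one ds value,
      -- one line per partition in the form ds=<value>/key=<value>.
      SHOW PARTITIONS t1 PARTITION (ds='2010-01-01');

      -- Hypothetical query that asks for the same information through SELECT.
      -- It references only partition columns (ds, key), so the set of key
      -- values it returns is the set of key partitions listed above.
      SELECT DISTINCT key FROM t1 WHERE ds='2010-01-01';
      ```

      The design point is that both statements need only the partition list. The request in this issue is for the planner to recognize the second form (and joins on partition columns, as in the example above) and avoid scanning rows.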

          Activity

          Ashutosh Chauhan added a comment -

          HIVE-1003 is trying to do this.

            People

            • Assignee: Unassigned
            • Reporter: Adam Kramer