Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-29059

[SPIP] Support for Hive Materialized Views in Spark SQL.

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None

      Description

      Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark Catalyst does not optimize queries against Hive tables using Materialized View the way Apache Calcite does it for Hive. This Jira is to add support for the same.

      We have developed it in our internal trunk and would like to open source it. It would consist of 3 major parts:

      1. Reading MV related Hive Metadata
      2. Implication Engine which would figure out if an expression exp1 implies another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar to RexImplication checker in Apache Calcite.
      3. Catalyst rule to replace tables by it's Materialized view using Implication Engine. For e.g., if MV 'mv' has been created in Hive using query 'select * from foo where x > 10 && x <110'  then query 'select * from foo where x > 70 and x < 100' will be transformed into 'select * from mv where x >70 and x < 100'

      Note that Implication Engine and Catalyst Rule is generic can be used even when Spark decides to have it's own Materialized View.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                amargoor Amogh Margoor
              • Votes:
                1 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: