Spark / SPARK-29059

[SPIP] Support for Hive Materialized Views in Spark SQL.

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      Materialized views were introduced in Apache Hive 3.0.0. Currently, Spark Catalyst does not rewrite queries against Hive tables to use materialized views the way Apache Calcite does for Hive. This Jira proposes adding that support.

      We have developed it in our internal trunk and would like to open source it. It would consist of 3 major parts:

      1. Reading MV-related Hive metadata.
      2. An Implication Engine that determines whether an expression exp1 implies another expression exp2, i.e., whether exp1 => exp2 is a tautology. This is similar to RexImplicationChecker in Apache Calcite.
      3. A Catalyst rule that replaces a table with its materialized view using the Implication Engine. For example, if MV 'mv' has been created in Hive using the query 'select * from foo where x > 10 && x < 110', then the query 'select * from foo where x > 70 and x < 100' will be transformed into 'select * from mv where x > 70 and x < 100'.
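
To make parts 2 and 3 concrete, here is a minimal sketch in Python of an implication check over range predicates and a table-to-view substitution built on it. It is not the proposed Spark implementation: the names `Interval`, `to_intervals`, `implies`, and `rewrite` are hypothetical, predicates are modeled as plain (column, operator, value) tuples rather than Catalyst expressions, and only conjunctions of strict '>' and '<' comparisons are handled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Open interval (lo, hi) of values allowed for one column (hypothetical helper)."""
    lo: float = -math.inf
    hi: float = math.inf

    def contains(self, other: "Interval") -> bool:
        # Sufficient condition: other's bounds lie within self's bounds.
        return self.lo <= other.lo and other.hi <= self.hi


def to_intervals(predicates):
    """Fold a conjunction of (column, op, value) comparisons into one interval per column."""
    out = {}
    for col, op, val in predicates:
        cur = out.get(col, Interval())
        if op == ">":
            cur = Interval(max(cur.lo, val), cur.hi)
        elif op == "<":
            cur = Interval(cur.lo, min(cur.hi, val))
        else:
            raise ValueError(f"unsupported operator in this sketch: {op}")
        out[col] = cur
    return out


def implies(query_preds, view_preds) -> bool:
    """True if the query's filter implies the view's filter (conservative check)."""
    q = to_intervals(query_preds)
    v = to_intervals(view_preds)
    # Every column the view restricts must be restricted at least as tightly by the query.
    return all(v_iv.contains(q.get(col, Interval())) for col, v_iv in v.items())


def rewrite(table, query_preds, views):
    """Pick the relation to scan: a usable view if one exists, else the base table.

    `views` maps view name -> (base table name, view predicates).
    """
    for name, (base, view_preds) in views.items():
        if base == table and implies(query_preds, view_preds):
            return name
    return table


# The example from the description: mv is defined over foo with 10 < x < 110.
views = {"mv": ("foo", [("x", ">", 10), ("x", "<", 110)])}
query = [("x", ">", 70), ("x", "<", 100)]
print(rewrite("foo", query, views))
```

In this sketch the query's own filter is left untouched and only the scanned relation changes, which matches the example above where 'where x > 70 and x < 100' is still applied on top of 'mv'. The check is deliberately conservative: it may decline a valid substitution, but it should not accept an invalid one.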

      Note that the Implication Engine and the Catalyst rule are generic and can be reused even if Spark decides to have its own materialized views.

            People

              Assignee: Unassigned
              Reporter: Amogh Margoor (amargoor)
              Votes: 1
              Watchers: 14

              Dates

                Created:
                Updated:
                Resolved: