Uploaded image for project: 'IMPALA'
  2. IMPALA-10723

Allow basic querying and computing stats on a materialized view



    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:
    • Epic Color:


      Currently, in Impala, a Materialized View (MV) created via Hive is visible through the metadata catalog and can be queried but the query is expanded to its corresponding view definition. This is incorrect because a materialized view is a regular (physical) table, not a view. Even though Impala does not support either creating MV or automatic rewriting to use MV, querying an MV directly should be allowed.

      Here's the current behavior:

      [localhost:21050] functional> explain select * from materialized_view;
      Query: explain select * from materialized_view
      | Explain String                                                                     |
      | Max Per-Host Resource Reservation: Memory=4.00MB Threads=2                         |
      | Per-Host Resource Estimates: Memory=10MB                                           |
      | WARNING: The following tables are missing relevant table and/or column statistics. |
      | functional.insert_only_transactional_table                                         |
      |                                                                                    |
      | PLAN-ROOT SINK                                                                     |
      | |                                                                                  |
      | 01:EXCHANGE [UNPARTITIONED]                                                        |
      | |                                                                                  |
      | 00:SCAN HDFS [functional.insert_only_transactional_table]                          |
      |    HDFS partitions=1/1 files=0 size=0B                                             |
      |    row-size=4B cardinality=0                                                       |

      Note that the plan shows the scan of the underlying table instead of the materialized_view table. We should only be scanning the MV (including applying partition pruning, predicate pushdown etc.) and not treating this as a view.

      This JIRA is to enhance the frontend to recognize a materialized view as a table rather than a view. This will further allow commands such as COMPUTE STATS, DROP STATS, SHOW [TABLE | COLUMN] STATS to be run on the MV. One motivation for doing this is to allow an external frontend that supports automatic query rewrites using MVs to leverage the statistics on MVs.

      Since Impala is not creating the MV, we will need to block DML operations on the MV. Further, special handling needs to be done for Ranger authorization policies such that any column masking/row filtering policies defined on the source tables of the MV are taken into consideration.




            • Assignee:
              amansinha Aman Sinha
              amansinha Aman Sinha
            • Votes:
              0 Vote for this issue
              2 Start watching this issue


              • Created: