[SPARK-29031] Materialized column to accelerate queries - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: SQL
Labels:
- SPIP

Description

Goals

Add new SQL grammar for materialized columns
Implicitly rewrite SQL queries on complex-type columns when a materialized column exists for the referenced expression
When the materialized column has an atomic data type, enable vectorized reads and filter pushdown to improve performance, even if the original column has a complex type

Example

Create a normal table

CREATE TABLE x (
    name STRING,
    age INT,
    params STRING,
    event MAP<STRING, STRING>
) USING parquet;

Add materialized columns to an existing table

ALTER TABLE x ADD COLUMNS (
    new_age INT MATERIALIZED age + 1,
    city STRING MATERIALIZED get_json_object(params, '$.city'),
    label STRING MATERIALIZED event['label']
);

Suppose the following query is issued

SELECT name, age+1, get_json_object(params, '$.city'), event['label']
FROM x
WHERE event['label'] = 'newuser';

It is equivalent to

SELECT name, new_age, city, label
FROM x
WHERE label = 'newuser';
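
The rewrite above can be illustrated with a small, purely textual sketch. This is not how Spark would implement it (a real implementation would presumably match expressions in the query plan rather than in SQL text); the MATERIALIZED catalog and the rewrite helper are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical catalog: materialized column name -> defining expression,
# copied from the ALTER TABLE example in this issue.
MATERIALIZED = {
    "new_age": "age + 1",
    "city": "get_json_object(params, '$.city')",
    "label": "event['label']",
}


def _pattern(expr):
    """Build a regex that matches expr while tolerating whitespace differences."""
    # Split into identifiers/numbers, quoted literals, and single punctuation marks.
    tokens = re.findall(r"\w+|'[^']*'|\S", expr)
    body = r"\s*".join(re.escape(t) for t in tokens)
    # Lookarounds keep "age + 1" from matching inside "page+1" or "age+10".
    return re.compile(r"(?<!\w)" + body + r"(?!\w)")


def rewrite(sql, materialized=MATERIALIZED):
    """Replace each defining expression found in sql with its materialized column."""
    # Try longer expressions first so they win over any shorter sub-expression.
    for column, expr in sorted(materialized.items(), key=lambda kv: -len(kv[1])):
        sql = _pattern(expr).sub(column, sql)
    return sql


original = (
    "SELECT name, age+1, get_json_object(params, '$.city'), event['label'] "
    "FROM x WHERE event['label'] = 'newuser'"
)
rewritten = rewrite(original)
print(rewritten)
# SELECT name, new_age, city, label FROM x WHERE label = 'newuser'
```

Text substitution ignores string literals, aliases, and semantically equal but differently written expressions (for example 1 + age), which is one reason a plan-level rewrite is the more plausible design.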

Query performance improves dramatically because

The rewritten query reads the new string columns city and label instead of the whole params JSON string and the whole event map, so much less data needs to be read
Vectorized reads can be used in the rewritten query but not in the original one, because vectorized reads can only be enabled when all required columns have atomic types
Filters can be pushed down. Only filters on atomic columns can be pushed down; the original filter event['label'] = 'newuser' is on a complex column, so it cannot be pushed down
The rewritten query no longer needs to parse JSON, a CPU-intensive operation that significantly degrades performance
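
To make the last point concrete, here is a minimal sketch in plain Python. It assumes materialized values are computed once, when a row is written; the issue text does not state when they are computed. The materialize helper and the sample rows are invented for illustration.

```python
import json


def materialize(row):
    """Return a copy of row with the three example materialized columns added."""
    # JSON is parsed here, once per row at write time (an assumption of this sketch).
    params = json.loads(row["params"]) if row["params"] else {}
    return {
        **row,
        "new_age": row["age"] + 1,           # age + 1
        "city": params.get("city"),          # get_json_object(params, '$.city')
        "label": row["event"].get("label"),  # event['label']
    }


# Invented sample rows matching the schema of table x.
rows = [
    {"name": "ann", "age": 30, "params": '{"city": "Paris"}', "event": {"label": "newuser"}},
    {"name": "bob", "age": 41, "params": '{"city": "Oslo"}', "event": {"label": "returning"}},
]
stored = [materialize(r) for r in rows]

# The rewritten query touches only atomic columns: no JSON parsing and no map
# lookup at read time, and the filter is a plain string comparison.
result = [
    (r["name"], r["new_age"], r["city"], r["label"])
    for r in stored
    if r["label"] == "newuser"
]
print(result)  # [('ann', 31, 'Paris', 'newuser')]
```

The cost of json.loads is paid once per row on write instead of once per row per query, which is the trade-off the proposal relies on.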

Issue Links

links to

SPIP Proposal: Materialized column for SQL

People

Assignee: Unassigned
Reporter: Jason Guo
Votes: 1
Watchers: 3

Dates

Created: 10/Sep/19 06:42
Updated: 17/Mar/20 08:50