[SPARK-4849] Pass partitioning information (distribute by) to In-memory caching - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.6.0
Component/s: SQL
Labels: None

Description

HQL "distribute by <column_name>" partitions data based on the values of the specified column. We can pass this partitioning information to in-memory caching for further performance improvements; e.g., in joins, an extra partitioning step can be saved when the cached data is already partitioned on the join column.

Reference: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-partition-on-specific-column-values-td20350.html
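
To illustrate why known partitioning can save a step in joins, here is a minimal conceptual sketch in plain Python. It is not Spark code and does not reflect how Spark's in-memory cache is implemented; the helper names (distribute_by, co_partitioned_join), the partition count, and the sample tables are all hypothetical. It only shows that when both sides are already hash-partitioned on the join column, each pair of partitions can be joined locally without repartitioning.

```python
NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def distribute_by(rows, key, num_partitions=NUM_PARTITIONS):
    """Hash-partition rows (dicts) on the value of `key`."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def co_partitioned_join(left_parts, right_parts, key):
    """Inner-join two datasets that share the same partitioning on `key`.

    Rows with equal keys sit at the same partition index on both sides,
    so each partition pair is joined locally with no repartitioning step.
    """
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a lookup of the right-hand rows in this partition only.
        index = {}
        for row in right:
            index.setdefault(row[key], []).append(row)
        for row in left:
            for match in index.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data with integer join keys.
orders = [{"user_id": i % 5, "amount": 10 * i} for i in range(10)]
users = [{"user_id": i, "name": f"user{i}"} for i in range(5)]

orders_parts = distribute_by(orders, "user_id")
users_parts = distribute_by(users, "user_id")
result = co_partitioned_join(orders_parts, users_parts, "user_id")
```

If the partitioning of either side were unknown, a join would first have to redistribute the rows by key; carrying the partitioning information along with the cached data is what allows that step to be skipped.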

Issue Links

duplicates

SPARK-5354 Set InMemoryColumnarTableScan's outputPartitioning and outputOrdering

Resolved

relates to

SPARK-11410 Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY

Resolved

People

Assignee: Nong Li

Reporter: Nitin Goyal

Votes: 2

Watchers: 12

Dates

Created: 15/Dec/14 11:20

Updated: 04/Nov/15 18:01

Resolved: 04/Nov/15 17:57