[SPARK-16070] DataFrame/Parquet issues with primitive arrays - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: MLlib, SQL
Labels:
- bulk-closed

Description

I created this umbrella JIRA to track DataFrame/Parquet issues with primitive arrays. This is mostly related to machine learning use cases, where feature indices/values are stored as (usually large) primitive arrays.

Issues:

~~SPARK-16043~~: Tungsten array data is not specialized for primitive types
~~SPARK-16071~~: Not sufficient array size checks (NegativeArraySizeException or silent errors)
~~SPARK-16073~~: Performance of Parquet encodings on saving primitive arrays
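
To illustrate the failure mode named in SPARK-16071, here is a minimal sketch of how a byte-size computation for a large primitive array can overflow a 32-bit signed integer. It is not Spark or Tungsten code. The function names (`wrap_int32`, `unchecked_byte_size`, `checked_byte_size`) are hypothetical, and because Python integers do not overflow, JVM `int` wraparound is emulated explicitly.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap a Python int to a signed 32-bit value, emulating JVM `int` arithmetic."""
    return (value + 2**31) % 2**32 - 2**31


def unchecked_byte_size(num_elements: int, element_size: int) -> int:
    """Compute the byte size the way unchecked 32-bit multiplication would."""
    return wrap_int32(num_elements * element_size)


def checked_byte_size(num_elements: int, element_size: int) -> int:
    """Compute the byte size, rejecting inputs whose product exceeds the int32 range."""
    if num_elements < 0 or element_size <= 0:
        raise ValueError("num_elements must be >= 0 and element_size must be > 0")
    size = num_elements * element_size  # exact, since Python ints are unbounded
    if size > INT32_MAX:
        raise OverflowError(
            f"{num_elements} elements of {element_size} bytes exceed the int32 limit"
        )
    return size


# 300M doubles: the wrapped size is negative, the kind of value that would
# surface as NegativeArraySizeException when passed to a JVM array allocation.
print(unchecked_byte_size(300_000_000, 8))  # -1894967296

# 600M doubles: the wrapped size is positive but wrong, i.e. a silent error.
print(unchecked_byte_size(600_000_000, 8))  # 505032704
```

The second case is the more dangerous one: the wrapped value looks like a valid size, so only an explicit range check such as the one in `checked_byte_size` would catch it.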

Issue Links

relates to

SPARK-16071 Not sufficient array size checks to avoid integer overflows in Tungsten (Resolved)
SPARK-13850 TimSort Comparison method violates its general contract (Resolved)
SPARK-14850 VectorUDT/MatrixUDT should take primitive arrays without boxing (Resolved)
SPARK-15962 Introduce additional implementation with a dense format for UnsafeArrayData (Resolved)
SPARK-16042 Eliminate nullcheck code at projection for an array type (Resolved)
SPARK-16043 Prepare GenericArrayData implementation specialized for a primitive array (Resolved)

People

Assignee: Unassigned
Reporter: Xiangrui Meng
Votes: 1
Watchers: 8

Dates

Created: 20/Jun/16 16:51
Updated: 21/May/19 04:32
Resolved: 21/May/19 04:32