Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- 2.0.0
- None
Description
I created this umbrella JIRA to track DataFrame/Parquet issues with primitive arrays. This is mostly related to machine learning use cases, where feature indices/values are stored as (usually large) primitive arrays.
Issues:
- SPARK-16043: Tungsten array data is not specialized for primitive types
- SPARK-16071: Not sufficient array size checks (NegativeArraySizeException or silent errors)
- SPARK-16073: Performance of Parquet encodings on saving primitive arrays
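To illustrate the class of bug that SPARK-16071 describes, here is a minimal, self-contained sketch of how a byte-size computation done in 32-bit arithmetic can wrap around for large primitive arrays. This is not Spark's code: the class `ArraySizeOverflow` and both helper methods are hypothetical names chosen for the example, and the element counts are arbitrary.

```java
// Hypothetical illustration only; none of these names come from Spark.
class ArraySizeOverflow {

    // Size in bytes computed with plain int arithmetic.
    // The multiplication silently wraps around on overflow.
    static int unsafeSizeInBytes(int numElements, int elementSize) {
        return numElements * elementSize;
    }

    // Same computation, but Math.multiplyExact throws ArithmeticException
    // instead of returning a wrapped value.
    static int checkedSizeInBytes(int numElements, int elementSize) {
        return Math.multiplyExact(numElements, elementSize);
    }

    public static void main(String[] args) {
        // 300M 8-byte elements: the product wraps to a negative number, so
        // allocating a buffer of that size would throw NegativeArraySizeException.
        System.out.println(unsafeSizeInBytes(300_000_000, 8));

        // 600M 8-byte elements: the product wraps to a small positive number,
        // so an allocation would succeed with the wrong size (a silent error).
        System.out.println(unsafeSizeInBytes(600_000_000, 8));

        try {
            checkedSizeInBytes(300_000_000, 8);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

The two unchecked calls correspond to the two failure modes named in the issue title: a negative size that fails loudly at allocation time, and a wrapped positive size that goes unnoticed.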
Issue Links
relates to:
- SPARK-16071 Not sufficient array size checks to avoid integer overflows in Tungsten (Resolved)
- SPARK-13850 TimSort Comparison method violates its general contract (Resolved)
- SPARK-14850 VectorUDT/MatrixUDT should take primitive arrays without boxing (Resolved)
- SPARK-15962 Introduce additional implementation with a dense format for UnsafeArrayData (Resolved)
- SPARK-16042 Eliminate nullcheck code at projection for an array type (Resolved)
- SPARK-16043 Prepare GenericArrayData implementation specialized for a primitive array (Resolved)