Details
- Improvement
- Status: Closed
- Critical
- Resolution: Duplicate
- 1.5.2
- None
- None
Description
When split is used in SQL and several values are read by index from the same array, the split is evaluated again on the same row for every index. In addition, split in Java performs poorly.
spark-sql> explain extended select array[0] as a, array[1] as b, array[2] as c from (select split(value, ',') as array from src_split) t;
== Parsed Logical Plan ==
'Project [unresolvedalias('array[0] AS a#16),unresolvedalias('array[1] AS b#17),unresolvedalias('array[2] AS c#18)]
 'Subquery t
  'Project [unresolvedalias('split('value,,) AS array#15)]
   'UnresolvedRelation [src_split], None
== Analyzed Logical Plan ==
a: string, b: string, c: string
Project [array#15[0] AS a#16,array#15[1] AS b#17,array#15[2] AS c#18]
 Subquery t
  Project [split(value#20,,) AS array#15]
   MetastoreRelation default, src_split, None
== Optimized Logical Plan ==
Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
 MetastoreRelation default, src_split, None
== Physical Plan ==
Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
 HiveTableScan [value#20], (MetastoreRelation default, src_split, None)
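In the optimized plan, split(value#20,,) appears once per output column, so it would run three times for every row. The sketch below is a toy model in plain Python, not Spark code: the names counted_split, collapsed_projection, shared_projection and count_calls are hypothetical and exist only to count how often the split executes under each plan shape.

```python
# Toy model (NOT Spark code): counts how many times split runs per row.
# All names here are hypothetical and used only for illustration.
calls = {"split": 0}


def counted_split(value, sep=","):
    """Split a string and record that a split was performed."""
    calls["split"] += 1
    return value.split(sep)


def collapsed_projection(value):
    # Mirrors the optimized plan: the split expression is inlined
    # into each of the three output columns.
    return (
        counted_split(value)[0],
        counted_split(value)[1],
        counted_split(value)[2],
    )


def shared_projection(value):
    # Alternative shape: split once, then index into the shared result.
    parts = counted_split(value)
    return (parts[0], parts[1], parts[2])


def count_calls(projection, rows):
    """Apply a projection to every row and return (results, split call count)."""
    calls["split"] = 0
    results = [projection(row) for row in rows]
    return results, calls["split"]
```

With two input rows, the collapsed form performs six splits and the shared form performs two, while both return the same columns.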
Issue Links
- is duplicated by SPARK-17728 UDFs are run too many times (Closed)