  1. Spark
  2. SPARK-12352

Reuse the result of split in SQL


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.5.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

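      When split is used in SQL and several values are read by index from the same array, the split is evaluated again for each index, on every row. The optimized plan below shows why: the subquery's projection is collapsed into the outer Project, so split(value#20,,) appears once per output column. Because split in Java performs poorly, this repeated evaluation is costly.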

      spark-sql> explain extended select array[0] as a, array[1] as b, array[2] as c from (select split(value, ',') as array from src_split) t;
      == Parsed Logical Plan ==
      'Project [unresolvedalias('array[0] AS a#16),unresolvedalias('array[1] AS b#17),unresolvedalias('array[2] AS c#18)]
       'Subquery t
        'Project [unresolvedalias('split('value,,) AS array#15)]
         'UnresolvedRelation [src_split], None
      
      == Analyzed Logical Plan ==
      a: string, b: string, c: string
      Project [array#15[0] AS a#16,array#15[1] AS b#17,array#15[2] AS c#18]
       Subquery t
        Project [split(value#20,,) AS array#15]
         MetastoreRelation default, src_split, None
      
      == Optimized Logical Plan ==
      Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
       MetastoreRelation default, src_split, None
      
      == Physical Plan ==
      Project [split(value#20,,)[0] AS a#16,split(value#20,,)[1] AS b#17,split(value#20,,)[2] AS c#18]
       HiveTableScan [value#20], (MetastoreRelation default, src_split, None)
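
The redundancy visible in the optimized and physical plans can be illustrated outside Spark with plain Java. The sketch below is only an analogy, not Spark's generated code: the class `SplitReuse`, its two methods, and the sample row are hypothetical names chosen for illustration, and it assumes every row has at least three comma-separated fields.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not Spark's generated code.
final class SplitReuse {

    // Mirrors the optimized plan: each projected column re-splits the row value,
    // so the same string is split three times.
    static String[] splitPerColumn(String value) {
        String a = value.split(",")[0];
        String b = value.split(",")[1];
        String c = value.split(",")[2];
        return new String[] {a, b, c};
    }

    // What the issue asks for: split once, then index the shared array.
    static String[] splitOnce(String value) {
        String[] parts = value.split(",");
        return new String[] {parts[0], parts[1], parts[2]};
    }

    public static void main(String[] args) {
        String row = "x,y,z"; // assumed sample row
        System.out.println(Arrays.toString(splitPerColumn(row)));
        System.out.println(Arrays.toString(splitOnce(row)));
    }
}
```

Both methods return the same three values; the second does one split per row instead of three, which is the reuse this issue requests from the optimizer.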
      

            People

              Assignee: Unassigned
              Reporter: Yadong Qi (waterman)
              Votes: 0
              Watchers: 3
