Spark / SPARK-49520

ArrayRemove() Function Needs to Remove NULL Values


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.1
    • Fix Version/s: None
    • Component/s: SQL
    • Spark Version: 3.2.1
    • Patch

    Description

      I want to check whether two arrays have any element in common, like this:

      select
          case when intersect_size > 0 then 1 else 0 end as is_include
      from (
          select
              size(array_intersect(array_a, array_b)) as intersect_size
          from table_a
      )

      However, NULL elements affect the result:

      SELECT size(array_intersect(array(1, 2, 3, null), array(null)))

      Output: 1 (the only common element is NULL)
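      A plain-Python model of the behaviour above may help pin down where the extra match comes from. It is a sketch, not Spark code: the helper names spark_array_intersect and is_include are hypothetical, Python None stands in for SQL NULL, and the model assumes (as the output above shows) that array_intersect treats NULL as an ordinary element that can match another NULL.

```python
from typing import List, Optional, Sequence

# Python None stands in for SQL NULL in this hypothetical model.
Elem = Optional[int]


def spark_array_intersect(a: Sequence[Elem], b: Sequence[Elem]) -> List[Elem]:
    """Model of array_intersect: distinct elements of `a`, in order,
    that also occur in `b`. None is matched like any other value."""
    result: List[Elem] = []
    for x in a:
        if x in b and x not in result:
            result.append(x)
    return result


def is_include(a: Sequence[Elem], b: Sequence[Elem]) -> int:
    """Model of the query above: 1 if the intersection is non-empty, else 0."""
    intersect_size = len(spark_array_intersect(a, b))
    return 1 if intersect_size > 0 else 0


print(spark_array_intersect([1, 2, 3, None], [None]))  # [None]
print(is_include([1, 2, 3, None], [None]))             # 1, caused only by the shared NULL
```

      Dropping the None from the first list makes is_include return 0, which is the result the query is meant to produce.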

      So I tried to remove the NULL from the first array with array_remove:

      SELECT array_remove(array(1, 2, 3, null, 3), null)

      Output: null (array_remove returns NULL when the element to remove is NULL)

      I would like to add logic to array_remove so that it can remove NULL values. Should I overload the function (perhaps as array_remove(array_a, array_b, isIgnoreNull)) or change the original function?
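      A sketch of the proposed option, again as a plain-Python model with None standing in for NULL. The name array_remove_ignore_null and its is_ignore_null flag are hypothetical; they only illustrate the three-argument idea above. The model assumes the flag means "when the element to remove is NULL, drop the NULL entries instead of returning NULL". With the flag off, it reproduces the current array_remove results. For comparison, Spark SQL 3.2.1 can already do this filtering with the built-in higher-order function filter(array_a, x -> x IS NOT NULL).

```python
from typing import List, Optional, Sequence

# Python None stands in for SQL NULL in this hypothetical model.
Elem = Optional[int]


def array_remove_ignore_null(
    arr: Optional[Sequence[Elem]], element: Elem, is_ignore_null: bool = False
) -> Optional[List[Elem]]:
    """Hypothetical three-argument array_remove.

    is_ignore_null=False: current behaviour, NULL result when `element` is NULL.
    is_ignore_null=True: a NULL `element` means "remove every NULL entry".
    """
    if arr is None:
        return None
    if element is None:
        if not is_ignore_null:
            return None
        return [x for x in arr if x is not None]
    # Non-NULL element: drop every equal entry; NULL entries never match, so they stay.
    return [x for x in arr if x is None or x != element]


print(array_remove_ignore_null([1, 2, 3, None, 3], None))        # None
print(array_remove_ignore_null([1, 2, 3, None, 3], None, True))  # [1, 2, 3, 3]
print(array_remove_ignore_null([1, 2, 3, None, 3], 3))           # [1, 2, None]
```

      Keeping the default False leaves existing queries unchanged. That default is the main argument for an extra parameter over changing the two-argument function.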
       


    People

    • Assignee: Unassigned
    • Reporter: Shadowell Feng Jie
    • Votes: 0
    • Watchers: 2


    Time Tracking

    • Estimated: 168h
    • Remaining: 168h
    • Logged: Not Specified