  1. Spark
  2. SPARK-24574

Improve the array_contains function of the SQL component to handle Column-typed values

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Hello all,
       
      I ran into a use case in a project using Spark SQL and want to share some thoughts about the array_contains function.
       
      Say I have a DataFrame with two columns: column A of type "Array of String" and column B of type "String". I want to determine whether the value of column B is contained in the value of column A, without using a UDF of course.
      The function array_contains naturally came to mind:
       
       
      def array_contains(column: Column, value: Any): Column = withExpr {
        ArrayContains(column.expr, Literal(value))
      }

       
      However, the function wraps whatever is passed as value in a Literal. Passing column B therefore tries to build a Literal from a Column object, which yields a runtime exception: RuntimeException("Unsupported literal type " + v.getClass + " " + v).
       
      After discussing this with my friends, we found a solution that does not use a UDF:
      new Column(ArrayContains(col("ColumnA").expr, col("ColumnB").expr))
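For anyone who wants to try the workaround end to end, here is a minimal sketch. It assumes a Spark 2.x or 3.x build in which a Column can be constructed directly from a Catalyst Expression, and it runs against a local session. The sample rows and the names ArrayContainsWorkaround and containsFlags are made up for illustration. It also shows expr("array_contains(ColumnA, ColumnB)") as an alternative that goes through the SQL parser instead of Catalyst internals.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.catalyst.expressions.ArrayContains
import org.apache.spark.sql.functions.{col, expr}

object ArrayContainsWorkaround {

  // Returns (via_expression, via_sql) for each sample row, in input order.
  def containsFlags(spark: SparkSession): Seq[(Boolean, Boolean)] = {
    import spark.implicits._

    // Hypothetical sample data: ColumnA is Array[String], ColumnB is String.
    val df = Seq(
      (Seq("a", "b"), "a"),
      (Seq("a", "b"), "z")
    ).toDF("ColumnA", "ColumnB")

    // Workaround from the description: build the Catalyst expression directly,
    // passing the expression of ColumnB instead of a Literal.
    val viaExpression =
      new Column(ArrayContains(col("ColumnA").expr, col("ColumnB").expr))

    // Alternative: let the SQL parser resolve both arguments as column references.
    val viaSql = expr("array_contains(ColumnA, ColumnB)")

    df.select(viaExpression.as("via_expression"), viaSql.as("via_sql"))
      .collect()
      .map(row => (row.getBoolean(0), row.getBoolean(1)))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-contains-workaround")
      .getOrCreate()
    try containsFlags(spark).foreach(println)
    finally spark.stop()
  }
}
```

The expr variant avoids importing from org.apache.spark.sql.catalyst, which is an internal package and may change between releases.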
       
      Building on this solution, I propose making the function a little more powerful, like this:
      def array_contains(column: Column, value: Any): Column = withExpr {
        value match {
          case c: Column => ArrayContains(column.expr, c.expr)
          case _ => ArrayContains(column.expr, Literal(value))
        }
      }
       
      It pattern matches on value to detect whether it is a Column. If so, it uses the column's .expr; otherwise it wraps the value in a Literal, exactly as it does today.
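To make the dispatch easy to check without a Spark dependency, the sketch below models it in plain Scala. The types Expr, Literal, ColumnRef, ArrayContains and Column here are simplified, hypothetical stand-ins that only mirror the shape of the Spark classes; they are not the real Catalyst API, and arrayContains is an illustration of the proposal, not the code that was merged.

```scala
// Hypothetical stand-ins for Spark's Catalyst types; not the real classes.
sealed trait Expr
final case class Literal(value: Any) extends Expr
final case class ColumnRef(name: String) extends Expr
final case class ArrayContains(array: Expr, element: Expr) extends Expr

// Stand-in for org.apache.spark.sql.Column: a thin wrapper around an expression.
final case class Column(expr: Expr)

object ArrayContainsDispatch {

  // Mirrors the proposed change: a Column argument contributes its own
  // expression, while any other value is wrapped in a Literal as before.
  def arrayContains(column: Column, value: Any): Column = Column {
    value match {
      case c: Column => ArrayContains(column.expr, c.expr)
      case _         => ArrayContains(column.expr, Literal(value))
    }
  }

  def main(args: Array[String]): Unit = {
    val a = Column(ColumnRef("ColumnA"))
    val b = Column(ColumnRef("ColumnB"))
    println(arrayContains(a, b))   // element is a column reference
    println(arrayContains(a, "x")) // element is a literal
  }
}
```

Because value is typed as Any, the check happens at run time. The existing call sites that pass plain values keep working unchanged, and only Column arguments take the new branch.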
       
       

          People

            Assignee: Chongguang LIU
            Reporter: Chongguang LIU
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: