Spark / SPARK-25823

map_filter can generate incorrect data

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • 3.0.0
    • None
    • SQL

    Description

      This is not a regression, because it occurs only in new higher-order functions like `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap` allows duplicate keys. If we want to allow this difference in the new higher-order functions, we should at least add a warning about the difference to these functions after the RC4 vote passes. Otherwise, this will surprise Presto-based users.

      Spark 2.4

      spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT map_concat(map(1,2), map(1,3)) m);
      spark-sql> SELECT * FROM t;
      {1:3}	{1:2}
      

      Presto 0.212

      presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
         a   | _col1
      -------+-------
       {1=3} | {}
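
The two outputs above can be reproduced with a toy model in plain Python. This is only a sketch under stated assumptions, not Spark's or Presto's implementation: a map is modeled as an ordered list of (key, value) entries, the displayed value is assumed to resolve duplicate keys with the last entry winning (consistent with the `{1:3}` shown for Spark), and all function names here are hypothetical.

```python
# Toy model: a map is an ordered list of (key, value) entries.
# This representation is an assumption for illustration only.

def concat_keep_duplicates(*maps):
    """Concatenate entry lists without removing duplicate keys."""
    return [entry for m in maps for entry in m]

def concat_last_wins(*maps):
    """Concatenate and deduplicate keys; the last value for a key wins."""
    return list(dict(concat_keep_duplicates(*maps)).items())

def map_filter(entries, predicate):
    """Keep the entries whose key and value satisfy the predicate."""
    return [(k, v) for k, v in entries if predicate(k, v)]

def display(entries):
    """Render entries as a map; a later duplicate key overwrites an earlier one."""
    return dict(entries)

# Duplicate keys survive concatenation, so the filter still sees (1, 2).
raw = concat_keep_duplicates([(1, 2)], [(1, 3)])
print(display(raw))                                   # {1: 3}
print(display(map_filter(raw, lambda k, v: v == 2)))  # {1: 2}

# Keys are deduplicated at concatenation, so (1, 2) is already gone.
deduped = concat_last_wins([(1, 2)], [(1, 3)])
print(display(deduped))                                   # {1: 3}
print(display(map_filter(deduped, lambda k, v: v == 2)))  # {}
```

In this model the first pair of results matches the Spark 2.4 row and the second pair matches the Presto row: the map column looks identical in both, and only the filtered column reveals whether the hidden duplicate entry was kept.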
      

            People

              Assignee: Unassigned
              Reporter: Dongjoon Hyun (dongjoon)