Spark / SPARK-25823

map_filter can generate incorrect data

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:

      Description

      This is not a regression, because it occurs only in new higher-order functions like `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap` allows duplicate keys. If we want to allow this difference in the new higher-order functions, we should at least add a warning about it to these functions after the RC4 vote passes. Otherwise, it will surprise Presto-based users.

      Spark 2.4

      spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT map_concat(map(1,2), map(1,3)) m);
      spark-sql> SELECT * FROM t;
      {1:3}	{1:2}
      

      Presto 0.212

      presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
         a   | _col1
      -------+-------
       {1=3} | {}
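
One way to read the two results above is sketched below in Python. This is a toy model under stated assumptions, not Spark's or Presto's actual implementation: it assumes a map is stored as a list of key/value entries, that the Spark-like concat keeps duplicate keys while the Presto-like concat keeps only the last value per key, and that rendering a map for display lets the last entry for a key win. All function names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Entry = Tuple[int, int]  # (key, value)


def concat_keep_duplicates(*maps: List[Entry]) -> List[Entry]:
    """Assumed Spark 2.4-like behavior: entries are appended, duplicates kept."""
    out: List[Entry] = []
    for m in maps:
        out.extend(m)
    return out


def concat_last_wins(*maps: List[Entry]) -> List[Entry]:
    """Assumed Presto-like behavior: a later value replaces an earlier one."""
    merged: Dict[int, int] = {}
    for m in maps:
        for k, v in m:
            merged[k] = v
    return list(merged.items())


def map_filter(entries: List[Entry], pred: Callable[[int, int], bool]) -> List[Entry]:
    """Keep every stored entry that satisfies the predicate."""
    return [(k, v) for k, v in entries if pred(k, v)]


def render(entries: List[Entry]) -> Dict[int, int]:
    """Assumed display step: building a dict lets the last duplicate win."""
    return dict(entries)


def value_is_two(k: int, v: int) -> bool:
    return v == 2


spark_like = concat_keep_duplicates([(1, 2)], [(1, 3)])
presto_like = concat_last_wins([(1, 2)], [(1, 3)])

print(render(spark_like), render(map_filter(spark_like, value_is_two)))
# {1: 3} {1: 2}
print(render(presto_like), render(map_filter(presto_like, value_is_two)))
# {1: 3} {}
```

Under these assumptions, the hidden duplicate entry `(1, 2)` survives in the Spark-like map, so the filter can return a value that the displayed map `{1: 3}` never showed.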
      

              People

              • Assignee: Unassigned
              • Reporter: Dongjoon Hyun (dongjoon)
              • Votes: 0
              • Watchers: 5
