SPARK-39885: Behavior differs between arrays_overlap and array_contains for negative 0.0


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      array_contains([0.0], -0.0) returns true, but arrays_overlap([0.0], [-0.0]) returns false. I think we generally want to treat -0.0 and 0.0 as equal (see https://github.com/apache/spark/blob/e9eb28e27d10497c8b36774609823f4bbd2c8500/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SQLOrderingUtil.scala#L28).
      However, Double::equals does not treat them as equal. Therefore, we should either make TypeUtils#typeWithProperEquals return false for DoubleType, or wrap doubles in our own equals method that handles this case.
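The mismatch can be reproduced in plain Java, without Spark. This is a minimal sketch under two assumptions: that the arrays_overlap fast path behaves like a HashSet lookup built on Double::equals/hashCode, and that the SQLOrderingUtil comparison linked above behaves like `x == y ? 0 : Double.compare(x, y)`. The class and method names (`NegativeZeroDemo`, `sqlCompare`, `hashOverlap`, `bruteForceOverlap`) are hypothetical and do not exist in Spark.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NegativeZeroDemo {
    // Assumed SQL-style double comparison: equal by ==, otherwise Double.compare.
    // This treats 0.0 and -0.0 as equal, and NaN as equal to NaN.
    static int sqlCompare(double x, double y) {
        return x == y ? 0 : Double.compare(x, y);
    }

    // Hash-based overlap: relies on Double.equals/hashCode, which distinguish 0.0 from -0.0.
    static boolean hashOverlap(List<Double> left, List<Double> right) {
        Set<Double> seen = new HashSet<>(left);
        for (Double d : right) {
            if (seen.contains(d)) {
                return true;
            }
        }
        return false;
    }

    // Pairwise overlap: compares every pair with the SQL-style comparison instead of equals.
    static boolean bruteForceOverlap(List<Double> left, List<Double> right) {
        for (double l : left) {
            for (double r : right) {
                if (sqlCompare(l, r) == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(0.0 == -0.0);                                      // true
        System.out.println(Double.valueOf(0.0).equals(-0.0));                 // false
        System.out.println(sqlCompare(0.0, -0.0));                            // 0
        System.out.println(hashOverlap(List.of(0.0), List.of(-0.0)));         // false
        System.out.println(bruteForceOverlap(List.of(0.0), List.of(-0.0)));   // true
    }
}
```

The pairwise variant corresponds loosely to the first proposed fix (not treating DoubleType as having proper equals, so a comparison-based path is used); it gives the answer array_contains already gives, at quadratic cost.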

      Java code snippets showing the issue:

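      import java.util.List;

      import org.apache.spark.sql.Dataset;
      import org.apache.spark.sql.Row;
      import org.apache.spark.sql.RowFactory;
      import org.apache.spark.sql.functions;
      import org.apache.spark.sql.types.DataTypes;

      // arrays_overlap: array(+0.0) vs. a column holding [-0.0] (sparkSession is an existing SparkSession)
      Dataset<Row> arrayDataset = sparkSession.createDataFrame(
              List.of(RowFactory.create(List.of(-0.0))),
              DataTypes.createStructType(List.of(DataTypes.createStructField(
                      "doubleCol", DataTypes.createArrayType(DataTypes.DoubleType), false))));
      Dataset<Row> overlapsDf = arrayDataset.withColumn(
              "overlaps", functions.arrays_overlap(functions.array(functions.lit(+0.0)), arrayDataset.col("doubleCol")));
      List<Row> overlapsResult = overlapsDf.collectAsList(); // [[WrappedArray(-0.0),false]]

      // array_contains: array(+0.0) vs. a column holding -0.0
      Dataset<Row> scalarDataset = sparkSession.createDataFrame(
              List.of(RowFactory.create(-0.0)),
              DataTypes.createStructType(List.of(
                      DataTypes.createStructField("doubleCol", DataTypes.DoubleType, false))));
      Dataset<Row> containsDf = scalarDataset.withColumn(
              "contains", functions.array_contains(functions.array(functions.lit(+0.0)), scalarDataset.col("doubleCol")));
      List<Row> containsResult = containsDf.collectAsList(); // [[-0.0,true]]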
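The second fix proposed in the description, wrapping doubles in an equals method that handles -0.0, could look roughly like the sketch below. It is an illustration only: `WrappedOverlapDemo` and `NormalizedDouble` are hypothetical names, and the hash-set loop merely stands in for whatever hash-based path arrays_overlap actually uses.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WrappedOverlapDemo {
    // Hypothetical wrapper whose equals/hashCode treat 0.0 and -0.0 as the same value.
    static final class NormalizedDouble {
        private final double value;

        NormalizedDouble(double value) {
            // -0.0 == 0.0 is true, so this maps -0.0 to +0.0 and leaves every other value unchanged.
            this.value = value == 0.0 ? 0.0 : value;
        }

        @Override
        public boolean equals(Object o) {
            // Double.compare treats all NaNs as equal, consistent with Spark SQL's NaN = NaN semantics.
            return o instanceof NormalizedDouble
                    && Double.compare(value, ((NormalizedDouble) o).value) == 0;
        }

        @Override
        public int hashCode() {
            return Double.hashCode(value);
        }
    }

    // Hash-based overlap over wrapped values: keeps a hash lookup but treats 0.0 and -0.0 as equal.
    static boolean overlap(List<Double> left, List<Double> right) {
        Set<NormalizedDouble> seen = new HashSet<>();
        for (double d : left) {
            seen.add(new NormalizedDouble(d));
        }
        for (double d : right) {
            if (seen.contains(new NormalizedDouble(d))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(overlap(List.of(0.0), List.of(-0.0)));                 // true
        System.out.println(overlap(List.of(1.0), List.of(-1.0)));                 // false
        System.out.println(overlap(List.of(Double.NaN), List.of(Double.NaN)));    // true
    }
}
```

Normalizing in the constructor keeps equals and hashCode consistent with each other, which a hash set requires; comparing raw values with == inside equals while hashing with Double.hashCode would break that contract for 0.0 and -0.0.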
      


          People

            Assignee: Unassigned
            Reporter: David Vogelbacher (dvogelbacher)
            Votes: 0
            Watchers: 1
