  1. Spark
  2. SPARK-23819

InMemoryTableScanExec prunes orderable complex types due to out of date ColumnStats


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      The set of data types that can be compared via BinaryComparison was expanded in SPARK-21110 to include arrays, structs, etc., but ColumnStats still has hard-coded upper/lower bounds for these types.

      InMemoryTableScanExec used to be safe against these comparisons because the predicate would fail type checking. Now that such predicates pass type checking, the out-of-date statistics unintentionally allow partitions to be pruned, which causes correctness issues: rows that match the filter can be skipped.
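
      The failure mode can be illustrated with a small toy model. This is a sketch only, not Spark's code: the names BatchStats, collect_stats, may_contain_unsafe, may_contain_safe and scan are hypothetical, Python lists stand in for array values, and the placeholder bounds are modeled as None, which is an assumption about how a hard-coded bound behaves. The point it shows is that a pruning check which treats placeholder bounds as real bounds skips batches that do contain matching rows, while a check that declines to prune without usable bounds does not.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class BatchStats:
    """Per-batch bounds for one column; None models a placeholder bound."""
    lower: Optional[Any]
    upper: Optional[Any]


def collect_stats(values: List[Any], track_bounds: bool) -> BatchStats:
    # Assumption: complex types get placeholder bounds instead of real min/max.
    if not track_bounds:
        return BatchStats(None, None)
    return BatchStats(min(values), max(values))


def may_contain_unsafe(stats: BatchStats, literal: Any) -> bool:
    # Models the bug: a comparison against a placeholder bound yields
    # "no match", so the batch is pruned even though it was never inspected.
    if stats.lower is None or stats.upper is None:
        return False
    return stats.lower <= literal <= stats.upper


def may_contain_safe(stats: BatchStats, literal: Any) -> bool:
    # Without usable bounds the batch cannot be ruled out, so it must be scanned.
    if stats.lower is None or stats.upper is None:
        return True
    return stats.lower <= literal <= stats.upper


def scan(batches: List[List[Any]], stats: List[BatchStats], literal: Any,
         may_contain: Callable[[BatchStats, Any], bool]) -> List[Any]:
    """Return rows equal to `literal`, skipping batches the stats rule out."""
    rows = []
    for batch, batch_stats in zip(batches, stats):
        if not may_contain(batch_stats, literal):
            continue  # batch pruned
        rows.extend(v for v in batch if v == literal)
    return rows


# Two cached batches of an array-typed column.
batches = [[[1, 2], [3, 4]], [[5, 6]]]
stats = [collect_stats(b, track_bounds=False) for b in batches]

unsafe_rows = scan(batches, stats, [1, 2], may_contain_unsafe)
safe_rows = scan(batches, stats, [1, 2], may_contain_safe)
```

      In this model unsafe_rows is empty although [1, 2] is present in the first batch, while safe_rows contains the matching row. The same reasoning suggests two possible remedies: collect real bounds for orderable complex types, or skip pruning whenever the statistics for a column are placeholders.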

          People

            Assignee: Unassigned
            Reporter: Patrick Woody (pwoody)
            Votes: 0
            Watchers: 2