  Project: Spark
  Parent issue: SPARK-38154 Set up a new GA job to run tests with ANSI mode
  Issue: SPARK-38363

Avoid runtime error in Dataset.summary() when ANSI mode is on


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0, 3.2.2
    • Component/s: SQL
    • Labels: None

    Description

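      When executing df.summary(), Spark SQL casts String columns to Double to compute the percentile, mean, and stddev metrics.

      With ANSI mode on, this cast throws a runtime error whenever a string value cannot be parsed as a number, so summary() fails on any String column that contains non-numeric data.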

      Since this API is meant to give a quick summary of a DataFrame, I suggest using "TryCast", which returns null instead of throwing, for the problematic stats so that the API still works under ANSI mode.
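The difference between the two cast behaviors can be illustrated with a small plain-Python model. This is a sketch under stated assumptions, not Spark's implementation: the helpers strict_cast_to_double, try_cast_to_double and numeric_summary are hypothetical names, Python's float() parsing only approximates Spark's string-to-double rules, and only the mean and (sample) stddev metrics are modeled.

```python
import statistics
from typing import Callable, Dict, List, Optional

Cast = Callable[[Optional[str]], Optional[float]]


def strict_cast_to_double(value: Optional[str]) -> Optional[float]:
    # Hypothetical model of an ANSI-mode cast: null stays null,
    # unparseable input raises instead of producing null.
    if value is None:
        return None
    return float(value)  # raises ValueError for input such as "abc"


def try_cast_to_double(value: Optional[str]) -> Optional[float]:
    # Hypothetical model of a try-cast: unparseable input becomes None.
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def numeric_summary(column: List[Optional[str]], cast: Cast) -> Dict[str, Optional[float]]:
    # Hypothetical helper: cast every value, drop nulls (as SQL aggregates do),
    # then compute the mean and the sample standard deviation.
    values = [v for v in (cast(x) for x in column) if v is not None]
    mean = statistics.mean(values) if values else None
    stddev = statistics.stdev(values) if len(values) >= 2 else None
    return {"mean": mean, "stddev": stddev}


column = ["1", "2", "abc", None, "3"]
print(numeric_summary(column, try_cast_to_double))  # {'mean': 2.0, 'stddev': 1.0}

try:
    numeric_summary(column, strict_cast_to_double)
except ValueError as err:
    print("strict cast failed:", err)
```

With the strict cast, the single bad value "abc" aborts the whole summary; with the try-cast it is treated as null and the remaining values still produce statistics. The trade-off is that non-numeric strings are silently excluded from the numeric metrics rather than reported.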



          People

            Assignee: Gengliang Wang
            Reporter: Gengliang Wang
            Votes: 0
            Watchers: 2
