Spark / SPARK-23304

Spark SQL coalesce() against hive not working

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      The query below appears to ignore the coalesce() call. This happens when running Spark 2.2 or Spark 2.3 against Hive, reading ORC data:

       
      Query:
      spark.sql("SELECT COUNT(DISTINCT(something)) FROM sometable WHERE dt >= '20170301' AND dt <= '20170331' AND something IS NOT NULL").coalesce(160000).show()
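
One way to check whether the coalesce() call has any effect is to compare partition counts before and after it. The sketch below is illustrative only: `partition_counts` is a hypothetical helper that is not part of Spark or of this report, and it assumes a PySpark-style DataFrame exposing `rdd.getNumPartitions()` and `coalesce()`. Keep in mind that in the query above coalesce() is applied to the DataFrame returned by spark.sql(), that is, to the aggregated result rather than to the table scan, and that coalesce() is documented to keep the current partition count when a larger number is requested.

```python
def partition_counts(df, requested):
    """Return (before, after) partition counts around df.coalesce(requested).

    `df` is assumed to be a PySpark-style DataFrame; the helper name and
    signature are hypothetical and exist only for this illustration.
    """
    # Partition count of the DataFrame as returned by the query.
    before = df.rdd.getNumPartitions()
    # Partition count after asking Spark to coalesce to `requested` partitions.
    after = df.coalesce(requested).rdd.getNumPartitions()
    return before, after
```

In a Spark session this could be called as `partition_counts(spark.sql("SELECT ..."), 160000)`; calling `explain(True)` on the coalesced DataFrame prints the plans, which is presumably how the attached explain files were produced.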
       

      Attachments

        1. spark22_oldorc_explain.txt
          2 kB
          Thomas Graves
        2. spark23_oldorc_explain_convermetastoreorcfalse.txt
          2 kB
          Thomas Graves
        3. spark23_oldorc_explain.txt
          2 kB
          Thomas Graves

          People

            Assignee: Xiao Li (smilegator)
            Reporter: Thomas Graves (tgraves)
            Votes: 0
            Watchers: 7
