Spark / SPARK-15365

Metastore relation should fall back to HDFS size if statistics are not available from table metadata.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, when a table is used in a join, we rely on the size returned by the metastore to decide whether the join can be converted to a broadcast join. This optimization only kicks in for tables that have statistics available in the metastore. Hive generally falls back to the HDFS size when statistics are not available directly from the metastore, and this seems like a reasonable choice to adopt given the performance benefit of broadcast joins.
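
      The proposed fallback can be sketched roughly as follows. This is only an illustration, not Spark's actual implementation: the function names, the `hdfs_size_fn` callback, and the default size are hypothetical. It assumes the metastore exposes Hive's `totalSize` and `rawDataSize` table parameters and that the broadcast threshold is 10 MB, which mirrors the documented default of `spark.sql.autoBroadcastJoinThreshold`.

```python
from typing import Callable, Mapping, Optional

# Assumed threshold: 10 MB, mirroring the documented default of
# spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024

# Hypothetical "unknown size" value: large enough that a table with no
# usable size information is never broadcast by accident.
UNKNOWN_SIZE_BYTES = 2**63 - 1


def _positive_int(raw: Optional[str]) -> Optional[int]:
    """Parse a metastore parameter; return None unless it is a positive integer."""
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value > 0 else None


def estimate_table_size(
    table_params: Mapping[str, str],
    hdfs_size_fn: Optional[Callable[[], int]] = None,
    default_size: int = UNKNOWN_SIZE_BYTES,
) -> int:
    """Estimate a table's size in bytes.

    Order of preference: metastore statistics, then the size reported by
    the file system (via the hypothetical hdfs_size_fn), then a default.
    """
    # 1. Prefer statistics stored in the metastore, if present and positive.
    for key in ("totalSize", "rawDataSize"):
        size = _positive_int(table_params.get(key))
        if size is not None:
            return size

    # 2. Otherwise ask the file system for the size of the table location.
    if hdfs_size_fn is not None:
        try:
            size = hdfs_size_fn()
            if size > 0:
                return size
        except OSError:
            pass  # File system lookup failed; fall through to the default.

    # 3. No usable information: return the conservative default.
    return default_size


def can_broadcast(size_in_bytes: int, threshold: int = BROADCAST_THRESHOLD_BYTES) -> bool:
    """A table qualifies for a broadcast join if its size is within the threshold."""
    return size_in_bytes <= threshold
```

      With this ordering, a table that has no metastore statistics but occupies only a few kilobytes on HDFS still qualifies for a broadcast join, while a table with no size information at all does not.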

            People

              Assignee: Parth Brahmbhatt (parth.brahmbhatt)
              Reporter: Parth Brahmbhatt (parth.brahmbhatt)
              Votes: 0
              Watchers: 3
