  1. Spark
  2. SPARK-17063

MSCK REPAIR TABLE is super slow with Hive metastore


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Repairing a table with thousands of partitions can take hundreds of seconds. The Hive metastore can add only a few partitions per second, because it lists all the files in each partition to gather the fast stats (number of files, total size of files).

      We could improve this by listing the files in Spark in parallel, then sending the fast stats to the Hive metastore so that it can skip this sequential listing.
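      The idea can be illustrated with a minimal sketch, which is not the code of the actual Spark change. It assumes partition directories on a local filesystem as a stand-in for HDFS, uses a thread pool in place of Spark's parallelism, and assumes that files whose names start with "_" or "." (such as _SUCCESS) should be ignored. The function names gather_fast_stats and gather_all_fast_stats are hypothetical. The keys numFiles and totalSize follow the names Hive uses for these two statistics.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def gather_fast_stats(partition_dir):
    """Return the fast stats (file count, total bytes) for one partition directory."""
    num_files = 0
    total_size = 0
    with os.scandir(partition_dir) as entries:
        for entry in entries:
            # Assumption: skip subdirectories and hidden/marker files such as _SUCCESS.
            if entry.is_file() and not entry.name.startswith(("_", ".")):
                num_files += 1
                total_size += entry.stat().st_size
    return {"numFiles": num_files, "totalSize": total_size}


def gather_all_fast_stats(partition_dirs, max_workers=8):
    """List all partition directories concurrently rather than one after another."""
    partition_dirs = list(partition_dirs)  # materialize so it can be iterated twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with partition_dirs.
        stats = list(pool.map(gather_fast_stats, partition_dirs))
    return dict(zip(partition_dirs, stats))
```

      The resulting mapping from partition path to stats is what would be sent along with each partition when it is added, so the metastore has no reason to list the files again.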


            People

              Assignee: Davies Liu (davies)
              Reporter: Davies Liu (davies)
