HIVE-17638 (sub-task of HIVE-16923: Hive-on-Spark DPP Improvements, project Hive)

SparkDynamicPartitionPruner loads all partition metadata into memory

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      The SparkDynamicPartitionPruner first loads the full contents of every partition pruning file into memory, and only then prunes partitions from the MapWork. Because all of the partition metadata has to be held in memory at once, this increases memory pressure on the Hive-on-Spark (HoS) Remote Driver. It would be more efficient to prune partitions while the files are being scanned, so that the partition metadata never needs to be buffered in full.
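
      To make the difference concrete, here is a minimal, language-neutral sketch in Python. It is not Hive's code: the one-value-per-line file format, the mapping from partition path to partition value, and the function names `read_values`, `prune_buffered`, and `prune_streaming` are all assumptions made for illustration. The buffered variant mirrors the behaviour described above; the streaming variant decides which partitions survive while the files are read, so it only keeps an index over partitions that are already part of the work.

```python
import io
from typing import Dict, Iterable, Iterator, List, Set, TextIO


def read_values(stream: TextIO) -> Iterator[str]:
    """Lazily yield one partition value per non-empty line (assumed format)."""
    for line in stream:
        value = line.rstrip("\n")
        if value:
            yield value


def prune_buffered(partitions: Dict[str, str],
                   files: Iterable[TextIO]) -> Dict[str, str]:
    """Buffer every value from every pruning file, then prune."""
    values: Set[str] = set()
    for f in files:
        values.update(read_values(f))  # grows with the file contents
    return {path: v for path, v in partitions.items() if v in values}


def prune_streaming(partitions: Dict[str, str],
                    files: Iterable[TextIO]) -> Dict[str, str]:
    """Mark surviving partitions while scanning; file contents are not kept."""
    # Index the partitions we already hold by their partition value.
    by_value: Dict[str, List[str]] = {}
    for path, v in partitions.items():
        by_value.setdefault(v, []).append(path)

    keep: Set[str] = set()  # bounded by the number of known partitions
    for f in files:
        for value in read_values(f):
            keep.update(by_value.get(value, ()))
    return {path: v for path, v in partitions.items() if path in keep}


if __name__ == "__main__":
    # Hypothetical partitions and pruning-file contents.
    parts = {"/t/ds=1": "1", "/t/ds=2": "2", "/t/ds=3": "3"}
    pruning_files = [io.StringIO("1\n3\n"), io.StringIO("3\n9\n")]
    print(prune_streaming(parts, pruning_files))
```

      In this sketch both variants return the same set of partitions; only the peak memory differs. The buffered version scales with the number of distinct values in the pruning files, while the streaming version scales with the number of partitions in the work.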

          People

            Assignee: Janaki Lahorani (janulatha)
            Reporter: Sahil Takiar (stakiar)
            Votes: 0
            Watchers: 2