Hive / HIVE-2087

Dynamic partition insert performance problem

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels: None
    • Environment: Amazon EMR, S3

    • dynamic partition performance

    Description

      Create an external (S3-backed) table T, partitioned by column P. Populate T so that it has a large number of partitions (say 100). Then execute a statement like

      insert overwrite table T partition (p) select * from another_table

      Check the Hive server log: it shows that all existing partitions are read and loaded before any mapper starts working. This seems excessive, given that the insert statement may create or overwrite only a very small number of partitions. Is there another reason why an insert using dynamic partitions requires loading the whole table?
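      For anyone trying to reproduce this, the steps above might look roughly like the following HiveQL. This is only a sketch: the column names, the S3 bucket path, and the layout of another_table are assumptions made for illustration and are not taken from the report.

```sql
-- Hypothetical schema and bucket; the report specifies neither.
CREATE EXTERNAL TABLE T (id INT, val STRING)
PARTITIONED BY (p STRING)
LOCATION 's3://example-bucket/warehouse/t/';

-- Dynamic partition inserts are disabled by default in Hive 0.7,
-- and strict mode requires at least one static partition column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- another_table is assumed to expose (id, val, p) in that order:
-- the dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE T PARTITION (p)
SELECT * FROM another_table;
```

      Running the INSERT once populates the partitions; running it again against a source that touches only a few values of p should show the behaviour described in the server log.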

          People

            Assignee: Unassigned
            Reporter: Q Long (qlong)
            Votes: 2
            Watchers: 5

            Dates

              Created:
              Updated: