Hive / HIVE-17612

Hive does not insert dynamic partition-sets atomically

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0, 3.0.0
    • Fix Version/s: None
    • Component/s: CLI, Hive
    • Labels: None

      Description

      If one inserts partitions into a Hive table using a Hive query (e.g. INSERT OVERWRITE TABLE my_table PARTITION (foo, bar) SELECT * FROM another_table;), each dynamic partition is added to the metastore separately, with its own HMSC.append_partition() call (HMSC being the HiveMetaStoreClient). By contrast, Pig/HCatLoader registers the whole set atomically, with a single HMSC.add_partitions() call.

      Because of this behaviour, an Oozie workflow that waits on partition registration might kick off as soon as the first partition is registered, before the last partition in the set is available.

      This was verified in the metastore logs: multiple ADD_PARTITION events were fired for the same query (i.e. one per added partition), instead of a single event for the whole set.

      It would be ideal for Hive to provide atomic partition-adds.
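
      The difference described above can be illustrated with a small toy model. This is only a sketch: ToyMetastore, appendPartition, addPartitions and visibleCountsPerEvent are hypothetical names invented for illustration and are not the Hive metastore API. The model assumes a listener is notified once per metastore call, and records how many partitions are visible each time an event fires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Toy model only: NOT the Hive metastore API. All names here are hypothetical. */
public class PartitionEventDemo {

    /** Hypothetical in-memory stand-in for a metastore that notifies a listener once per call. */
    static class ToyMetastore {
        final List<String> partitions = new ArrayList<>();
        private final Consumer<List<String>> listener;

        ToyMetastore(Consumer<List<String>> listener) {
            this.listener = listener;
        }

        // Models a per-partition append: one registration and one event per call.
        void appendPartition(String partition) {
            partitions.add(partition);
            listener.accept(new ArrayList<>(partitions)); // snapshot visible at event time
        }

        // Models a batch add: the whole set is registered, then a single event fires.
        void addPartitions(List<String> batch) {
            partitions.addAll(batch);
            listener.accept(new ArrayList<>(partitions));
        }
    }

    /** Returns, for each event fired, the number of partitions visible when it fired. */
    public static List<Integer> visibleCountsPerEvent(List<String> parts, boolean atomic) {
        List<Integer> counts = new ArrayList<>();
        ToyMetastore ms = new ToyMetastore(visible -> counts.add(visible.size()));
        if (atomic) {
            ms.addPartitions(parts);
        } else {
            for (String p : parts) {
                ms.appendPartition(p);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("foo=1/bar=a", "foo=1/bar=b", "foo=2/bar=a");
        System.out.println("per-partition: " + visibleCountsPerEvent(parts, false)); // [1, 2, 3]
        System.out.println("batch:         " + visibleCountsPerEvent(parts, true));  // [3]
    }
}
```

      In this model, a consumer that reacts to the first event in the per-partition case sees only one of the three partitions, whereas in the batch case the only event fires after the full set is visible.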

            People

            • Assignee: Mithun Radhakrishnan (mithun)
            • Reporter: Mithun Radhakrishnan (mithun)
            • Votes: 0
            • Watchers: 3