Hive / HIVE-3106

Add option to make multi inserts more atomic

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, in a multi-insert query, as soon as the output of one of the inserts is ready, the move task associated with that insert runs and creates the table/partition. However, if concurrency is enabled, the lock on that table/partition is not released until the entire query finishes, which can be much later.

      This causes problems if, for example, a user is waiting for one output of a multi-insert query that is created long before the other outputs, and is checking for its existence using the metastore's Thrift methods (get_table/get_partition). Once that output appears, the user runs a query that reads it, and that query times out trying to acquire the lock on the table/partition, because the multi-insert query still holds it.

      If every move task depends on the parents of all the other move tasks, no output is created until all of the outputs are ready. Output creation then becomes much closer to atomic, which relieves this problem.
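      The dependency change proposed above can be illustrated on a small task graph. This is a minimal sketch, not Hive's implementation: the `Task` class, `share_move_dependencies`, and `execution_order` are hypothetical names, and the toy scheduler simply runs one ready task at a time in list order.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hashing, like distinct plan nodes
class Task:
    """Hypothetical stand-in for a node in a query plan's task DAG."""
    name: str
    is_move: bool = False
    parents: list = field(default_factory=list)

    def add_parent(self, parent):
        # Avoid duplicate edges and self-dependencies.
        if parent is not self and parent not in self.parents:
            self.parents.append(parent)


def share_move_dependencies(tasks):
    """Make every move task depend on the parents of every other move task."""
    moves = [t for t in tasks if t.is_move]
    # Snapshot the original parents so edges added in this pass are not
    # propagated a second time.
    original = {m: list(m.parents) for m in moves}
    for move in moves:
        for other in moves:
            if other is move:
                continue
            for parent in original[other]:
                move.add_parent(parent)


def execution_order(tasks):
    """Run tasks one at a time, each as soon as all of its parents are done."""
    done, order = set(), []
    pending = list(tasks)
    while pending:
        # Pick the first pending task whose parents have all completed.
        ready = next(t for t in pending if all(p in done for p in t.parents))
        pending.remove(ready)
        done.add(ready)
        order.append(ready.name)
    return order
```

      With two inserts, where each stage feeds one move task, the default graph lets the first move task run before the second stage has even started. After `share_move_dependencies`, both move tasks wait for both stages, so the two outputs are created back to back.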

        Attachments

        1. HIVE-3106.2.patch.txt
          462 kB
          Kevin Wilfong
        2. HIVE-3106.1.patch.txt
          313 kB
          Kevin Wilfong


            People

            • Assignee:
              Kevin Wilfong
            • Reporter:
              Kevin Wilfong
            • Votes:
              0
            • Watchers:
              5
