HIVE-3106

Add option to make multi inserts more atomic

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently, with multi-insert queries, as soon as the output of one of the inserts is ready, the move task associated with that insert is run, creating the table/partition. However, if concurrency is enabled, the lock on this table/partition is not released until the entire query finishes, which can be much later.

      This causes issues if, for example, a user is waiting for an output of the multi-insert query that is created long before the other outputs, and is checking for its existence using the metastore's Thrift methods (get_table/get_partition). In that case, the user will run their query that uses the output, and it will time out trying to acquire the lock on the table/partition.

      If every move task depends on the parents of all the other move tasks, output creation will be much closer to atomic, relieving this problem.
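To make the timing problem concrete, here is a minimal sketch rather than Hive code. It assumes the move itself takes negligible time and that the query, and therefore every lock, ends when the last output is ready. The function name and the finish times are hypothetical.

```python
def lock_wait_after_creation(finish_times, share_dependencies):
    """Return, per output, how long its lock outlives its creation.

    finish_times maps an output name to the (hypothetical) time at which
    the task producing it finishes. Locks are assumed to be released when
    the whole query ends, i.e. at the latest finish time.
    """
    query_end = max(finish_times.values())
    waits = {}
    for output, ready_at in finish_times.items():
        # With shared dependencies, no move task runs before every parent is done.
        moved_at = query_end if share_dependencies else ready_at
        waits[output] = query_end - moved_at
    return waits


finish_times = {"t1": 5, "t2": 60, "t3": 20}
print(lock_wait_after_creation(finish_times, share_dependencies=False))
print(lock_wait_after_creation(finish_times, share_dependencies=True))
```

Under these assumptions, t1 is visible in the metastore for 55 time units while still locked in the first case, and for none in the second.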

      1. HIVE-3106.1.patch.txt
        313 kB
        Kevin Wilfong
      2. HIVE-3106.2.patch.txt
        462 kB
        Kevin Wilfong

        Activity

        Kevin Wilfong added a comment -

        Submitted a diff here https://reviews.facebook.net/D3561

        Kevin Wilfong added a comment -

        Spoke with njain offline. He suggested adding a dummy task which depends on the tasks each move task would depend on, and which has move tasks as its children. This will reduce the number of dependency edges in the dependency graph. This dummy task (DependencyCollectionTask) will only be added if this option is turned on.
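The edge-count saving from the dummy task can be sketched as follows. This is an illustration only: the task names are made up, and the graph is modelled as a plain set of (parent, child) pairs rather than Hive's Task objects.

```python
def wire_directly(parents, moves):
    """Every move task depends on every parent: a complete bipartite graph."""
    return {(p, m) for p in parents for m in moves}


def wire_through_collector(parents, moves, collector="DependencyCollection"):
    """Parents feed one no-op collector task, which then fans out to the moves."""
    return {(p, collector) for p in parents} | {(collector, m) for m in moves}


parents = [f"Stage-{i}" for i in range(1, 6)]
moves = [f"Move-{i}" for i in range(1, 6)]
print(len(wire_directly(parents, moves)))           # 25 edges
print(len(wire_through_collector(parents, moves)))  # 10 edges
```

With n parents and m move tasks, direct wiring needs n*m edges while the collector needs n+m, and both give every move task the same set of ancestors.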

        Carl Steinbach added a comment -

        @Kevin: I added some comments on phabricator.

        Kevin Wilfong added a comment -

        Per Carl's comments, explicitly stated the advantages/disadvantages, removed "atomic" from the name of the configuration variable (as this is not really true), and removed references to "outputs" in the description of the config.

        Also fixed an issue where, if a file was taking a long time to produce, there would still be a long time between when the tables/partitions are produced and when the locks on them are released. Now, when the option is set, the DependencyCollection task depends on the dependencies of the move tasks for files, but the move tasks for files do not depend on the DependencyCollection task: there are no locks on these files, so there would not be any advantage.

        Added a new test case for this additional functionality.
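The asymmetric treatment of file outputs described above can be sketched like this. It is a simplified model, not the patch itself: the Output class, the field names, and the times are hypothetical, and a move task is assumed to start the moment its last dependency finishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    name: str
    ready_at: int   # hypothetical time the producing task finishes
    is_file: bool   # True for file outputs, which hold no lock


def move_start_times(outputs):
    """Earliest start of each move task when the option is turned on."""
    # The collection task waits on the producers of every output, files included.
    collector_done = max(o.ready_at for o in outputs)
    starts = {}
    for o in outputs:
        # File moves are not children of the collection task; table moves are.
        starts[o.name] = o.ready_at if o.is_file else collector_done
    return starts


outputs = [
    Output("table_a", 5, False),
    Output("table_b", 20, False),
    Output("/tmp/slow_file", 90, True),
    Output("/tmp/fast_file", 10, True),
]
print(move_start_times(outputs))
```

In this model both tables are created at time 90, right before the query ends and the locks are released, while the fast file is still moved at time 10.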

        Kevin Wilfong added a comment -

        Updated diff per Namit's comments.

        Namit Jain added a comment -

        Committed. Thanks Kevin

        Carl Steinbach added a comment -

        @Kevin: Please attach the version of the patch that Namit committed.

        @Namit: Please set the fix version field and add a release note with the name of the configuration property added in this patch.

        Thanks.

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1490 (See https://builds.apache.org/job/Hive-trunk-h0.21/1490/)
        HIVE-3106 Add option to make multi inserts more atomic
        (Kevin Wilfong via namit) (Revision 1350792)

        Result = FAILURE
        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1350792
        Files :

        • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        • /hive/trunk/conf/hive-default.xml.template
        • /hive/trunk/ql/if/queryplan.thrift
        • /hive/trunk/ql/src/gen/thrift/gen-cpp/queryplan_types.cpp
        • /hive/trunk/ql/src/gen/thrift/gen-cpp/queryplan_types.h
        • /hive/trunk/ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java
        • /hive/trunk/ql/src/gen/thrift/gen-php/queryplan/queryplan_types.php
        • /hive/trunk/ql/src/gen/thrift/gen-py/queryplan/ttypes.py
        • /hive/trunk/ql/src/gen/thrift/gen-rb/queryplan_types.rb
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DependencyCollectionTask.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DependencyCollectionWork.java
        • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_move_tasks_share_dependencies.q
        • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-3106 Add option to make multi inserts more atomic
        (Kevin Wilfong via namit) (Revision 1350792)

        Result = ABORTED
        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1350792
        Files :

        • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
        • /hive/trunk/conf/hive-default.xml.template
        • /hive/trunk/ql/if/queryplan.thrift
        • /hive/trunk/ql/src/gen/thrift/gen-cpp/queryplan_types.cpp
        • /hive/trunk/ql/src/gen/thrift/gen-cpp/queryplan_types.h
        • /hive/trunk/ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java
        • /hive/trunk/ql/src/gen/thrift/gen-php/queryplan/queryplan_types.php
        • /hive/trunk/ql/src/gen/thrift/gen-py/queryplan/ttypes.py
        • /hive/trunk/ql/src/gen/thrift/gen-rb/queryplan_types.rb
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DependencyCollectionTask.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DependencyCollectionWork.java
        • /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_move_tasks_share_dependencies.q
        • /hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out
        Ashutosh Chauhan added a comment -

        This issue is fixed and was released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


          People

          • Assignee: Kevin Wilfong
          • Reporter: Kevin Wilfong
          • Votes: 0
          • Watchers: 5