Hive / HIVE-2777

ability to add and drop partitions atomically

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels: None

      Description

      Hive should have the ability to add and drop partitions atomically. This way, admins can change partitions without breaking running jobs; in particular, it allows an admin to merge several partitions into one.
      Essentially, we would like an API of the form: add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData);
      This JIRA covers the changes required for the metastore and Thrift.
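      The intended all-or-nothing semantics can be sketched with a toy in-memory model. This is a hypothetical illustration only, not the actual metastore implementation: the class `ToyPartitionStore` and its `addDropPartitions` method are invented for this sketch, with `synchronized` standing in for the metastore's real transaction boundary.

```java
import java.util.*;

/** Toy in-memory stand-in for partition metadata; illustration only. */
class ToyPartitionStore {
    // Maps partition name -> storage location (e.g. an S3 path).
    private final Map<String, String> partitions = new HashMap<>();

    /**
     * Atomically add the given partitions and drop the named ones.
     * Readers calling listPartitions() see either the old state or the
     * new state, never a partial mix.
     */
    public synchronized void addDropPartitions(Map<String, String> addParts,
                                               Collection<String> dropParts) {
        // Validate everything up front so a failure changes nothing.
        for (String name : dropParts) {
            if (!partitions.containsKey(name)) {
                throw new NoSuchElementException("unknown partition: " + name);
            }
        }
        partitions.keySet().removeAll(dropParts);
        partitions.putAll(addParts);
    }

    public synchronized Set<String> listPartitions() {
        return new TreeSet<>(partitions.keySet());
    }
}
```

      For example, hourly partitions ds=2012-01-01/hr=00 and ds=2012-01-01/hr=01 could be replaced by a single merged ds=2012-01-01 partition in one call, which is the merge use case motivating this JIRA.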

      1. ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch
        832 kB
        Phabricator
      2. hive-2777.patch
        683 kB
        Xinyu Wang

        Activity

        Ashutosh Chauhan added a comment -

        Dupe of HIVE-2224?
        Ashutosh Chauhan added a comment -

        Oh, it looks like you want more than HIVE-2224. You are passing in both addParts and dropParts in one call.
        Aniket Mokashi added a comment -

        Yes. Basically, I would like to support merging partitions.
        Alan Gates added a comment -

        Is the goal here to reduce the number of partitions that Hive has or the number of files in HDFS? If the goal is to reduce the number of files, then you might consider using har to pack the files into one and then change the location value in the partitions.
        Aniket Mokashi added a comment -

        Yes, our goal is to merge multiple small files (partitions) into big ones, but we would like users to see the data as soon as it arrives in the warehouse. One limitation is that we run from S3, which doesn't support a move operation. So we have come up with the idea of transparently creating a new location for the merged data and atomically dropping and adding partitions to point to it.
        Ashutosh Chauhan added a comment -

        So, this API is purely metadata and the data won't be touched at all? Data is managed separately in such cases. If that's the case, I think this new API should have an additional parameter, boolean moveData, which for now works as you described for moveData = false and throws an exception for moveData = true.
        Aniket Mokashi added a comment -

        I am adding a deleteData option there; the deletion of the data itself cannot be made atomic, but it's good to have as part of the API. deleteData=true attempts to delete the data.
        Phabricator added a comment -

        aniket486 requested code review of "HIVE-2777 [jira] ability to add and drop partitions atomically".
        Reviewers: JIRA

        https://issues.apache.org/jira/browse/HIVE-2777

        ability to add and drop partitions atomically

        Hive should have ability to atomically add and drop partitions. This way admins can change partitions atomically without breaking the running jobs. It allows admin to merge several partitions into one.
        Essentially, we would like to have an api- add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData);
        This jira covers changes required for metastore and thrift.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D2271

        AFFECTED FILES
        metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
        metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
        metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
        metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
        metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
        metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
        metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
        metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
        metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
        metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
        metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php
        metastore/if/hive_metastore.thrift


        Ashutosh Chauhan added a comment -

        Unlinking from 0.9
        Aniket Mokashi added a comment -

        Hi Ashutosh,

        Let me know if the patch is missing something. I will work on fixing those problems. It would be a good API to have.

        Thanks,
        Aniket
        Aniket Mokashi added a comment -

        Can someone take a look at this?
        Namit Jain added a comment -

        add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData);

        What are the exact semantics of the above API?
        Can you give an example of what you are trying to do?
        Aniket Mokashi added a comment -

        Thanks Namit. We are using S3 as our storage, which doesn't support a move operation. Hence, to replace n partitions with one, we need to support that purely through a metadata API. In the following API, for the given db and tbl_name, the partitions in addParts will be added to the metastore and the partitions in dropParts will be deleted, atomically. Hence, users querying one level up in the partition hierarchy would not know that the partitions were replaced with a better representation of the data (merged into bigger files, sorted, etc.).

        add_drop_partitions(String db, String tbl_name, List<Partition> addParts, List<List<String>> dropParts, boolean deleteData);

        Any downsides to this approach?
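        The atomicity property being asked about above can be demonstrated with a minimal sketch: validate dropParts before mutating anything, so a failed call leaves the metadata exactly as it was. This is a hypothetical toy model; `AtomicSwapDemo` and `swap` are invented names, not metastore code.

```java
import java.util.*;

/** Toy demonstration of all-or-nothing add/drop; not metastore code. */
class AtomicSwapDemo {
    /**
     * Replace the partitions named in dropParts with addParts in one step,
     * or throw and leave `parts` completely unchanged.
     */
    static void swap(Map<String, String> parts,
                     Map<String, String> addParts,
                     Collection<String> dropParts) {
        // Validate before mutating so a failure changes nothing (all-or-nothing).
        if (!parts.keySet().containsAll(dropParts)) {
            throw new IllegalStateException("dropParts names a missing partition");
        }
        parts.keySet().removeAll(dropParts);
        parts.putAll(addParts);
    }
}
```

        With this ordering, a caller that names a nonexistent partition in dropParts gets an exception and observes the original partition set untouched, which is the behavior readers of the table depend on during a merge.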
        Aniket Mokashi added a comment -

        Canceling the old patch; I will submit a rebased one.
        Steven Wong added a comment -

        ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch was created by Netflix. We at Netflix intend it to be freely used according to the Apache license.
        Xinyu Wang added a comment -

        This is a rebased patch on top of Hive branch-0.13. Please review.
        Hive QA added a comment -

        Overall: -1 no tests executed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12643147/hive-2777.patch

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/112/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/112/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Tests exited with: NonZeroExitCodeException
        Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
        + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
        + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
        + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
        + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
        + cd /data/hive-ptest/working/
        + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-112/source-prep.txt
        + [[ false == \t\r\u\e ]]
        + mkdir -p maven ivy
        + [[ svn = \s\v\n ]]
        + [[ -n '' ]]
        + [[ -d apache-svn-trunk-source ]]
        + [[ ! -d apache-svn-trunk-source/.svn ]]
        + [[ ! -d apache-svn-trunk-source ]]
        + cd apache-svn-trunk-source
        + svn revert -R .
        Reverted 'itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java'
        Reverted 'beeline/src/java/org/apache/hive/beeline/BeeLine.java'
        ++ egrep -v '^X|^Performing status on external'
        ++ awk '{print $2}'
        ++ svn status --no-ignore
        + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
        + svn update
        
        Fetching external item into 'hcatalog/src/test/e2e/harness'
        External at revision 1592204.
        
        At revision 1592204.
        + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
        + patchFilePath=/data/hive-ptest/working/scratch/build.patch
        + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
        + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
        + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
        The patch does not appear to apply with p0, p1, or p2
        + exit 1
        '
        

        This message is automatically generated.

        ATTACHMENT ID: 12643147

        Xinyu Wang added a comment -

        Sorry for the previous patch; I rebased it, and it seems fine now. Can someone please review?
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12643663/hive-2777.patch

        ERROR: -1 due to 11 failed/errored test(s), 5495 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez2
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
        org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testPartition
        org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testPartition
        org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyClient.testPartition
        org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer.testPartition
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/136/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/136/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 11 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12643663

        Sumit Kumar added a comment -

        These failures don't seem to be related to this patch. Can someone more knowledgeable than me comment, please?

  People

  • Assignee: Aniket Mokashi
  • Reporter: Aniket Mokashi
  • Votes: 0
  • Watchers: 9