Hive / HIVE-9359

Export of a large table causes OOM in Metastore and Client

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Import/Export, Metastore
    • Labels:
      None

      Description

      Running hive export on a table with a large number of partitions causes both the metastore and the client to run out of memory. The places where we end up holding a copy of the entire set of partition objects are as follows:

      Metastore

      • (temporarily) Metastore MPartition objects
      • List<Partition> that gets persisted before sending to thrift
      • thrift copy of all of those partitions

      Client side

      • thrift copy of partitions
      • deepcopy of above to create List<Partition> objects
      • JSONObject that contains all of those above partition objects
      • List<ReadEntity>, each of which encapsulates one of the aforesaid partition objects

      This memory usage needs to be drastically reduced.

      1. HIVE-9359.2.patch
        11 kB
        Sushanth Sowmyan
      2. HIVE-9359.patch
        11 kB
        Sushanth Sowmyan

        Activity

        thejas Thejas M Nair added a comment -

        This issue has been fixed in Apache Hive 1.0.0. If there is any issue with the fix, please open a new jira to address it.

        thejas Thejas M Nair added a comment -

        Updating release version for jiras resolved in 1.0.0.

        sushanth Sushanth Sowmyan added a comment -

        Committed to branch-1.0. Thanks, Vikram!

        vikram.dixit Vikram Dixit K added a comment -

        +1 for 1.0

        alangates Alan Gates added a comment -

        Patch 2 committed to trunk. I don't believe the test failures are related, since they are all in code not even remotely close to export/import.

        hiveqa Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12693758/HIVE-9359.2.patch

        ERROR: -1 due to 4 failed/errored test(s), 7347 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
        org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2475/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2475/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2475/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 4 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12693758 - PreCommit-HIVE-TRUNK-Build

        alangates Alan Gates added a comment -

        +1.

        sushanth Sushanth Sowmyan added a comment -

        Updated patch to use JsonGenerator.

        sushanth Sushanth Sowmyan added a comment -

        I see your concern. Jackson has a streaming API style with JsonGenerator, but I haven't yet been able to confirm whether it builds up the JSON object and then flushes it, as many others seem to do, or whether it actually writes out as it receives instructions. I can investigate a bit more on that end.

        alangates Alan Gates added a comment -

        One question: in EximUtil.createExportDump, is there a streaming JSON writer you could use so that we're not forced to hand-code JSON? Maybe the JSON we're writing is simple enough, but this looks like it could end in a mess if our JSON gets complex.

        sushanth Sushanth Sowmyan added a comment -

        Vikram Dixit K, this is another bug I'd like to see in 0.14.1, because it has a pretty significant memory impact on the metastore if a user runs an export on a table with too many partitions.

        sushanth Sushanth Sowmyan added a comment -

        Alan Gates, could you please review this patch?

        hiveqa Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12691936/HIVE-9359.patch

        ERROR: -1 due to 5 failed/errored test(s), 7304 tests executed
        Failed tests:

        TestSparkClient - did not produce a TEST-*.xml file
        org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
        org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
        org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
        org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2360/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2360/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2360/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 5 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12691936 - PreCommit-HIVE-TRUNK-Build

        sushanth Sushanth Sowmyan added a comment -

        Patch attached.

        sushanth Sushanth Sowmyan added a comment -

        Fixing this completely would require a significant retrofit of the client side, as well as the ability to do paginated batch retrievals from the metastore.

        A quick solution that goes a good deal of the way, however, is as follows:

        a) Change some usages of List<Partition> to Iterable<Partition>, introduce a PartitionIterable class implementing that interface, and have it lazily fetch partitions on demand. While a pagination scheme in the metastore would be great, a good short-term solution is to store only the partition names rather than the entire partition objects; PartitionIterable can fetch the names up front and then handle the pagination itself.

        This solves the OOM issues on the metastore completely, and gets rid of the thrift copy problem as well as the List<Partition> deepcopy problem. It introduces the cost of storing all the partition names, but that is far cheaper than the above.
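        The lazy batching idea in (a) can be sketched roughly as follows. All names here are hypothetical stand-ins: the real PartitionIterable talks to the metastore, whereas this sketch takes a list of partition names and a pluggable batch-fetch function, so only one batch of full partition objects is ever materialized at a time.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Sketch of a lazily-batching partition iterable: it holds only the cheap
// partition names up front and fetches full objects in fixed-size batches
// on demand. String stands in for the real Partition type, and fetchBatch
// stands in for the metastore call that resolves names to partitions.
public class LazyPartitions implements Iterable<String> {
    private final List<String> names;                              // names only, cheap to hold
    private final Function<List<String>, List<String>> fetchBatch; // metastore call stand-in
    private final int batchSize;

    public LazyPartitions(List<String> names,
                          Function<List<String>, List<String>> fetchBatch,
                          int batchSize) {
        this.names = names;
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int pos = 0;  // position in the name list
            private Iterator<String> batch = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                return batch.hasNext() || pos < names.size();
            }

            @Override
            public String next() {
                if (!batch.hasNext()) {
                    // Current batch exhausted: resolve the next slice of names.
                    int end = Math.min(pos + batchSize, names.size());
                    batch = fetchBatch.apply(names.subList(pos, end)).iterator();
                    pos = end;
                }
                return batch.next();
            }
        };
    }
}
```

        Callers that previously took a fully materialized List<Partition> can iterate this instead, so peak memory is bounded by one batch rather than the whole partition set.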

        b) Change the JSON serialization to output each element as it comes, rather than constructing one large JSONObject and writing it out in one go. This solves the large-JSONObject problem.
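        The streaming pattern in (b) can be illustrated with a minimal hand-rolled sketch over plain java.io (a hypothetical stand-in, not the Jackson JsonGenerator the patch actually uses): each already-serialized partition record is written to the output as it is produced, so no complete JSON document is ever assembled in memory.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

// Streams a JSON array of partition records one element at a time.
// Only the element currently being written is held in memory, unlike
// building one large JSONObject and serializing it in a single pass.
public class StreamingJsonDump {
    public static void writePartitions(Writer out, Iterator<String> partitionJson)
            throws IOException {
        out.write("{\"partitions\":[");
        boolean first = true;
        while (partitionJson.hasNext()) {
            if (!first) {
                out.write(",");
            }
            out.write(partitionJson.next()); // each element is already serialized
            first = false;
            // out could be flushed periodically here to bound buffering
        }
        out.write("]}");
    }
}
```

        Feeding this from a lazy partition iterator keeps the whole export pipeline at roughly constant memory per element.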

        This still does not solve the problem of having a large number of ReadEntities, but that is better tackled by something like a metadata-only export, or by changing export to handle a partial partition specification at a time, both of which are the subjects of further jiras I will be filing shortly.


          People

          • Assignee:
            sushanth Sushanth Sowmyan
            Reporter:
            sushanth Sushanth Sowmyan
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:
