Hive
  1. Hive
  2. HIVE-6518

Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels:
      None
    • Release Note:
      Flush VectorGroupBy aggregation hashes in case of a full GC

      Description

      The current VectorGroupByOperator implementation flushes the in-memory hashes when the maximum entries or fraction of memory is hit.

      This works for most cases, but there are some corner cases where we hit GC ovehead limits or heap size limits before either of those conditions are reached due to the rest of the pipeline.

      This patch adds a SoftReference as a GC canary. If the soft reference is dead, then a full GC pass happened sometime in the near past & the aggregation hashtables should be flushed immediately before another full GC is triggered.

      1. HIVE-6518.3.patch
        3 kB
        Gunther Hagleitner
      2. HIVE-6518.2.patch
        3 kB
        Gunther Hagleitner
      3. HIVE-6518.2-tez.patch
        3 kB
        Gopal V
      4. HIVE-6518.1-tez.patch
        2 kB
        Gopal V

        Activity

        Hide
        Gunther Hagleitner added a comment -

        I like it. Sounds like this will allow you to be more aggressive with caching/flushing params, while having a trigger that will flush out stuff when necessary.

        +1 (assuming tests pass)

        Show
        Gunther Hagleitner added a comment - I like it. Sounds like this will allow you to be more aggressive with caching/flushing params, while having a trigger that will flush out stuff when necessary. +1 (assuming tests pass)
        Hide
        Gopal V added a comment -

        Yes, also the ORC scenario is more complex for strings in dictionaries.

        A substring does not drop the rest of the data off the memory overhead because in vectorized mode, only the start:len get modified, no new allocations are made.

        So a group by SUBSTR() will keep the entire string in memory, except the VGBY does not know that it does.

        Show
        Gopal V added a comment - Yes, also the ORC scenario is more complex for strings in dictionaries. A substring does not drop the rest of the data off the memory overhead because in vectorized mode, only the start:len get modified, no new allocations are made. So a group by SUBSTR() will keep the entire string in memory, except the VGBY does not know that it does.
        Hide
        Remus Rusanu added a comment -

        Can you somehow modify the LOG.debug at top of flush() to call out that the flush was triggered by the gcCanary.get() == null? I was thinking: keep a count of gcCanary allocations and print it in the LOG.debug message, this will tell us if the GC is the trigger and also will tell how often has occured in the operator lifetime, when debugging etc.
        +1

        Show
        Remus Rusanu added a comment - Can you somehow modify the LOG.debug at top of flush() to call out that the flush was triggered by the gcCanary.get() == null? I was thinking: keep a count of gcCanary allocations and print it in the LOG.debug message, this will tell us if the GC is the trigger and also will tell how often has occured in the operator lifetime, when debugging etc. +1
        Hide
        Gopal V added a comment -

        Add DEBUG lines

        Show
        Gopal V added a comment - Add DEBUG lines
        Hide
        Gunther Hagleitner added a comment -

        Reuploading .2 for precommit.

        Show
        Gunther Hagleitner added a comment - Reuploading .2 for precommit.
        Hide
        Gopal V added a comment -

        Resubmit for pre-commit tests

        Show
        Gopal V added a comment - Resubmit for pre-commit tests
        Hide
        Gunther Hagleitner added a comment -

        pre-commit is back. let's try again.

        Show
        Gunther Hagleitner added a comment - pre-commit is back. let's try again.
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12634401/HIVE-6518.3.patch

        ERROR: -1 due to 2 failed/errored test(s), 5389 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 2 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12634401

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12634401/HIVE-6518.3.patch ERROR: -1 due to 2 failed/errored test(s), 5389 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6 Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1769/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12634401
        Hide
        Gopal V added a comment -

        The test failures don't seem to be related to this fix - they aren't vectorized.

        Show
        Gopal V added a comment - The test failures don't seem to be related to this fix - they aren't vectorized.
        Hide
        Jitendra Nath Pandey added a comment -

        I have committed this trunk. Thanks to Gopal!

        Harish Butani This is an important fix to vector group by because the aggregates must flush more aggressively in case of GC. Therefore, I intend to commit it to branch-0.13. as well.

        Show
        Jitendra Nath Pandey added a comment - I have committed this trunk. Thanks to Gopal! Harish Butani This is an important fix to vector group by because the aggregates must flush more aggressively in case of GC. Therefore, I intend to commit it to branch-0.13. as well.
        Hide
        Harish Butani added a comment -

        +1 for port to 0.13

        Show
        Harish Butani added a comment - +1 for port to 0.13
        Hide
        Jitendra Nath Pandey added a comment -

        Committed to branch-0.13 as well.

        Show
        Jitendra Nath Pandey added a comment - Committed to branch-0.13 as well.

          People

          • Assignee:
            Gopal V
            Reporter:
            Gopal V
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development