IMPALA-4895

Memory limit exceeded in TestTPCHJoinQueries.test_outer_joins on local filesystem and non-partitioned-aggs-and-joins

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

       TestTPCHJoinQueries.test_outer_joins[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      query_test/test_join_queries.py:122: in test_outer_joins
          self.run_test_case('tpch-outer-joins', new_vector)
      common/impala_test_suite.py:359: in run_test_case
          result = self.__execute_query(target_impalad_client, query, user=user)
      common/impala_test_suite.py:567: in __execute_query
          return impalad_client.execute(query, user=user)
      common/impala_connection.py:160: in execute
          return self.__beeswax_client.execute(sql_stmt, user=user)
      beeswax/impala_beeswax.py:173: in execute
          handle = self.__execute_query(query_string.strip(), user=user)
      beeswax/impala_beeswax.py:339: in __execute_query
          self.wait_for_completion(handle)
      beeswax/impala_beeswax.py:359: in wait_for_completion
          raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
      E   ImpalaBeeswaxException: ImpalaBeeswaxException:
      E    Query aborted:
      E   Memory limit exceeded
      E   
      E   
      E   
      E   Memory Limit Exceeded by fragment: 644cda3570148404:3e76dd4d00000002
      E   Query(644cda3570148404:3e76dd4d00000000): memory limit exceeded. Limit=1.00 GB Total=1.00 GB Peak=1.00 GB
      E     Fragment 644cda3570148404:3e76dd4d00000000: Total=10.27 KB Peak=339.00 KB
      E       EXCHANGE_NODE (id=6): Total=0 Peak=0
      E       DataStreamRecvr: Total=0 Peak=0
      E       PLAN_ROOT_SINK: Total=0 Peak=0
      E       CodeGen: Total=2.27 KB Peak=331.00 KB
      E     Block Manager: Limit=819.20 MB Total=760.00 MB Peak=760.00 MB
      E     Fragment 644cda3570148404:3e76dd4d00000001: Total=85.19 MB Peak=85.32 MB
      E       HDFS_SCAN_NODE (id=0): Total=85.01 MB Peak=85.31 MB
      E       DataStreamSender (dst_id=4): Total=1.16 KB Peak=1.16 KB
      E       CodeGen: Total=1.22 KB Peak=228.00 KB
      E     Fragment 644cda3570148404:3e76dd4d00000003: Total=477.88 MB Peak=785.88 MB
      E       Runtime Filter Bank: Total=2.00 MB Peak=2.00 MB
      E       SORT_NODE (id=3): Total=4.00 KB Peak=4.00 KB
      E       HASH_JOIN_NODE (id=2): Total=449.05 MB Peak=761.31 MB
      E         Exprs: Total=4.00 KB Peak=4.00 KB
      E         Hash Join Builder (join_node_id=2): Total=449.02 MB Peak=761.02 MB
      E           Hash Join Builder (join_node_id=2) Exprs: Total=8.00 KB Peak=8.00 KB
      E       EXCHANGE_NODE (id=4): Total=0 Peak=0
      E       DataStreamRecvr: Total=21.98 MB Peak=21.98 MB
      E       EXCHANGE_NODE (id=5): Total=0 Peak=0
      E       DataStreamRecvr: Total=4.79 MB Peak=22.91 MB
      E       DataStreamSender (dst_id=6): Total=432.00 B Peak=432.00 B
      E       CodeGen: Total=41.86 KB Peak=2.86 MB
      E     Fragment 644cda3570148404:3e76dd4d00000002: Total=141.00 MB Peak=181.91 MB
      E       HDFS_SCAN_NODE (id=1): Total=140.74 MB Peak=181.66 MB
      E       DataStreamSender (dst_id=5): Total=4.67 KB Peak=4.67 KB
      E       CodeGen: Total=653.00 B Peak=116.50 KB
      E   Memory Limit Exceeded by fragment: 644cda3570148404:3e76dd4d00000003
      E   Query(644cda3570148404:3e76dd4d00000000): memory limit exceeded. Limit=1.00 GB Total=1.00 GB Peak=1.00 GB
      E     Fragment 644cda3570148404:3e76dd4d00000000: Total=10.27 KB Peak=339.00 KB
      E       EXCHANGE_NODE (id=6): Total=0 Peak=0
      E       DataStreamRecvr: Total=0 Peak=0
      E       PLAN_ROOT_SINK: Total=0 Peak=0
      E       CodeGen: Total=2.27 KB Peak=331.00 KB
      E     Block Manager: Limit=819.20 MB Total=760.00 MB Peak=760.00 MB
      E     Fragment 644cda3570148404:3e76dd4d00000001: Total=85.19 MB Peak=85.32 MB
      E       HDFS_SCAN_NODE (id=0): Total=85.01 MB Peak=85.31 MB
      E       DataStreamSender (dst_id=4): Total=1.16 KB Peak=1.16 KB
      E       CodeGen: Total=1.22 KB Peak=228.00 KB
      E     Fragment 644cda3570148404:3e76dd4d00000003: Total=477.88 MB Peak=785.88 MB
      E       Runtime Filter Bank: Total=2.00 MB Peak=2.00 MB
      E       SORT_NODE (id=3): Total=4.00 KB Peak=4.00 KB
      E       HASH_JOIN_NODE (id=2): Total=449.05 MB Peak=761.31 MB
      E         Exprs: Total=4.00 KB Peak=4.00 KB
      E         Hash Join Builder (join_node_id=2): Total=449.02 MB Peak=761.02 MB
      E           Hash Join Builder (join_node_id=2) Exprs: Total=8.00 KB Peak=8.00 KB
      E       EXCHANGE_NODE (id=4): Total=0 Peak=0
      E       DataStreamRecvr: Total=21.98 MB Peak=21.98 MB
      E       EXCHANGE_NODE (id=5): Total=0 Peak=0
      E       DataStreamRecvr: Total=4.79 MB Peak=22.91 MB
      E       DataStreamSender (dst_id=6): Total=432.00 B Peak=432.00 B
      E       CodeGen: Total=41.86 KB Peak=2.86 MB
      E     Fragment 644cda3570148404:3e76dd4d00000002: Total=141.00 MB Peak=181.91 MB
      E       HDFS_SCAN_NODE (id=1): Total=140.74 MB Peak=181.66 MB
      E       DataStreamSender (dst_id=5): Total=4.67 KB Peak=4.67 KB
      E       CodeGen: Total=653.00 B Peak=116.50 KB
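A dump like the one above is easier to read once the entries are ranked by peak usage. The following is a minimal sketch, assuming only the line shape visible in this report ("name: [Limit=...] Total=... Peak=..."); the helper names `to_bytes` and `top_consumers` are hypothetical and not part of Impala or its test framework.

```python
import re

# Size such as "761.31 MB", "432.00 B", or a bare "0" (assumed to mean bytes).
_SIZE = r"[\d.]+"
_UNIT = r"[KMG]?B"
LINE_RE = re.compile(
    r"^(?P<name>.*?): .*?Total=(?P<total>" + _SIZE + r")(?: (?P<tunit>" + _UNIT + r"))?"
    r" Peak=(?P<peak>" + _SIZE + r")(?: (?P<punit>" + _UNIT + r"))?\s*$"
)
MULTIPLIER = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a (number, unit) pair from the dump into bytes."""
    return float(value) * MULTIPLIER[unit or "B"]


def top_consumers(dump, n=3):
    """Return the n entries with the largest Peak as (name, peak_bytes) pairs."""
    entries = []
    for raw in dump.splitlines():
        # Drop the pytest "E" prefix if present.
        line = re.sub(r"^\s*E\s+", "", raw).strip()
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a tracker line, e.g. "Memory limit exceeded"
        peak = to_bytes(match.group("peak"), match.group("punit"))
        entries.append((match.group("name").strip(), peak))
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries[:n]


sample = """
E       HDFS_SCAN_NODE (id=0): Total=85.01 MB Peak=85.31 MB
E       HASH_JOIN_NODE (id=2): Total=449.05 MB Peak=761.31 MB
E       EXCHANGE_NODE (id=4): Total=0 Peak=0
E       DataStreamSender (dst_id=6): Total=432.00 B Peak=432.00 B
"""
for name, peak in top_consumers(sample, n=2):
    print(name, round(peak / 1024 ** 2, 2), "MB")
```

The dump is hierarchical, so a parent entry (a fragment or the hash join node) includes the usage of its children; the ranking is a quick pointer to where memory went, not a sum.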
      

      Thomas, assigning to you because it broke on this commit:

      https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=commit;h=6a9df540967e07b09524268d0cc52b7d10835676

        Activity

        dhecht Dan Hecht added a comment -

        Unless we can fix this quickly, let's back out the change and recommit it with the fix.

        jbapple Jim Apple added a comment -

        I have also seen this in a non-partitioned-aggs-and-joins build.

        twmarshall Thomas Tauber-Marshall added a comment -

        Looking into this now.

        twmarshall Thomas Tauber-Marshall added a comment -

        I haven't been able to repro this locally yet, so I'm not sure how long it'll take to fix.

        If it's occurring frequently in the builds, I agree with Dan that it should be reverted. I'm not sure what our procedure for doing that is. Do I need to submit a review?

        jbapple Jim Apple added a comment -

        Yes, the way to do it is to submit a review.

        dhecht Dan Hecht added a comment -

        Right, and you can use git revert to help generate the change.

        twmarshall Thomas Tauber-Marshall added a comment -

        I submitted a fix: https://gerrit.cloudera.org/#/c/5941/

        If it looks like it's going to take a while to review, I can still submit the revert, but it's a pretty small fix.

        twmarshall Thomas Tauber-Marshall added a comment -

        commit 82290d61adc53f10342276e96e32e889385eecf8
        Author: Thomas Tauber-Marshall <tmarshall@cloudera.com>
        Date: Wed Feb 8 11:18:23 2017 -0800

        IMPALA-4895: Memory limit exceeded in test_outer_joins

        A recent change (IMPALA-3524) removed a 'CATCH' section for a
        mem limit exceeded error because the other changes in the patch
        reduced the memory requirements for that particular query and
        the error was no longer being hit.

        This seemed okay because the point of the test wasn't to trigger
        the mem limit exceeded error, and I manually verified that the
        situation the test was addressing was still covered even
        without the error being hit.

        It turns out, though, that the test still hits the error in some
        situations (local-filesystem and non-partitioned-aggs-and-joins
        builds).

        The fix is to make the test more permissive by adding '_NO_ERROR'
        as one of the options in the 'CATCH: ANY_OF' section, so that it
        passes whether or not the mem limit is exceeded.

        Change-Id: I4731a3e83dd2142a1d83be963f83cd1847472295
        Reviewed-on: http://gerrit.cloudera.org:8080/5941
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Impala Public Jenkins


          People

          • Assignee:
            twmarshall Thomas Tauber-Marshall
            Reporter:
            jbapple Jim Apple