IMPALA-5419

PhjBuilder::Partition::InsertBatch() continues to make progress even after query cancellation


      Description

      Canceling a query while remote fragments are in PhjBuilder::Partition::InsertBatch() kills the query, but some of the remote fragments continue to spin indefinitely and do not release their memory.

      Stack of the stuck thread

      Thread 1 (process 26668):
      #0  0x00007f8f0a138500 in impala::PhjBuilder::Partition::InsertBatch ()
      #1  0x0000000000d37f72 in impala::PhjBuilder::Partition::BuildHashTable(bool*) ()
      #2  0x0000000000d38d65 in impala::PhjBuilder::BuildHashTablesAndPrepareProbeStreams() ()
      #3  0x0000000000d392a0 in impala::PhjBuilder::FlushFinal(impala::RuntimeState*) ()
      #4  0x0000000000d20447 in impala::Status impala::BlockingJoinNode::SendBuildInputToSink<true>(impala::RuntimeState*, impala::DataSink*) ()
      #5  0x0000000000d1ed5f in impala::BlockingJoinNode::ProcessBuildInputAsync(impala::RuntimeState*, impala::DataSink*, impala::Status*) ()
      #6  0x0000000000bec7d9 in impala::Thread::SuperviseThread(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::Promise<long>*) ()
      #7  0x0000000000bed194 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
      #8  0x0000000000e540ea in ?? ()
      #9  0x000000316da07aa1 in start_thread () from /lib64/libpthread.so.0
      #10 0x000000316d6e893d in clone () from /lib64/libc.so.6
      

      Query

      SELECT /* +straight_join */ count(*)
      FROM lineitem
      JOIN /* +broadcast */ supplier ON l_suppkey = s_suppkey
      JOIN /* +broadcast */ part ON l_partkey = p_partkey
      JOIN /* +broadcast */ orders ON l_orderkey = o_orderkey
      WHERE s_name='Supplier#001880004'
        AND p_brand < 'Brand#30'
        AND o_orderdate < '1994-01-01'
      

      Snapshot from memz page

      Process: Limit=201.73 GB Total=148.77 GB Peak=182.63 GB
        Free Disk IO Buffers: Total=1.88 GB Peak=1.88 GB
        RequestPool=root.default: Total=133.39 GB Peak=167.99 GB
          Query(d84d4f69e2366d60:72d5ad1800000000): Total=133.39 GB Peak=167.99 GB
            Fragment d84d4f69e2366d60:72d5ad18000001d1: Total=39.18 GB Peak=163.04 GB
              Runtime Filter Bank: Total=4.01 MB Peak=4.01 MB
              AGGREGATION_NODE (id=7): Total=40.00 KB Peak=8.51 MB
                Exprs: Total=4.00 KB Peak=4.00 KB
              HASH_JOIN_NODE (id=6): Total=39.17 GB Peak=39.17 GB
                Hash Join Builder (join_node_id=6): Total=39.17 GB Peak=39.17 GB
              HASH_JOIN_NODE (id=5): Total=50.25 KB Peak=129.25 GB
                Hash Join Builder (join_node_id=5): Total=13.12 KB Peak=129.24 GB
              HASH_JOIN_NODE (id=4): Total=42.25 KB Peak=1.05 MB
                Hash Join Builder (join_node_id=4): Total=13.12 KB Peak=1.01 MB
              HDFS_SCAN_NODE (id=0): Total=0 Peak=2.24 GB
              EXCHANGE_NODE (id=8): Total=0 Peak=0
              DataStreamRecvr: Total=0 Peak=4.01 KB
              EXCHANGE_NODE (id=9): Total=0 Peak=0
              DataStreamRecvr: Total=0 Peak=29.98 MB
              EXCHANGE_NODE (id=10): Total=0 Peak=0
              DataStreamRecvr: Total=0 Peak=16.39 MB
              DataStreamSender (dst_id=11): Total=16.00 KB Peak=16.00 KB
              CodeGen: Total=22.08 KB Peak=3.56 MB
            Block Manager: Limit=161.39 GB Total=133.38 GB Peak=161.38 GB
          Query(73440af14cf09128:21f5ed1400000000): Total=31.27 KB Peak=216.54 MB
            Fragment 73440af14cf09128:21f5ed140000003a: Total=31.27 KB Peak=216.54 MB
              AGGREGATION_NODE (id=1): Total=8.00 KB Peak=1.89 MB
                Exprs: Total=4.00 KB Peak=4.00 KB
              HDFS_SCAN_NODE (id=0): Total=0 Peak=216.45 MB
              DataStreamSender (dst_id=2): Total=14.22 KB Peak=14.22 KB
              CodeGen: Total=1.05 KB Peak=188.00 KB
            Block Manager: Limit=161.39 GB Total=0 Peak=0
        Untracked Memory: Total=13.50 GB
      

      From the impalad.INFO log

      [mmokhtar@va1026 impalad]$ cat  impalad.va1026.foo.net.impala.log.INFO.20170531-170204.96499  | grep d84d4f69e2366d60:72d5ad1800000000 | head -n 500
      I0601 00:45:10.883314 96906 admission-controller.cc:442] Schedule for id=d84d4f69e2366d60:72d5ad1800000000 in pool_name=root.default cluster_mem_needed=17179.57 GB PoolConfig: max_requests=100 max_queued=200 max_mem=-1.00 B
      I0601 00:45:10.891855 96906 admission-controller.cc:453] Admitted query id=d84d4f69e2366d60:72d5ad1800000000
      I0601 00:45:10.903158 96906 coordinator.cc:121] Exec() query_id=d84d4f69e2366d60:72d5ad1800000000 stmt=-- Description : Selective broadcast joins between lineitem, supplier, part and orders.
      I0601 00:45:10.935317 96906 coordinator.cc:383] starting execution on 130 backends for query d84d4f69e2366d60:72d5ad1800000000
      I0601 00:45:10.996497 135837 impala-internal-service.cc:44] ExecQueryFInstances(): query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 00:45:10.996796 135837 query-exec-mgr.cc:46] StartQueryFInstances() query_id=d84d4f69e2366d60:72d5ad1800000000 coord=va1026.foo.net:22000
      I0601 00:45:10.997879 58950 query-state.cc:263] StartFInstances(): query_id=d84d4f69e2366d60:72d5ad1800000000 #instances=1
      I0601 00:45:11.011160 58950 query-state.cc:276] descriptor table for query=d84d4f69e2366d60:72d5ad1800000000
      I0601 00:45:11.012342 58951 query-state.cc:325] Executing instance. instance_id=d84d4f69e2366d60:72d5ad1800000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
      I0601 00:45:11.034893 58950 query-exec-mgr.cc:138] ReleaseQueryState(): query_id=d84d4f69e2366d60:72d5ad1800000000 refcnt=3
      I0601 00:45:12.325824 96906 coordinator.cc:396] started execution on 130 backends for query d84d4f69e2366d60:72d5ad1800000000
      I0601 00:45:12.353492 96906 impala-server.cc:926] Query d84d4f69e2366d60:72d5ad1800000000 has timeout of 10m
      I0601 02:42:08.973990 115288 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:09.172730 11880 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:09.579278 127412 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:11.640924 112391 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:19.061241 115418 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:19.554117 112209 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:29.405820 127414 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:30.752419 145416 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:36.649454 111018 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:42.374414 123586 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:47.766850 143075 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:42:56.262635 11526 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:43:00.584983 132521 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 02:43:40.918494 132978 coordinator.cc:983] query_id=d84d4f69e2366d60:72d5ad1800000000: first in-progress backend: va1026.foo.net:22000
      I0601 08:34:45.016852 106703 impala-server.cc:937] UnregisterQuery(): query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.017141 106703 impala-server.cc:1022] Cancel(): query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.248535 106703 coordinator.cc:920] Cancel() query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.248984 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1026.foo.net:22000
      I0601 08:34:45.249454 120728 impala-internal-service.cc:63] CancelQueryFInstances(): query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.249811 120728 query-exec-mgr.cc:83] QueryState: query_id=d84d4f69e2366d60:72d5ad1800000000 refcnt=3
      I0601 08:34:45.250104 120728 query-state.cc:342] Cancel: query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.250383 120728 data-stream-mgr.cc:258] cancelling all streams for fragment=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.250669 120728 data-stream-recvr.cc:237] cancelled stream: fragment_instance_id_=d84d4f69e2366d60:72d5ad1800000000 node_id=11
      I0601 08:34:45.250993 120728 query-exec-mgr.cc:138] ReleaseQueryState(): query_id=d84d4f69e2366d60:72d5ad1800000000 refcnt=3
      I0601 08:34:45.251452 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1029.foo.net:22000
      I0601 08:34:45.253317 58951 data-stream-mgr.cc:226] DeregisterRecvr(): fragment_instance_id=d84d4f69e2366d60:72d5ad1800000000, node=11
      I0601 08:34:45.255867 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1030.foo.net:22000
      I0601 08:34:45.266269 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1031.foo.net:22000
      I0601 08:34:45.273082 58951 query-state.cc:332] Instance completed. instance_id=d84d4f69e2366d60:72d5ad1800000000 #in-flight=0 status=CANCELLED: Cancelled
      I0601 08:34:45.273423 58951 query-state.cc:342] Cancel: query_id=d84d4f69e2366d60:72d5ad1800000000
      I0601 08:34:45.273550 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1032.foo.net:22000
      I0601 08:34:45.273679 58951 query-exec-mgr.cc:138] ReleaseQueryState(): query_id=d84d4f69e2366d60:72d5ad1800000000 refcnt=2
      I0601 08:34:45.278386 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1033.foo.net:22000
      I0601 08:34:45.283445 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1034.foo.net:22000
      I0601 08:34:45.291430 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1035.foo.net:22000
      I0601 08:34:45.296653 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1036.foo.net:22000
      I0601 08:34:45.301268 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1037.foo.net:22000
      I0601 08:34:45.306454 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1038.foo.net:22000
      I0601 08:34:45.309927 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1039.foo.net:22000
      I0601 08:34:45.315886 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1041.foo.net:22000
      I0601 08:34:45.321213 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1042.foo.net:22000
      I0601 08:34:45.328903 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1043.foo.net:22000
      I0601 08:34:45.333417 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1045.foo.net:22000
      I0601 08:34:45.338429 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=va1046.foo.net:22000
      I0601 08:34:45.344604 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1501.foo.net:22000
      I0601 08:34:45.348870 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1502.foo.net:22000
      I0601 08:34:45.355267 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1503.foo.net:22000
      I0601 08:34:45.360085 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1504.foo.net:22000
      I0601 08:34:45.365375 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1505.foo.net:22000
      I0601 08:34:45.371361 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1507.foo.net:22000
      I0601 08:34:45.380437 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1508.foo.net:22000
      I0601 08:34:45.384851 106703 coordinator-backend-state.cc:332] sending CancelQueryFInstances rpc for query_id=d84d4f69e2366d60:72d5ad1800000000 backend=vc1509.foo.net:22000
      

        Activity

        mmokhtar Mostafa Mokhtar added a comment -

        Tried to repro and was able to, but the remote fragments don't spin indefinitely.
        How long the resources are held depends on how much data was inserted into the hash table by ProcessBuildBatch, so impala::PhjBuilder::Partition::BuildHashTable() runs until completion rather than indefinitely.

        kwho Michael Ho added a comment -

        Yes, we are probably missing a RETURN_IF_CANCELLED() in the loop of BuildHashTable(). FWIW, the code has had the same problem even as far back as Impala 2.5.0.
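
        As a standalone illustration of the pattern Michael is describing (ordinary C++ with made-up names, not the Impala source; Impala's RETURN_IF_CANCELLED() consults the fragment's RuntimeState rather than an atomic flag), checking for cancellation once per row batch lets a build loop bail out shortly after cancellation instead of running to completion:

        // Standalone sketch of per-batch cooperative cancellation. All names
        // here are illustrative; this is not the Impala implementation.
        #include <atomic>
        #include <chrono>
        #include <cstdint>
        #include <cstdio>
        #include <thread>
        #include <unordered_map>
        #include <vector>

        enum class BuildStatus { OK, CANCELLED };

        // One "row batch" worth of build-side join keys.
        using RowBatch = std::vector<int64_t>;

        BuildStatus BuildHashTable(const std::vector<RowBatch>& batches,
                                   const std::atomic<bool>& cancelled,
                                   std::unordered_multimap<int64_t, int64_t>* hash_table) {
          for (const RowBatch& batch : batches) {
            // Analogue of InsertBatch(): add every row of the batch to the hash table.
            for (int64_t key : batch) hash_table->emplace(key, key);
            // The per-batch check: without it the loop runs to completion and keeps
            // holding CPU and memory even after the query has been cancelled.
            if (cancelled.load(std::memory_order_relaxed)) return BuildStatus::CANCELLED;
          }
          return BuildStatus::OK;
        }

        int main() {
          std::vector<RowBatch> batches(10000, RowBatch(1024, 42));
          std::unordered_multimap<int64_t, int64_t> hash_table;
          std::atomic<bool> cancelled{false};

          // Simulate the coordinator delivering a cancellation mid-build.
          std::thread canceller([&] {
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            cancelled.store(true);
          });

          BuildStatus s = BuildHashTable(batches, cancelled, &hash_table);
          canceller.join();
          std::printf("build %s after %zu hash table entries\n",
                      s == BuildStatus::CANCELLED ? "cancelled" : "finished",
                      hash_table.size());
          return 0;
        }

        In the fix below, the equivalent check is a RETURN_IF_CANCELLED() once per row batch inside the loop of PhjBuilder::Partition::BuildHashTable(), so a cancelled fragment returns a CANCELLED status, tears down, and releases its memory.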

        tarmstrong Tim Armstrong added a comment -

        I took a quick look at the git history and this bug appears to have been present since at least Impala 2.3.0.

        kwho Michael Ho added a comment -

        https://github.com/apache/incubator-impala/commit/cde19ab8c7801436070ce0438e28d5042265dfd1

        IMPALA-5419: Check for cancellation when building hash tables
        When inserting build rows into the hash table for each partition
        in PhjBuilder::Partition::BuildHashTable(), we didn't check for
        cancellation. This may lead to orphaned fragments running for
        excessive amount of time and consuming a lot of memory after the
        query has been cancelled.

        This change fixes the problem by checking for cancellation per
        row batch in the loop inside PhjBuilder::Partition::BuildHashTable().

        Manually verified with the following query that cancelling the query
        right after all build rows have been returned by the scan node will
        not cause memory usage to continue going up.

        select /* +straight_join */ count(*)
        from tpch20_parquet.lineitem t1
        join /* +broadcast */ tpch20_parquet.lineitem t2
        on t1.l_orderkey = t2.l_orderkey

        Change-Id: I8047f532f55dc0118f7a843c91275f752c8a190d
        Reviewed-on: http://gerrit.cloudera.org:8080/7047
        Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
        Reviewed-by: Dan Hecht <dhecht@cloudera.com>
        Tested-by: Impala Public Jenkins


          People

          • Assignee: Michael Ho (kwho)
          • Reporter: Mostafa Mokhtar (mmokhtar)