Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      I wanted to sort my table of 28.8 billion rows, so I ran

      create table partsupp_sorted stored as parquet as select * from partsupp order by ps_availqty;

      Impala eventually finished but warned

      WARNINGS: Ignoring ORDER BY clause without LIMIT or OFFSET: ORDER BY ps_availqty ASC.
      An ORDER BY appearing in a view, subquery, union operand, or an insert/ctas statement has no effect on the query result unless a LIMIT and/or OFFSET is used in conjunction with the ORDER BY.

      The table was not sorted. So I did the logical thing: I dropped the table and reran my query with the limit clause.

      create table partsupp_sorted stored as parquet as select * from partsupp order by ps_availqty limit 28800000000;

      After working for about half an hour, impalad crashed.
      It looks like be/src/exec/topn-node.cc

      int index = sorted_top_n_.size() - 1;

      is the culprit. A 32-bit int cannot represent 28800000000, so the index probably wrapped around to a negative value, which caused the SIGABRT when setting the array element in

      sorted_top_n_[index] = tuple;
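
To illustrate the suspected wraparound, here is a minimal sketch of what happens when a 64-bit element count minus one is narrowed to 32 bits. The helper names WrapToInt32 and NarrowedLastIndex are hypothetical and do not exist in Impala. The sketch also assumes the vector really held as many entries as the LIMIT value, which the report does not confirm.

```cpp
#include <cassert>
#include <cstdint>

// Reinterpret the low 32 bits of a 64-bit value as a two's-complement
// signed 32-bit number, using only well-defined 64-bit arithmetic.
inline int64_t WrapToInt32(uint64_t n) {
  const uint64_t low = n & 0xFFFFFFFFull;
  return low >= 0x80000000ull
             ? static_cast<int64_t>(low) - 0x100000000ll
             : static_cast<int64_t>(low);
}

// Hypothetical model of `int index = size - 1;` where `size` is a
// 64-bit unsigned count: subtract first, then narrow to 32 bits.
inline int64_t NarrowedLastIndex(uint64_t size) {
  return WrapToInt32(size - 1);
}
```

Under these assumptions, NarrowedLastIndex(28800000000) is -1264771073, an index far outside the bounds of the vector.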

      Stack trace

      #0  0x000000305cc32625 in raise () from /lib64/libc.so.6
      #1  0x000000305cc33e05 in abort () from /lib64/libc.so.6
      #2  0x00007f634b192c55 in os::abort(bool) () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #3  0x00007f634b314cd7 in VMError::report_and_die() () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #4  0x00007f634b31525e in crash_handler(int, siginfo*, void*) () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #5  0x00007f634b191df2 in os::Linux::chained_handler(int, siginfo*, void*) () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #6  0x00007f634b197ad6 in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #7  <signal handler called>
      #8  0x00007f634b1896f1 in os::is_first_C_frame(frame*) () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #9  0x00007f634b3133cd in VMError::report(outputStream*) () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #10 0x00007f634b3148da in VMError::report_and_die() () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #11 0x00007f634b197b6f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #12 <signal handler called>
      #13 0x0000000000c9fd01 in impala::TopNNode::PrepareForOutput (this=0x3778dcc0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/topn-node.cc:229
      #14 0x0000000000ca0b48 in impala::TopNNode::Open (this=0x3778dcc0, state=0x45230a00) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/topn-node.cc:167
      #15 0x0000000000db3671 in impala::PlanFragmentExecutor::OpenInternal (this=0x45231ab0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:324
      #16 0x0000000000db4db0 in impala::PlanFragmentExecutor::Open (this=0x45231ab0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:296
      #17 0x0000000000daddd0 in impala::FragmentInstanceState::Exec (this=0x45231800) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/fragment-instance-state.cc:65
      #18 0x0000000000db710f in impala::QueryExecMgr::ExecFInstance (this=0x99d00c0, fis=0x45231800) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/query-exec-mgr.cc:109
      #19 0x0000000000bbc374 in operator() (name=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:767
      #20 impala::Thread::SuperviseThread (name=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/thread.cc:317
      #21 0x0000000000bbcd54 in operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> (this=0x31e4e400)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457
      #22 operator() (this=0x31e4e400) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
      #23 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (
          this=0x31e4e400) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/thread/detail/thread.hpp:116
      #24 0x0000000000e08dca in thread_proxy ()
      #25 0x000000305d0079d1 in start_thread () from /lib64/libpthread.so.0
      #26 0x000000305cce88fd in clone () from /lib64/libc.so.6
      
      1. stack_trace.txt
        4 kB
        Matthew Mulder

          Activity

          lv Lars Volker added a comment -

          IMPALA-4995: Fix integer overflow in TopNNode::PrepareForOutput

          To test this, Matt Mulder ran the failing query from IMPALA-4995 on a
          private cluster and it did not crash. However the query did not finish
          within several hours. We should switch to using the Sorter for large
          TopN queries, as tracked by IMPALA-5004.

          Change-Id: I5048ec67d8f086346220d56e027e6583fbb5ddad
          Reviewed-on: http://gerrit.cloudera.org:8080/6171
          Reviewed-by: Lars Volker <lv@cloudera.com>
          Tested-by: Impala Public Jenkins
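
For illustration only, the general shape of an overflow-safe version of such a loop might look like the sketch below. This is not the actual patch (the Gerrit change above is authoritative). It assumes a heap that is drained into a vector from the back, and the function name DrainAscending and the use of int elements are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Drain a max-heap into a vector in ascending order. The largest element
// is popped first, so the output is filled from the last slot backwards.
// The index is 64-bit, so it cannot wrap for sizes beyond the int range.
inline std::vector<int> DrainAscending(std::priority_queue<int>& heap) {
  std::vector<int> sorted(heap.size());
  int64_t index = static_cast<int64_t>(sorted.size()) - 1;
  while (!heap.empty()) {
    sorted[static_cast<std::size_t>(index)] = heap.top();
    heap.pop();
    --index;
  }
  return sorted;
}
```

A 64-bit index removes the wraparound but does nothing about the time and memory cost of a very large limit, which is why the switch to the Sorter is tracked separately.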

          lv Lars Volker added a comment -

          Opened IMPALA-5004 to track switching from TopN to Sort.

          tarmstrong Tim Armstrong added a comment -

          There's already logic in the frontend to select Sort versus TopN (https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L289), so I think it should be straightforward. There's also a query option, disable_outermost_topn, that can force use of Sort.

          Probably makes sense to track that in a separate JIRA.

          lv Lars Volker added a comment -

          Tim Armstrong, how much effort would switching to the sort operator be? I pushed a change to address the overflow. Should we create a separate Jira to switch the operator?

          tarmstrong Tim Armstrong added a comment -

          This definitely looks like a bug, but we should also consider switching to the sort operator for large limits, since the sort operator can spill. The memory requirements for TopN are also problematic for large limits, since it would allocate large vectors that are untracked and that require a large amount of contiguous memory.
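
As a rough worked example of the contiguous-memory concern, the sketch below computes the size of a single array with one fixed-size entry per row. The 8-byte entry size (one pointer on a 64-bit build) and the helper name ContiguousBytes are assumptions, and the row count is simply the LIMIT from the report, so the result is an upper bound rather than a measurement.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one contiguous array of `num_entries` elements,
// each `bytes_per_entry` bytes wide (assumed: 8-byte pointers).
inline uint64_t ContiguousBytes(uint64_t num_entries,
                                uint64_t bytes_per_entry) {
  return num_entries * bytes_per_entry;
}
```

With 28800000000 entries of 8 bytes each, this comes to 230400000000 bytes, roughly 230 GB in a single allocation.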


            People

            • Assignee:
              lv Lars Volker
              Reporter:
              mmulder Matthew Mulder
