IMPALA-4997

crash when using sortby hint on a very large table

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      I have a 28.8 billion row table that I'm trying to sort into another table:

      insert into partsupp_sorted /*+ sortby(ps_availqty) */ select * from partsupp;

      This caused impalad to crash.
      Stack trace:

      #0  0x000000305cc32625 in raise () from /lib64/libc.so.6
      #1  0x000000305cc33e05 in abort () from /lib64/libc.so.6
      #2  0x00007f37b6a47c55 in os::abort(bool) () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #3  0x00007f37b6bc9cd7 in VMError::report_and_die() () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #4  0x00007f37b6a4cb6f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_75-cloudera/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  0x00007f37602580db in Compare ()
      #7  0x0000000000dbb9d1 in Compare (this=0xa633200, lhs=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/tuple-row-compare.h:99
      #8  Less (this=0xa633200, lhs=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/tuple-row-compare.h:108
      #9  impala::Sorter::TupleSorter::Less (this=0xa633200, lhs=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/sorter.cc:1144
      #10 0x0000000000dbbeae in impala::Sorter::TupleSorter::Partition (this=0xa633200, begin=..., end=..., pivot=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/sorter.cc:1222
      #11 0x0000000000dbe00d in impala::Sorter::TupleSorter::SortHelper (this=0xa633200, begin=..., end=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/sorter.cc:1249
      #12 0x0000000000dbe253 in impala::Sorter::TupleSorter::Sort (this=0xa633200, run=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/sorter.cc:1151
      #13 0x0000000000dbe334 in impala::Sorter::SortCurrentInputRun (this=0xa76aaa0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/sorter.cc:1478
      #14 0x0000000000dbf20b in impala::Sorter::InputDone (this=0xa76aaa0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/sorter.cc:1413
      #15 0x0000000000c9a8a2 in impala::SortNode::SortInput (this=0x214a7c00, state=0xa50bc00) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/sort-node.cc:166
      #16 0x0000000000c9b55e in impala::SortNode::Open (this=0x214a7c00, state=0xa50bc00) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/sort-node.cc:81
      #17 0x0000000000db3671 in impala::PlanFragmentExecutor::OpenInternal (this=0xa50b7b0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:324
      #18 0x0000000000db4db0 in impala::PlanFragmentExecutor::Open (this=0xa50b7b0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:296
      #19 0x0000000000daddd0 in impala::FragmentInstanceState::Exec (this=0xa50b500) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/fragment-instance-state.cc:65
      #20 0x0000000000db710f in impala::QueryExecMgr::ExecFInstance (this=0xad0e4e0, fis=0xa50b500) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/query-exec-mgr.cc:109
      #21 0x0000000000bbc374 in operator() (name=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:767
      #22 impala::Thread::SuperviseThread (name=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/thread.cc:317
      #23 0x0000000000bbcd54 in operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> (this=0xa9a1400)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457
      #24 operator() (this=0xa9a1400) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
      #25 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0xa9a1400)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/thread/detail/thread.hpp:116
      #26 0x0000000000e08dca in thread_proxy ()
      #27 0x000000305d0079d1 in start_thread () from /lib64/libpthread.so.0
      #28 0x000000305cce88fd in clone () from /lib64/libc.so.6

        Activity

        lv Lars Volker added a comment -

        Interesting find. Does an order by limit clause work on the same table? The crash seems to occur in the sort node that the sortby hint inserts into the plan, and I'd be curious whether it happens with order by, too.

        mmulder Matthew Mulder added a comment -

        Lars Volker See IMPALA-4995 for the results of trying the order by limit clause on this table. It had a different crash.

        lv Lars Volker added a comment -

        Does it also crash with an ORDER BY but without a LIMIT?

        mmulder Matthew Mulder added a comment -
        Query: explain insert into partsupp_sorted /*+ sortby(ps_availqty) */ select * from partsupp
        +------------------------------------------------------------------------------------+
        | Explain String                                                                     |
        +------------------------------------------------------------------------------------+
        | Estimated Per-Host Requirements: Memory=1.00GB VCores=1                            |
        | WARNING: The following tables are missing relevant table and/or column statistics. |
        | impala_2328_no_minmax_meta.partsupp                                                |
        |                                                                                    |
        | WRITE TO HDFS [impala_2328_no_minmax_meta.partsupp_sorted, OVERWRITE=false]        |
        | |  partitions=1                                                                    |
        | |                                                                                  |
        | 01:SORT                                                                            |
        | |  order by: ps_availqty DESC NULLS LAST                                           |
        | |                                                                                  |
        | 00:SCAN HDFS [impala_2328_no_minmax_meta.partsupp]                                 |
        |    partitions=1/1 files=44052 size=318.36GB                                        |
        +------------------------------------------------------------------------------------+
        lv Lars Volker added a comment -

        Matthew Mulder and I debugged this. The crash is actually caused by several integer overflows in be/src/runtime/sorter.cc:1074ff. I'll push a fix.

        lv Lars Volker added a comment -

        IMPALA-4997: Fix overflows in Sorter::TupleIterator

        Various places in Sorter::TupleIterator multiply two int values
        (Sorter::Run::block_capacity_ and Sorter::TupleIterator::block_index_)
        and assign the result to an int64_t value
        (Sorter::TupleIterator::buffer_start_index_). One such occurrence is in
        be/src/runtime/sorter.cc#L1080. This multiplication could overflow for
        runs with a large number of rows. Changing one of the operands to
        int64_t fixes this.

        To test this, Matt Mulder ran the failing query from IMPALA-4997 on a
        private cluster and it succeeded.

        Change-Id: Iea22aa96e0cc86102b60c6e551e9e607cef485c8
        Reviewed-on: http://gerrit.cloudera.org:8080/6169
        Reviewed-by: Lars Volker <lv@cloudera.com>
        Tested-by: Impala Public Jenkins
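
The overflow described in the commit message can be illustrated with a minimal C++ sketch. This is not the code from sorter.cc: the function names and the values in the usage note are hypothetical, and the parameters only stand in for Sorter::TupleIterator::block_index_ and Sorter::Run::block_capacity_. It shows that widening one operand to int64_t before the multiplication makes the product itself 64-bit, whereas an int * int product is computed in 32 bits even when the result is assigned to an int64_t.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not taken from sorter.cc.
// Casting one operand to int64_t promotes the whole multiplication to 64 bits,
// so the product cannot wrap at the 32-bit limit.
int64_t BufferStartIndex(int block_index, int block_capacity) {
  return static_cast<int64_t>(block_index) * block_capacity;
}

// Hypothetical helper: reports whether the same product would not fit in a
// 32-bit int, i.e. whether the unwidened int * int expression would overflow.
// The check itself is done in 64-bit arithmetic to avoid undefined behaviour.
bool WouldOverflowInt(int block_index, int block_capacity) {
  const int64_t wide = static_cast<int64_t>(block_index) * block_capacity;
  return wide > std::numeric_limits<int>::max() ||
         wide < std::numeric_limits<int>::min();
}
```

For example, with an illustrative block index and block capacity of 70000 each, the product is 4,900,000,000, which exceeds the largest 32-bit int (2,147,483,647). BufferStartIndex returns the correct value, while an unwidened int multiplication would overflow before the assignment to the 64-bit variable takes place.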


          People

          • Assignee:
            lv Lars Volker
            Reporter:
            mmulder Matthew Mulder