Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:
    • Environment:
      centos 6

      Description

      log.WARNING.20170105-180851.32662:F0105 19:26:32.863729 1907 sorter.cc:1085] Check failed: index_ < run->num_tuples() (7486 vs. 7486) Can only advance one past end of run

      log.ERROR.20170105-202624.26390:F0105 21:27:56.650339 25273 sorter.cc:1103] Check failed: index_ >= 0 (-1 vs. 0) Can only advance one before start of run

      (gdb) bt
      #0 0x000000314b832635 in raise () from /lib64/libc.so.6
      #1 0x000000314b833e15 in abort () from /lib64/libc.so.6
      #2 0x00007fd8667a67b9 in google::DumpStackTraceAndExit () at src/utilities.cc:147
      #3 0x00007fd86679e5bd in google::LogMessage::Fail () at src/logging.cc:1315
      #4 0x00007fd8667a030d in google::LogMessage::SendToLog (this=<optimized out>) at src/logging.cc:1269
      #5 0x00007fd86679e0ca in google::LogMessage::Flush (this=this@entry=0x7fd6c8389ef0) at src/logging.cc:1138
      #6 0x00007fd8667a0dce in google::LogMessageFatal::~LogMessageFatal (this=0x7fd6c8389ef0, __in_chrg=<optimized out>) at src/logging.cc:1836
      #7 0x0000000001f4878e in impala::Sorter::TupleIterator::Next (this=0x7fd6c8389f40, run=0x7fcfb74a0100, tuple_size=263)
      at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1085
      #8 0x0000000001f443f2 in impala::Sorter::TupleSorter::Partition (this=0x7fd5190b6000, begin=..., end=..., pivot=0x7fd2fd7c7418, cut=0x7fd6c838a0b0)
      at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1219
      #9 0x0000000001f44753 in impala::Sorter::TupleSorter::SortHelper (this=0x7fd5190b6000, begin=..., end=...) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1248
      #10 0x0000000001f44943 in impala::Sorter::TupleSorter::SortHelper (this=0x7fd5190b6000, begin=..., end=...) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1257
      #11 0x0000000001f44943 in impala::Sorter::TupleSorter::SortHelper (this=0x7fd5190b6000, begin=..., end=...) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1257
      #12 0x0000000001f44943 in impala::Sorter::TupleSorter::SortHelper (this=0x7fd5190b6000, begin=..., end=...) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1257
      #13 0x0000000001f43c88 in impala::Sorter::TupleSorter::Sort (this=0x7fd5190b6000, run=0x7fcfb74a0100) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1150
      #14 0x0000000001f46b5f in impala::Sorter::SortCurrentInputRun (this=0x3f3b3700) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1477
      #15 0x0000000001f46267 in impala::Sorter::InputDone (this=0x3f3b3700) at /export/ldb/online/impala_gitlab/be/src/runtime/sorter.cc:1412
      #16 0x0000000001c85fee in impala::SortNode::SortInput (this=0xd11f660, state=0x951f0800) at /export/ldb/online/impala_gitlab/be/src/exec/sort-node.cc:166
      #17 0x0000000001c85202 in impala::SortNode::Open (this=0xd11f660, state=0x951f0800) at /export/ldb/online/impala_gitlab/be/src/exec/sort-node.cc:81
      #18 0x0000000001f38f99 in impala::PlanFragmentExecutor::OpenInternal (this=0x7fd550c0d3f0) at /export/ldb/online/impala_gitlab/be/src/runtime/plan-fragment-executor.cc:321
      #19 0x0000000001f38a8c in impala::PlanFragmentExecutor::Open (this=0x7fd550c0d3f0) at /export/ldb/online/impala_gitlab/be/src/runtime/plan-fragment-executor.cc:293
      #20 0x00000000019e2b84 in impala::FragmentMgr::FragmentExecState::Exec (this=0x7fd550c0d000) at /export/ldb/online/impala_gitlab/be/src/service/fragment-exec-state.cc:61
      #21 0x00000000019dc112 in impala::FragmentMgr::FragmentThread (this=0xa324880, fragment_instance_id=...) at /export/ldb/online/impala_gitlab/be/src/service/fragment-mgr.cc:90
      #22 0x00000000019dfeba in boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>::operator() (this=0x88fd8960, p=0xa324880, a1=...)
      at /usr/local/include/boost/bind/mem_fn_template.hpp:165
      #23 0x00000000019dfc77 in boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> >::operator()<boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0> (this=0x88fd8970, f=..., a=...) at /usr/local/include/boost/bind/bind.hpp:313
      #24 0x00000000019df5a1 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >::operator() (this=0x88fd8960) at /usr/local/include/boost/bind/bind_template.hpp:20
      #25 0x00000000019def34 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >, void>::invoke (function_obj_ptr=...)
      at /usr/local/include/boost/function/function_template.hpp:153
      #26 0x00000000017e8504 in boost::function0<void>::operator() (this=0x7fd6c838ba40) at /usr/local/include/boost/function/function_template.hpp:767
      #27 0x0000000001aa0ad3 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) (name=..., category=...,
      functor=..., thread_started=0x7fd80c62ab00) at /export/ldb/online/impala_gitlab/be/src/util/thread.cc:317

      1. hs_err_pid31407.log
        461 kB
        fishing
      2. TimLine截图20170106210007.png
        35 kB
        fishing

        Issue Links

          Activity

          Hide
          tarmstrong Tim Armstrong added a comment -

          Any idea what query caused this? And which Impala commit is your branch based on?

          Show
          tarmstrong Tim Armstrong added a comment - Any idea what query caused this? And which Impala commit is your branch based on?
          Hide
          tarmstrong Tim Armstrong added a comment -

          I think we can only hit this if the comparison function is broken.

          Do you have the fix for IMPALA-4436 in your branch?

          Show
          tarmstrong Tim Armstrong added a comment - I think we can only hit this if the comparison function is broken. Do you have the fix for IMPALA-4436 in your branch?
          Hide
          fish0515_impala_49b1 fishing added a comment -

          Hi tim apche branch d68b4f5f0f3b185a634f49fc926704df113ff009

          IMPALA-4436 had fixed

          Show
          fish0515_impala_49b1 fishing added a comment - Hi tim apche branch d68b4f5f0f3b185a634f49fc926704df113ff009 IMPALA-4436 had fixed
          Hide
          fish0515_impala_49b1 fishing added a comment -

          some server coredump breaktrace :

          hs_err file: hs_err_pid31407.log

          impalad.log have nothing useful info

          Show
          fish0515_impala_49b1 fishing added a comment - some server coredump breaktrace : hs_err file: hs_err_pid31407.log impalad.log have nothing useful info
          Hide
          fish0515_impala_49b1 fishing added a comment -

          Many days did not update the online code, many query contain have order by
          Yesterday afternoon began several times Crash with two types of errors : 1. crash at dcheck 2. NPE

          Show
          fish0515_impala_49b1 fishing added a comment - Many days did not update the online code, many query contain have order by Yesterday afternoon began several times Crash with two types of errors : 1. crash at dcheck 2. NPE
          Hide
          tarmstrong Tim Armstrong added a comment -

          I think this can happen if the sort expression is non-deterministic. There's an assumption in Sorter::Partition() that a row will never be < itself.

          Show
          tarmstrong Tim Armstrong added a comment - I think this can happen if the sort expression is non-deterministic. There's an assumption in Sorter::Partition() that a row will never be < itself.
          Hide
          tarmstrong Tim Armstrong added a comment -

          This is somewhat related to order by random(), since I think that could trigger this bug.

          Show
          tarmstrong Tim Armstrong added a comment - This is somewhat related to order by random(), since I think that could trigger this bug.
          Hide
          tarmstrong Tim Armstrong added a comment -

          This also relates to IMPALA-4728: part of the problem here is that we lazily evaluate the sort expressions. We would avoid the problem if we materialized the sort keys upfront.

          Show
          tarmstrong Tim Armstrong added a comment - This also relates to IMPALA-4728 : part of the problem here is that we lazily evaluate the sort expressions. We would avoid the problem if we materialized the sort keys upfront.
          Hide
          twmarshall Thomas Tauber-Marshall added a comment -

          I agree with Tim Armstrong that this is probably caused by sorting on a non-deterministic expression.

          FWIW, I think its very unlikely (perhaps impossible) to see this happen with "order by random()" with our current implementation of random() due to weird interactions between the way Sorter works and the way we seed random(), but I have repro-ed it with a modified version of random().

          I see a couple of ways of addressing this issue:
          1. Remove the DCHECK and replace it with an error message/failing the query. Probably the easiest solution, but somewhat hacky.

          2. Disallow ordering on non-deterministic exprs. We don't really lose any functionality since the current behavior in this situation doesn't make sense even if you don't hit the crash. Its easy to do for builtins that we know are non-deterministic, the hard part would be giving clients a way to mark udfs as non-deterministic, but we need that anyways (eg. for constant folding). Doesn't help poorly written udfs that weren't intended to be non-deterministic (though that could be addressed by also doing option 1).

          3. Actually address IMPALA-4728. Solves this issue in a more elegant way, and potentially gives a general performance boost. I'm guessing that this is probably a pretty large and invasive change, though, and would need some design work, thinking about the tradeoffs with memory usage, etc.

          4. Something else I haven't thought of.

          I don't have a strong opinion between those options.

          Show
          twmarshall Thomas Tauber-Marshall added a comment - I agree with Tim Armstrong that this is probably caused by sorting on a non-deterministic expression. FWIW, I think its very unlikely (perhaps impossible) to see this happen with "order by random()" with our current implementation of random() due to weird interactions between the way Sorter works and the way we seed random(), but I have repro-ed it with a modified version of random(). I see a couple of ways of addressing this issue: 1. Remove the DCHECK and replace it with an error message/failing the query. Probably the easiest solution, but somewhat hacky. 2. Disallow ordering on non-deterministic exprs. We don't really lose any functionality since the current behavior in this situation doesn't make sense even if you don't hit the crash. Its easy to do for builtins that we know are non-deterministic, the hard part would be giving clients a way to mark udfs as non-deterministic, but we need that anyways (eg. for constant folding). Doesn't help poorly written udfs that weren't intended to be non-deterministic (though that could be addressed by also doing option 1). 3. Actually address IMPALA-4728 . Solves this issue in a more elegant way, and potentially gives a general performance boost. I'm guessing that this is probably a pretty large and invasive change, though, and would need some design work, thinking about the tradeoffs with memory usage, etc. 4. Something else I haven't thought of. I don't have a strong opinion between those options.
          Hide
          tarmstrong Tim Armstrong added a comment -

          For #1 - I think we'd need to restructure the code a bit, since I think the while(Less()) loop can run off the end of the array before we hit the DCHECK.

          I'd vote against #2 since I think reliably determining whether exprs are deterministic is a way off - we'd need UDF annotations at a minimum. I also think that even then, we shouldn't crash if an annotation is wrong, so we'd need one of the other solutions anyway.

          #3 may not be that bad to implement (famous last words). I looked at this a while back and I don't think any changes to the backend are required. The backend code is already pretty general - it materializes a new tuple from a list of exprs, then evaluates the ordering exprs over the tuple. Lazy or eager evaluation just depends on how those exprs are set up, which is done this function: https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/analysis/SortInfo.java#L156 . The tricky thing is deciding which strategy to use, but to me it seems like it's usually better to materialize the exprs unless they're trivial and the underlying slotref is needed for another ordering or output expr.

          Show
          tarmstrong Tim Armstrong added a comment - For #1 - I think we'd need to restructure the code a bit, since I think the while(Less()) loop can run off the end of the array before we hit the DCHECK. I'd vote against #2 since I think reliably determining whether exprs are deterministic is a way off - we'd need UDF annotations at a minimum. I also think that even then, we shouldn't crash if an annotation is wrong, so we'd need one of the other solutions anyway. #3 may not be that bad to implement (famous last words). I looked at this a while back and I don't think any changes to the backend are required. The backend code is already pretty general - it materializes a new tuple from a list of exprs, then evaluates the ordering exprs over the tuple. Lazy or eager evaluation just depends on how those exprs are set up, which is done this function: https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/analysis/SortInfo.java#L156 . The tricky thing is deciding which strategy to use, but to me it seems like it's usually better to materialize the exprs unless they're trivial and the underlying slotref is needed for another ordering or output expr.
          Hide
          jbapple Jim Apple added a comment -

          Just checking on this since it's a P1 for 2.9.0 – how is this going?

          Show
          jbapple Jim Apple added a comment - Just checking on this since it's a P1 for 2.9.0 – how is this going?
          Hide
          twmarshall Thomas Tauber-Marshall added a comment -

          Just submitted a review: https://gerrit.cloudera.org/#/c/5914/

          Show
          twmarshall Thomas Tauber-Marshall added a comment - Just submitted a review: https://gerrit.cloudera.org/#/c/5914/
          Hide
          twmarshall Thomas Tauber-Marshall added a comment -

          We've decided to put more careful thought into the design here, to try to avoid potential performance regressions. I've downgraded this from a blocker because even though its a crash, its not a regression and its a corner case thats unlikely to affect many people.

          Show
          twmarshall Thomas Tauber-Marshall added a comment - We've decided to put more careful thought into the design here, to try to avoid potential performance regressions. I've downgraded this from a blocker because even though its a crash, its not a regression and its a corner case thats unlikely to affect many people.
          Hide
          twmarshall Thomas Tauber-Marshall added a comment -

          commit 6cddb952cefedd373b2a1ce71a1b3cff2e774d70
          Author: Thomas Tauber-Marshall <tmarshall@cloudera.com>
          Date: Tue Jan 31 10:33:07 2017 -0800

          IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs

          Previously, exprs used in sorts were evaluated lazily. This can
          potentially be bad for performance if the exprs are expensive to
          evaluate, and it can lead to crashes if the exprs are
          non-deterministic, as this violates assumptions of our sorting
          algorithm.

          This patch addresses these issues by materializing ordering exprs.
          It does so when the expr is non-deterministic (including when it
          contains a UDF, which we cannot currently know if they are
          non-deterministic), or when its cost exceeds a threshold (or the
          cost is unknown).

          Testing:

          • Added e2e tests in test_sort.py.
          • Updated planner tests.

          Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
          Reviewed-on: http://gerrit.cloudera.org:8080/6322
          Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
          Tested-by: Impala Public Jenkins

          Show
          twmarshall Thomas Tauber-Marshall added a comment - commit 6cddb952cefedd373b2a1ce71a1b3cff2e774d70 Author: Thomas Tauber-Marshall <tmarshall@cloudera.com> Date: Tue Jan 31 10:33:07 2017 -0800 IMPALA-4731 / IMPALA-397 / IMPALA-4728 : Materialize sort exprs Previously, exprs used in sorts were evaluated lazily. This can potentially be bad for performance if the exprs are expensive to evaluate, and it can lead to crashes if the exprs are non-deterministic, as this violates assumptions of our sorting algorithm. This patch addresses these issues by materializing ordering exprs. It does so when the expr is non-deterministic (including when it contains a UDF, which we cannot currently know if they are non-deterministic), or when its cost exceeds a threshold (or the cost is unknown). Testing: Added e2e tests in test_sort.py. Updated planner tests. Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Reviewed-on: http://gerrit.cloudera.org:8080/6322 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins

            People

            • Assignee:
              twmarshall Thomas Tauber-Marshall
              Reporter:
              fish0515_impala_49b1 fishing
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development