IMPALA-5115

Occasional crash in HdfsTableSink while using mod(cast(rand(7) * 1000000000 as int),2) as partition column

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      Query

       insert into store_sales partition(ss_sold_date_sk) /*+ clustered,shuffle */ select *, mod(cast(rand(7) * 1000000000 as int),2) from store_sales_source
      

      version

      [vd1317.halxg.cloudera.com:21000] > version;
      Shell version: Impala Shell v2.8.0-cdh5.11.0 (e09660d) built on Tue Mar 21 04:21:59 PDT 2017
      Server version: impalad version 2.9.0-SNAPSHOT RELEASE (build 9d08cd4e192d964900a17715a98ea1928ef07ae8)
      

      Target table DDL

      | CREATE EXTERNAL TABLE table_level_tests.store_sales (                                              |
      |   ss_sold_time_sk INT,                                                                             |
      |   ss_item_sk BIGINT,                                                                               |
      |   ss_customer_sk INT,                                                                              |
      |   ss_cdemo_sk INT,                                                                                 |
      |   ss_hdemo_sk INT,                                                                                 |
      |   ss_addr_sk INT,                                                                                  |
      |   ss_store_sk INT,                                                                                 |
      |   ss_promo_sk INT,                                                                                 |
      |   ss_ticket_number BIGINT,                                                                         |
      |   ss_quantity INT,                                                                                 |
      |   ss_wholesale_cost DECIMAL(7,2),                                                                  |
      |   ss_list_price DECIMAL(7,2),                                                                      |
      |   ss_sales_price DECIMAL(7,2),                                                                     |
      |   ss_ext_discount_amt DECIMAL(7,2),                                                                |
      |   ss_ext_sales_price DECIMAL(7,2),                                                                 |
      |   ss_ext_wholesale_cost DECIMAL(7,2),                                                              |
      |   ss_ext_list_price DECIMAL(7,2),                                                                  |
      |   ss_ext_tax DECIMAL(7,2),                                                                         |
      |   ss_coupon_amt DECIMAL(7,2),                                                                      |
      |   ss_net_paid DECIMAL(7,2),                                                                        |
      |   ss_net_paid_inc_tax DECIMAL(7,2),                                                                |
      |   ss_net_profit DECIMAL(7,2)                                                                       |
      | )                                                                                                  |
      | PARTITIONED BY (                                                                                   |
      |   ss_sold_date_sk INT                                                                              |
      | )                                                                                                  |
      | STORED AS PARQUET
      

      Stack

      (gdb) bt
      #0  0x0000003655832625 in raise () from /lib64/libc.so.6
      #1  0x0000003655833e05 in abort () from /lib64/libc.so.6
      #2  0x00007f1ec26d2a55 in os::abort(bool) () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #3  0x00007f1ec2852f87 in VMError::report_and_die() () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #4  0x00007f1ec26d796f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  impala::HdfsTableSink::WriteRowsToPartition (this=0x2136b900, state=0x213a0700, batch=0x217400a0, partition_pair=0x7f185da2e658)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/hdfs-table-sink.cc:281
      #7  0x0000000000d0069e in impala::HdfsTableSink::WriteClusteredRowBatch (this=0x2136b900, state=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/hdfs-table-sink.cc:328
      #8  0x0000000000d00e95 in impala::HdfsTableSink::Send (this=0x2136b900, state=0x213a0700, batch=0x217400a0)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/exec/hdfs-table-sink.cc:630
      #9  0x0000000000de2dbb in impala::PlanFragmentExecutor::ExecInternal (this=0x1484e2d0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:366
      #10 0x0000000000de4fcd in impala::PlanFragmentExecutor::Exec (this=0x1484e2d0) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/plan-fragment-executor.cc:337
      #11 0x0000000000dde2c8 in impala::FragmentInstanceState::Exec (this=0x1484e000) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/fragment-instance-state.cc:68
      #12 0x0000000000de75ff in impala::QueryExecMgr::ExecFInstance (this=0x112f85a0, fis=0x1484e000) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/query-exec-mgr.cc:110
      #13 0x0000000000bd9e29 in operator() (name=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:767
      #14 impala::Thread::SuperviseThread (name=Unhandled dwarf expression opcode 0xf3
      ) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/util/thread.cc:325
      #15 0x0000000000bda874 in operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> (this=0x233ed200)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457
      #16 operator() (this=0x233ed200) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
      #17 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x233ed200)
          at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/boost-1.57.0-p1/include/boost/thread/detail/thread.hpp:116
      #18 0x0000000000e4631a in thread_proxy ()
      #19 0x0000003655c07aa1 in start_thread () from /lib64/libpthread.so.0
      #20 0x00000036558e893d in clone () from /lib64/libc.so.6
      
      Attachments

      1. 2ae3c77c-bc9e-7d3a-568ee4b2-458404da.dmp
        9.31 MB
        Mostafa Mokhtar
      2. stacks.txt.gz
        329 kB
        Lars Volker

        Activity

        lv Lars Volker added a comment -

        Can you provide access to the core file? Do we still have the binaries to dump symbols from, so we can symbolize the minidump?

        lv Lars Volker added a comment -

        Attaching output of "thread apply all bt"

        lv Lars Volker added a comment - edited

        Here is what I think happened:

        The clustered insert with the random partition expression led to the sort node returning unsorted results. This led to unclustered data being written in the HdfsTableSink. Since the sink assumed clustered data, it closed the previous partition every time it saw a new one, leading to a very large number of files being written (thousands). During these writes some errors occurred:

        I0323 19:05:07.066632 88300 status.cc:114] Temporary HDFS file already exists: hdfs://vd1320.halxg.cloudera.com:8020/user/hive/warehouse/table_level_tests/store_sales/_impala_insert_staging/e34b788bf7ef8a0a_5736bbc400000000/.e34b788bf7ef8a0a-5736bbc400000001_128221507_dir/ss_sold_date_sk=1/e34b788bf7ef8a0a-5736bbc400000001_463566371_data.0.parq
        Error(2): No such file or directory
            @           0x8353d9  impala::Status::Status()
            @           0xcfeed4  impala::HdfsTableSink::CreateNewTmpFile()
            @           0xcff87f  impala::HdfsTableSink::InitOutputPartition()
            @           0xd03942  impala::HdfsTableSink::GetOutputPartition()
            @           0xd00743  impala::HdfsTableSink::WriteClusteredRowBatch()
            @           0xd00e95  impala::HdfsTableSink::Send()
            @           0xde2dbb  impala::PlanFragmentExecutor::ExecInternal()
            @           0xde4fcd  impala::PlanFragmentExecutor::Exec()
            @           0xdde2c8  impala::FragmentInstanceState::Exec()
            @           0xde75ff  impala::QueryExecMgr::ExecFInstance()
            @           0xbd9e29  impala::Thread::SuperviseThread()
            @           0xbda874  boost::detail::thread_data<>::run()
            @           0xe4631a  thread_proxy
            @       0x3655c07aa1  (unknown)
            @       0x36558e893d  (unknown)
        

        Another one from a different occurrence of the same problem:

        I0323 13:58:28.547646 48938 status.cc:114] Failed to get info on temporary HDFS file: hdfs://vd1320.halxg.cloudera.com:8020/user/hive/warehouse/table_level_tests/store_sales/_impala_insert_staging/634bb59e34eaa4ed_57dade8a00000000/.634bb59e34eaa4ed-57dade8a00000008_1870038242_dir/ss_sold_date_sk=82/634bb59e34eaa4ed-57dade8a00000008_872401052_data.0.parq
        Error(2): No such file or directory
            @           0x8353d9  impala::Status::Status()
            @           0xcff0e1  impala::HdfsTableSink::CreateNewTmpFile()
            @           0xcff87f  impala::HdfsTableSink::InitOutputPartition()
            @           0xd03942  impala::HdfsTableSink::GetOutputPartition()
            @           0xd00743  impala::HdfsTableSink::WriteClusteredRowBatch()
            @           0xd00e95  impala::HdfsTableSink::Send()
            @           0xde2dbb  impala::PlanFragmentExecutor::ExecInternal()
            @           0xde4fcd  impala::PlanFragmentExecutor::Exec()
            @           0xdde2c8  impala::FragmentInstanceState::Exec()
            @           0xde75ff  impala::QueryExecMgr::ExecFInstance()
            @           0xbd9e29  impala::Thread::SuperviseThread()
            @           0xbda874  boost::detail::thread_data<>::run()
            @           0xe4631a  thread_proxy
            @       0x3220e07aa1  (unknown)
            @       0x3220ae893d  (unknown)
        

        The error was not propagated upwards by the call to HdfsTableSink::WriteClusteredRowBatch(), leaving the HdfsTableSink in an inconsistent state. Subsequent calls to HdfsTableSink::Send() then crashed.

        To fix this we need to handle the status returned by HdfsTableSink::WriteClusteredRowBatch() correctly. I pushed a fix here: https://gerrit.cloudera.org/6486
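
        For illustration only, here is a minimal, self-contained sketch of that pattern; it is not the actual patch, and the Status/SketchSink/partition_ready names below are made up for the example:

        // Minimal, self-contained sketch (not Impala's actual classes): why a dropped
        // Status from the clustered write path leaves the sink broken, and how
        // propagating it avoids the later crash. All names here are illustrative.
        #include <iostream>
        #include <string>
        #include <utility>

        struct Status {
          bool ok_ = true;
          std::string msg_;
          static Status OK() { return {}; }
          static Status Error(std::string m) { return {false, std::move(m)}; }
          bool ok() const { return ok_; }
        };

        // The usual RETURN_IF_ERROR pattern: stop as soon as a callee fails.
        #define RETURN_IF_ERROR(expr)        \
          do {                               \
            Status _s = (expr);              \
            if (!_s.ok()) return _s;         \
          } while (false)

        struct SketchSink {
          bool partition_ready = false;  // stands in for the sink's per-partition state

          Status WriteClusteredRowBatch() {
            // Pretend the temporary HDFS file could not be created: the partition
            // never becomes usable, and we report that as a Status.
            partition_ready = false;
            return Status::Error("Failed to create temporary HDFS file");
          }

          Status Send() {
            // Buggy variant (what the crash corresponds to): call
            // WriteClusteredRowBatch(), ignore its Status, then keep touching
            // partition state that was never initialized.
            // Fixed variant: propagate the error to the caller instead.
            RETURN_IF_ERROR(WriteClusteredRowBatch());
            return Status::OK();
          }
        };

        int main() {
          SketchSink sink;
          Status s = sink.Send();
          if (!s.ok()) std::cout << "query fails cleanly with: " << s.msg_ << "\n";
          return 0;
        }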

        Even with this fix the query would fail in Debug mode, since we don't support calling HdfsTableSink::GetOutputPartition() again after closing a partition:

        588     // Save the partition name so that the coordinator can create the partition directory
        589     // structure if needed
        590     DCHECK(state->per_partition_status()->find(partition->partition_name) ==
        591         state->per_partition_status()->end());
        

        In release builds we don't hit this DCHECK and the queries succeed.
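
        To illustrate that behavior in isolation (a sketch with a plain std::map, not Impala's runtime-state structures): the per-partition status map is expected to see each partition name exactly once, and assert(), like DCHECK, is compiled out of release builds, so only debug builds abort.

        // Illustrative only: a map that records each partition name once, with an
        // assert standing in for the DCHECK in HdfsTableSink::GetOutputPartition().
        #include <cassert>
        #include <map>
        #include <string>

        std::map<std::string, int> per_partition_status;

        void RegisterPartition(const std::string& partition_name) {
          // Debug builds: this is the check that fires if the same partition is
          // opened a second time after having been closed.
          assert(per_partition_status.find(partition_name) == per_partition_status.end());
          per_partition_status[partition_name] = 0;  // e.g. rows appended so far
        }

        int main() {
          RegisterPartition("ss_sold_date_sk=1");
          // With unclustered input the sink closes ss_sold_date_sk=1, later sees it
          // again, and re-registers it -- aborting a debug build on the assert.
          // With NDEBUG (release), the assert is a no-op and this runs through.
          RegisterPartition("ss_sold_date_sk=1");
          return 0;
        }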

        Sending unclustered data to the sink will eventually be prevented by materializing non-deterministic sort exprs. Thomas Tauber-Marshall has a change out for review to address this: https://gerrit.cloudera.org/#/c/6322/
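
        The underlying effect is easy to reproduce in isolation: if a rand()-based partition expression is evaluated once for the sort and again for the sink, the two values disagree, so rows that looked clustered to the sort look unclustered to the sink. A rough stand-alone sketch (plain C++ rand(), not Impala's expression evaluation):

        // Sketch of re-evaluating a non-deterministic expression vs. materializing it.
        // Not Impala code; it just shows that two evaluations of a rand()-like
        // expression disagree, so rows sorted on one evaluation look unclustered
        // to the sink, which re-computes the value.
        #include <cstdlib>
        #include <iostream>

        int PartitionExpr() { return std::rand() % 2; }  // like mod(cast(rand(7)*1e9 as int), 2)

        int main() {
          std::srand(7);
          for (int row = 0; row < 5; ++row) {
            int sort_key = PartitionExpr();  // value the sort node ordered by
            int sink_key = PartitionExpr();  // value the sink re-computes
            std::cout << "row " << row << ": sort=" << sort_key << " sink=" << sink_key
                      << (sort_key == sink_key ? "" : "  <-- mismatch") << "\n";
          }
          // Materializing the expression once before the sort and reusing the stored
          // value in the sink removes the mismatch, which is what the referenced
          // review does.
          return 0;
        }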

        Mostafa Mokhtar, do you know why these HDFS errors occurred? Are they expected as part of what you were doing?

        lv Lars Volker added a comment -

        IMPALA-5115: Handle status from HdfsTableSink::WriteClusteredRowBatch

        Change-Id: Id6f5d87ae5e46d6019d9988011d3f2657e81ae15
        Reviewed-on: http://gerrit.cloudera.org:8080/6486
        Reviewed-by: Marcel Kornacker <marcel@cloudera.com>
        Tested-by: Impala Public Jenkins

        mmokhtar Mostafa Mokhtar added a comment -

        Lars Volker, Sailesh Mukil, Dan Hecht:
        Do we know what error HDFS returned when trying to create the file? Would Impala return a 255?

        lv Lars Volker added a comment -

        Mostafa Mokhtar, I only have what's in the logs (see my previous comment). In particular, the first error reads as if HDFS always returns the same error code (2), which would explain the discrepancy between "file already exists" and "No such file or directory".

        I0323 19:05:07.066632 88300 status.cc:114] Temporary HDFS file already exists: hdfs://vd1320.halxg.cloudera.com:8020/user/hive/warehouse/table_level_tests/store_sales/_impala_insert_staging/e34b788bf7ef8a0a_5736bbc400000000/.e34b788bf7ef8a0a-5736bbc400000001_128221507_dir/ss_sold_date_sk=1/e34b788bf7ef8a0a-5736bbc400000001_463566371_data.0.parq
        Error(2): No such file or directory
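
        For what it's worth, errno 2 is ENOENT ("No such file or directory"). A purely hypothetical sketch of how a context string and strerror(errno) can end up contradicting each other if errno is stale by the time the message is built; this is not the actual libhdfs or Impala error-reporting code, and FormatHdfsError is an invented name:

        // Hypothetical sketch: combine a caller-supplied context string with
        // strerror(errno). If errno was set by an earlier, unrelated call, the two
        // halves of the message can contradict each other, as in the log above.
        #include <cerrno>
        #include <cstring>
        #include <iostream>
        #include <string>

        std::string FormatHdfsError(const std::string& context) {
          return context + "\nError(" + std::to_string(errno) + "): " + std::strerror(errno);
        }

        int main() {
          errno = ENOENT;  // 2, possibly left over from a previous operation
          std::cout << FormatHdfsError("Temporary HDFS file already exists: hdfs://...") << "\n";
          return 0;
        }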
        

          People

          • Assignee:
            lv Lars Volker
            Reporter:
            mmokhtar Mostafa Mokhtar
          • Votes:
            0
          • Watchers:
            3
