Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4828

Altering Kudu table schema outside of Impala may result in crash on read

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • Impala 2.8.0
    • Impala 2.9.0
    • Backend

    Description

      Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Neither Impala nor the Kudu client validates the schema immediately before reading, so Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name.

      In Impala:

      [localhost:21000] > create table t4 (i int primary key, s string) partition by hash(i) partitions 2 stored as kudu;
      Query: create table t4 (i int primary key, s string) partition by hash(i) partitions 2 stored as kudu
      
      Fetched 0 row(s) in 0.49s
      [localhost:21000] > insert into t4 values (-1, "foo");
      Query: insert into t4 values (-1, "foo")
      Query submitted at: 2017-01-26 09:24:49 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
      Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=1e4e1c4693ecfaee:966d7d8d00000000
      Modified 1 row(s), 0 row error(s) in 4.56s
      [localhost:21000] > select * from t4;
      Query: select * from t4
      Query submitted at: 2017-01-26 09:24:57 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
      Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=ff4a805116b35e18:1911962100000000
      +----+-----+
      | i  | s   |
      +----+-----+
      | -1 | foo |
      +----+-----+
      Fetched 1 row(s) in 0.13s
      

      In Python:

      client = kudu.connect("localhost")
      table = client.table("impala::default.t4")
      
      # Drop 's' which was a string col
      ta = client.new_table_alterer(table)
      ta.drop_column("s")
      table = ta.alter()
      
      # Add 's' as an int
      ta = client.new_table_alterer(table)
      ta.add_column("s", "int32")
      table = ta.alter()
      
      # Add some rows
      session = client.new_session()
      for i in range(100):
        op = table.new_insert((i, i))
        session.apply(op)
      session.flush()
      

      And finally reading again in Impala:

      Query: select * from t4
      Query submitted at: 2017-01-26 09:27:49 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
      Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=874c96e67e4d5f19:f32fdd400000000
      Socket error 104: Connection reset by peer
      [Not connected] > Goodbye mj
      

      Crash stack:

      Stack: [0x00007f67acea7000,0x00007f67ad6a8000],  sp=0x00007f67ad6a61f8,  free space=8188k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      C  [libc.so.6+0x9807e]  envz_strip+0x7ae
      C  [libRuntime.so+0x5904b9]  impala::Tuple::DeepCopy(impala::Tuple*, impala::TupleDescriptor const&, impala::MemPool*)+0x61
      C  [libExec.so+0x574469]  impala::KuduScanner::DecodeRowsIntoRowBatch(impala::RowBatch*, impala::Tuple**)+0x1b7
      C  [libExec.so+0x57306b]  impala::KuduScanner::GetNext(impala::RowBatch*, bool*)+0x1a7
      C  [libExec.so+0x577a07]  impala::KuduScanNode::ProcessScanToken(impala::KuduScanner*, std::string const&)+0x141
      C  [libExec.so+0x577d21]  impala::KuduScanNode::RunScannerThread(std::string const&, std::string const*)+0x193
      C  [libExec.so+0x579cf6]  boost::_mfi::mf2<void, impala::KuduScanNode, std::string const&, std::string const*>::operator()(impala::KuduScanNode*, std::string const&, std::string const*) const+0x76
      C  [libExec.so+0x579b9e]  void boost::_bi::list3<boost::_bi::value<impala::KuduScanNode*>, boost::_bi::value<std::string>, boost::_bi::value<std::string const*> >::operator()<boost::_mfi::mf2<void, impala::KuduScanNode, std::string const&, std::string const*>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf2<void, impala::KuduScanNode, std::string const&, std::string const*>&, boost::_bi::list0&, int)+0x88
      C  [libExec.so+0x57971f]  boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::KuduScanNode, std::string const&, std::string const*>, boost::_bi::list3<boost::_bi::value<impala::KuduScanNode*>, boost::_bi::value<std::string>, boost::_bi::value<std::string const*> > >::operator()()+0x3b
      C  [libExec.so+0x579555]  boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::KuduScanNode, std::string const&, std::string const*>, boost::_bi::list3<boost::_bi::value<impala::KuduScanNode*>, boost::_bi::value<std::string>, boost::_bi::value<std::string const*> > >, void>::invoke(boost::detail::function::function_buffer&)+0x23
      C  [libUtil.so+0x2f3f20]  boost::function0<void>::operator()() const+0x52
      C  [libUtil.so+0x2f34c7]  impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()()>, impala::Promise<long>*)+0x2c5
      C  [libUtil.so+0x2fadbe]  void boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list0&, int)+0xb2
      C  [libUtil.so+0x2fad01]  boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()()+0x3b
      C  [libUtil.so+0x2fac5c]  boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()()> >, boost::_bi::value<impala::Promise<long>*> > > >::run()+0x1e
      

      We need to validate the schema when opening the kudu scan tokens (I believe it is guaranteed to work while we have a Kudu scan token open) and fail the query if the schema doesn't match anymore.

      Attachments

        Activity

          People

            mjacobs Matthew Jacobs
            mjacobs Matthew Jacobs
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: