IMPALA-9955: Internal error for a query with large rows and spilling

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend
    • Labels: ghx-label-11

    Description

      Encountered a query failure due to an internal error. Steps to reproduce:

      create table bigstrs stored as parquet as select *, repeat(uuid(), cast(random() * 100000 as int)) as bigstr from functional.alltypes;
      set MAX_ROW_SIZE=3.5MB;
      set MEM_LIMIT=4GB;
      set DISABLE_CODEGEN=true;
      create table my_cnt stored as parquet as select count(*) as cnt, bigstr from bigstrs group by bigstr;
      

      Since repeat(uuid(), cast(random() * 100000 as int)) produces strings of up to a few megabytes, rows in bigstrs exceed the default buffer page length (2097152 bytes, per the dump below) and must be stored on large pages, and the 4GB memory limit forces the grouping aggregation to spill. The query fails with:

      ERROR: Internal error: couldn't pin large page of 4194304 bytes, client only had 2097152 bytes of unused reservation:
      <BufferPool::Client> 0xcf9dae0 internal state: {<BufferPool::Client> 0xbdf6ac0 name: GroupingAggregator id=3 ptr=0xcf9d900 write_status:  buffers allocated 2097152 num_pages: 2094 pinned_bytes: 41943040 dirty_unpinned_bytes: 0 in_flight_write_bytes: 0 reservation: {<ReservationTracker>: reservation_limit 9223372036854775807 reservation 46137344 used_reservation 44040192 child_reservations 0 parent:
      <ReservationTracker>: reservation_limit 9223372036854775807 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
      <ReservationTracker>: reservation_limit 9223372036854775807 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
      <ReservationTracker>: reservation_limit 3435973836 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
      <ReservationTracker>: reservation_limit 6647046144 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
      NULL}
        12 pinned pages: <BufferPool::Page> 0xc9160a0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0xc916118 client: 0xcf9dae0/0xbdf6ac0 data: 0x13200000 len: 2097152
      <BufferPool::Page> 0xc919d40 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xc919db8 client: 0xcf9dae0/0xbdf6ac0 data: 0x124600000 len: 4194304
      <BufferPool::Page> 0xd42aaa0 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42ab18 client: 0xcf9dae0/0xbdf6ac0 data: 0x12b200000 len: 4194304
      <BufferPool::Page> 0xd42b900 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42b978 client: 0xcf9dae0/0xbdf6ac0 data: 0x132a00000 len: 4194304
      <BufferPool::Page> 0xd42d3e0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42d458 client: 0xcf9dae0/0xbdf6ac0 data: 0xc6a00000 len: 2097152
      <BufferPool::Page> 0xd42dd40 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42ddb8 client: 0xcf9dae0/0xbdf6ac0 data: 0x132e00000 len: 4194304
      <BufferPool::Page> 0xd42de80 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42def8 client: 0xcf9dae0/0xbdf6ac0 data: 0x137c00000 len: 4194304
      <BufferPool::Page> 0x12d48320 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d48398 client: 0xcf9dae0/0xbdf6ac0 data: 0x102c00000 len: 4194304
      <BufferPool::Page> 0x12d483c0 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d48438 client: 0xcf9dae0/0xbdf6ac0 data: 0x108a00000 len: 4194304
      <BufferPool::Page> 0x12d48780 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d487f8 client: 0xcf9dae0/0xbdf6ac0 data: 0x108e00000 len: 4194304
      <BufferPool::Page> 0x12d492c0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d49338 client: 0xcf9dae0/0xbdf6ac0 data: 0x127600000 len: 2097152
      <BufferPool::Page> 0x12d4a9e0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d4aa58 client: 0xcf9dae0/0xbdf6ac0 data: 0x12d200000 len: 2097152
      
        0 dirty unpinned pages: 
        0 in flight write pages: }
      
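      The numbers in the dump are consistent with the guard quoted further below: the client's unused reservation is reservation - used_reservation = 46137344 - 44040192 = 2097152 bytes, exactly one default-size page (presumably what unpinning the previous read page freed), while the next read page is a 4194304-byte large page. A minimal sketch of the failing arithmetic, using the values from the dump (this is not Impala code; the real accounting lives in BufferPool and ReservationTracker):

      #include <cstdint>
      #include <cstdio>

      int main() {
        // Values copied from the error message above.
        const int64_t default_page_len = 2097152;   // 2 MB default buffer page
        const int64_t read_page_len = 4194304;      // 4 MB large page to pin next
        const int64_t reservation = 46137344;       // client reservation
        const int64_t used_reservation = 44040192;  // bytes backing pinned pages

        const int64_t unused = reservation - used_reservation;  // 2097152 bytes
        const bool pinned = false;  // the stream spilled, so it is unpinned

        // Mirrors the condition in BufferedTupleStream::NextReadPage().
        if (!pinned && read_page_len > default_page_len && unused < read_page_len) {
          std::printf("couldn't pin large page of %lld bytes, client only had %lld bytes\n",
                      static_cast<long long>(read_page_len), static_cast<long long>(unused));
        }
        return 0;
      }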

      Stack trace found in the log:

          @          0x1c9dfbe  impala::Status::Status()
          @          0x1ca5a78  impala::Status::Status()
          @          0x2bfe4ec  impala::BufferedTupleStream::NextReadPage()
          @          0x2c04b72  impala::BufferedTupleStream::GetNextInternal<>()
          @          0x2c029e6  impala::BufferedTupleStream::GetNextInternal<>()
          @          0x2bffd19  impala::BufferedTupleStream::GetNext()
          @          0x28aa43f  impala::GroupingAggregator::ProcessStream<>()
          @          0x28a2e17  impala::GroupingAggregator::BuildSpilledPartition()
          @          0x28a2401  impala::GroupingAggregator::NextPartition()
          @          0x289df5a  impala::GroupingAggregator::GetRowsFromPartition()
          @          0x289db20  impala::GroupingAggregator::GetNext()
          @          0x28dbfc7  impala::AggregationNode::GetNext()
          @          0x2259dfc  impala::FragmentInstanceState::ExecInternal()
          @          0x22564a0  impala::FragmentInstanceState::Exec()
          @          0x22801ed  impala::QueryState::ExecFInstance()
          @          0x227e5ef  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
          @          0x2281d8e  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
          @          0x204d7d1  boost::function0<>::operator()()
          @          0x26702d5  impala::Thread::SuperviseThread()
          @          0x2678272  boost::_bi::list5<>::operator()<>()
          @          0x2678196  boost::_bi::bind_t<>::operator()()
          @          0x2678157  boost::detail::thread_data<>::run()
          @          0x3e45d71  thread_proxy
          @     0x7fc8339656b9  start_thread
          @     0x7fc8305314dc  clone
      

      From the code, the comment on this error path indicates that it is a bug in the caller, which here (per the stack trace) is the GroupingAggregator re-reading a spilled, and therefore unpinned, stream:

      Status BufferedTupleStream::NextReadPage(ReadIterator* read_iter) {
        ...
        int64_t read_page_len = read_iter->read_page_->len();
        if (!pinned_ && read_page_len > default_page_len_
            && buffer_pool_client_->GetUnusedReservation() < read_page_len) {
          // If we are iterating over an unpinned stream and encounter a page that is larger
          // than the default page length, then unpinning the previous page may not have
          // freed up enough reservation to pin the next one. The client is responsible for
          // ensuring the reservation is available, so this indicates a bug.
          return Status(TErrorCode::INTERNAL_ERROR, Substitute("Internal error: couldn't pin "
              "large page of $0 bytes, client only had $1 bytes of unused reservation:\n$2",
              read_page_len, buffer_pool_client_->GetUnusedReservation(),
              buffer_pool_client_->DebugString()));
        }
        ...
      }
      
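      The contract that the comment describes can be modeled as follows. This is an illustrative sketch only, assuming a simplified reservation model; it is not Impala's BufferPool API, and CallerHasReservationFor and its parameters are hypothetical:

      #include <cassert>
      #include <cstdint>

      // Hypothetical model of the caller-side contract: before advancing an
      // unpinned read iterator onto a page larger than the default page length,
      // the caller must already hold that many bytes of unused reservation,
      // because unpinning the previous page frees at most that page's own size.
      bool CallerHasReservationFor(int64_t next_page_len, int64_t default_page_len,
                                   int64_t unused_reservation, bool stream_pinned) {
        if (stream_pinned || next_page_len <= default_page_len) return true;
        return unused_reservation >= next_page_len;
      }

      int main() {
        const int64_t kDefaultPage = 2097152;
        const int64_t kLargePage = 4194304;
        // The failing state from the log: unpinning a 2 MB page freed only 2 MB,
        // which cannot back the 4 MB large page that comes next.
        assert(!CallerHasReservationFor(kLargePage, kDefaultPage, kDefaultPage, false));
        // Holding reservation for the largest possible page (e.g. derived from
        // MAX_ROW_SIZE) ahead of time would satisfy the contract.
        assert(CallerHasReservationFor(kLargePage, kDefaultPage, kLargePage, false));
        return 0;
      }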

      Attachments

        1. impalad_node1.INFO (258 kB, Quanlong Huang)
        2. impalad_node2.INFO (252 kB, Quanlong Huang)
        3. impalad.INFO (362 kB, Quanlong Huang)

          People

            Assignee: stigahuang Quanlong Huang
            Reporter: stigahuang Quanlong Huang
            Votes: 0
            Watchers: 2
