IMPALA-6932

Simple LIMIT 1 query can be really slow on many-file sequence datasets


    Details

    • Type: Task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.2.0
    • Component/s: Backend
    • Labels:
      None

      Description

      I recently ran across really slow behavior with the trivial SELECT * FROM table LIMIT 1 query. The table used Avro as a file format and had about 45,000 files across about 250 partitions. An optimization kicked in to set NUM_NODES to 1.

      The query ran for about an hour, and the profile indicated that it was opening files:

      • TotalRawHdfsOpenFileTime: 1.0h (3622833666032)

      I took a single minidump while this query was running, and I suspect the query was here:
        1 impalad!impala::ScannerContext::Stream::GetNextBuffer(long) [scanner-context.cc : 115 + 0x13]
        2 impalad!impala::ScannerContext::Stream::GetBytesInternal(long, unsigned char**, bool, long*) [scanner-context.cc : 241 + 0x5]
        3 impalad!impala::HdfsAvroScanner::ReadFileHeader() [scanner-context.inline.h : 54 + 0x1f]
        4 impalad!impala::BaseSequenceScanner::GetNextInternal(impala::RowBatch*) [base-sequence-scanner.cc : 157 + 0x13]
        5 impalad!impala::HdfsScanner::ProcessSplit() [hdfs-scanner.cc : 129 + 0xc]
        6 impalad!impala::HdfsScanNode::ProcessSplit(std::vector<impala::FilterContext, std::allocator<impala::FilterContext> > const&, impala::MemPool*, impala::io::ScanRange*) [hdfs-scan-node.cc : 527 + 0x17]
        7 impalad!impala::HdfsScanNode::ScannerThread() [hdfs-scan-node.cc : 437 + 0x1c]
        8 impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*) [function_template.hpp : 767 + 0x7]
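
As a rough sanity check on the counter quoted above, the following sketch converts the raw TotalRawHdfsOpenFileTime value to hours and spreads it over the approximate file count from the report. It assumes the raw value is in nanoseconds (which matches the displayed 1.0h) and that each of the roughly 45,000 files was opened once; neither assumption is confirmed by the profile itself, and the variable names are illustrative only.

```python
# Values quoted in the report above.
open_time_ns = 3_622_833_666_032  # raw TotalRawHdfsOpenFileTime; assumed to be nanoseconds
approx_files = 45_000             # "about 45,000 files" (approximate)

open_time_s = open_time_ns / 1e9
open_time_h = open_time_s / 3600

# Assumption: every file was opened exactly once, so this is only a
# back-of-the-envelope average, not a measured per-file latency.
avg_open_ms = open_time_s / approx_files * 1000

print(f"total open time: {open_time_h:.2f} h")
print(f"average open time per file: {avg_open_ms:.1f} ms")
```

Under those assumptions the average comes out to roughly 80 ms per file open, which suggests the hour is the sum of many individually modest opens rather than a single stuck call.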

         

            People

            • Assignee: Pooja Nilangekar (poojanilangekar)
            • Reporter: Philip Martin (philip)
