IMPALA-2213: Parquet read can fail if file metadata is stale

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2.4
    • Fix Version/s: Impala 2.3.0, Impala 2.2.8
    • Component/s: None
    • Labels: None

    Description

      This bug trips the same DCHECK as IMPALA-1291, but the cause is different.

      IMPALA-1291 fails when the --read_size impalad flag is set very low (below 100KB; the default is 8MB), and it fails consistently on any file larger than 100KB.

      This failure occurs when Impala thinks the file is longer than it actually is, e.g., the file was rewritten and "invalidate metadata" was not run afterwards. It can appear as sporadic crashes if the workload involves concurrent reads and writes to the same table.
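
      To illustrate the stale-length scenario described above, here is a minimal Python sketch. It is not Impala's code or its fix. The names read_footer_length, make_fake_parquet and StaleFileMetadataError are hypothetical, and the only format detail relied on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes "PAR1". The sketch seeks to the footer using a cached file length and reports a mismatch as an error instead of asserting.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"
FOOTER_TAIL_SIZE = 8  # 4-byte little-endian metadata length + 4-byte magic


class StaleFileMetadataError(Exception):
    """Raised when the cached file length disagrees with the bytes on disk."""


def read_footer_length(stream, cached_file_len):
    """Return the Parquet metadata length, locating the tail via cached_file_len.

    Hypothetical helper: it mimics a reader that trusts a file length taken
    from cached (possibly stale) metadata rather than from the file itself.
    """
    if cached_file_len < FOOTER_TAIL_SIZE:
        raise StaleFileMetadataError("cached length is too small to hold a footer")

    stream.seek(cached_file_len - FOOTER_TAIL_SIZE)
    tail = stream.read(FOOTER_TAIL_SIZE)

    # A short read means the file ends before the cached length says it does.
    if len(tail) != FOOTER_TAIL_SIZE:
        raise StaleFileMetadataError(
            "file is shorter than cached length %d" % cached_file_len)

    # A wrong magic means the cached length does not point at the real tail.
    if tail[4:] != PARQUET_MAGIC:
        raise StaleFileMetadataError("no Parquet magic at the cached end of file")

    (metadata_len,) = struct.unpack("<I", tail[:4])
    return metadata_len


def make_fake_parquet(body_len, metadata_len):
    """Build bytes laid out like a Parquet file: magic, body, metadata, tail."""
    return (PARQUET_MAGIC
            + b"\x00" * body_len
            + b"\x00" * metadata_len
            + struct.pack("<I", metadata_len)
            + PARQUET_MAGIC)


if __name__ == "__main__":
    old_len = len(make_fake_parquet(1000, 50))   # length cached before the rewrite
    new_file = make_fake_parquet(100, 20)        # file after being rewritten

    # Fresh length: the footer is found.
    print(read_footer_length(io.BytesIO(new_file), len(new_file)))

    # Stale length: the read runs past the end of the file and is reported.
    try:
        read_footer_length(io.BytesIO(new_file), old_len)
    except StaleFileMetadataError as err:
        print("stale metadata:", err)
```

      In this sketch the stale length produces a recoverable error rather than a fatal check, which is the general behaviour one would want from a scanner when the file and its cached metadata disagree.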

      Here's the stack trace from the DCHECK:

      F0922 12:20:03.093201 24325 hdfs-parquet-scanner.cc:897] Check failed: stream_->eosr() 
      *** Check failure stack trace: ***
          @           0xf251dd  google::LogMessage::Fail()
          @           0xf2797f  google::LogMessage::SendToLog()
          @           0xf24dcb  google::LogMessage::Flush()
          @           0xf2820d  google::LogMessageFatal::~LogMessageFatal()
          @     0x7f68a73e9f25  impala::HdfsParquetScanner::ProcessFooter()
          @     0x7f68a73e8af6  impala::HdfsParquetScanner::ProcessSplit()
          @     0x7f68a7390e8d  impala::HdfsScanNode::ScannerThread()
          @     0x7f68a73a8dae  boost::_mfi::mf0<>::operator()()
          @     0x7f68a73a8436  boost::_bi::list1<>::operator()<>()
          @     0x7f68a73a7093  boost::_bi::bind_t<>::operator()()
          @     0x7f68a73a594b  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @     0x7f68a5be9d6c  boost::function0<>::operator()()
          @     0x7f68a47cd33c  impala::Thread::SuperviseThread()
          @     0x7f68a47d5d6a  boost::_bi::list4<>::operator()<>()
          @     0x7f68a47d5cb3  boost::_bi::bind_t<>::operator()()
          @     0x7f68a47d5c46  boost::detail::thread_data<>::run()
          @     0x7f68a3cf8ce9  (unknown)
          @     0x7f68a3ad6e9a  start_thread
          @     0x7f68a116accd  (unknown)
      

            People

              Assignee: Skye Wanderman-Milne
              Reporter: Skye Wanderman-Milne