IMPALA-2558: Hit DCHECK in parquet scanner after block read error

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.1, Impala 2.2, Impala 2.3.0
    • Fix Version/s: Impala 2.5.0, Impala 2.3.2
    • None

    Description

      If the Parquet scanner encounters an error while materializing rows, it will hit a DCHECK producing a FATAL log message like this:

      F1015 14:48:19.447789 31131 hdfs-parquet-scanner.cc:1552] Check failed: continue_execution
      

      This affects only debug builds: release builds do not include DCHECKs, and the rest of the logic is correct. The DCHECK is also triggered only for certain error cases that occur after most of the error checking has happened. For example, bad file metadata will not trigger the DCHECK, but a failure to read the middle of a Parquet file could. The only workaround is to run a release build.
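
To illustrate why a debug-only check can fire on a legitimate error path, here is a minimal, self-contained sketch. It is not the Impala scanner code and not the actual fix: `Status`, `RowSource`, and `MaterializeRows` are hypothetical names, and the standard `assert` (compiled out when `NDEBUG` is defined) stands in for a DCHECK. The flag name `continue_execution` is borrowed from the log message above only for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status type: ok == false means an error was reported.
struct Status {
  bool ok;
  std::string msg;
};

// Hypothetical row source; fail_at is the index at which a read error is
// simulated, or -1 for no error.
struct RowSource {
  std::vector<int> rows;
  int fail_at;
};

// Materializes rows until the input is exhausted or a simulated read error
// occurs. Returns the number of rows produced and reports errors via *status.
int MaterializeRows(const RowSource& src, Status* status) {
  int produced = 0;
  bool continue_execution = true;
  for (std::size_t i = 0; i < src.rows.size(); ++i) {
    if (static_cast<int>(i) == src.fail_at) {
      *status = {false, "simulated read error"};
      continue_execution = false;
      break;
    }
    ++produced;
  }
  // A bare assert(continue_execution) here would abort a debug build whenever
  // the error path above is taken. This check also accepts the case where
  // execution stopped because an error status was set.
  assert(continue_execution || !status->ok);
  return produced;
}
```

In a build with `NDEBUG` defined the `assert` disappears entirely, which mirrors the observation that release builds are unaffected: the error is still returned through `status` either way.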

      Example stack trace from the INFO log:

      W1015 13:55:21.861934 12024 DFSInputStream.java:976] DFS chooseDataNode: got # 2 IOException, will wait for 8386.246989009956 msec.
      W1015 13:55:23.448084 12027 DFSInputStream.java:802] Found Checksum error for BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK] at 5758976
      W1015 13:55:23.450142 12027 DFSInputStream.java:802] Found Checksum error for BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK] at 5758976
      W1015 13:55:23.452246 12027 DFSInputStream.java:802] Found Checksum error for BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK] at 5758976
      I1015 13:55:23.452486 12027 DFSInputStream.java:960] Could not obtain BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from any node: java.io.IOException: No live nodes contain block BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK], DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK], DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK] DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK] DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK] Dead nodes:  DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK] DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK] DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK]. Will get new block locations from namenode and retry...
      W1015 13:55:23.452569 12027 DFSInputStream.java:976] DFS chooseDataNode: got # 2 IOException, will wait for 5645.947022951619 msec.
      I1015 13:55:29.138447 12027 status.cc:112] Error reading from HDFS file: hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/customer/000002_0
      Error(255): Unknown error 255
          @     0x7f26e5e07e2f  impala::Status::Status()
          @     0x7f26e3eeeba5  impala::DiskIoMgr::ScanRange::Read()
          @     0x7f26e3ed594c  impala::DiskIoMgr::ReadRange()
          @     0x7f26e3ed4dda  impala::DiskIoMgr::WorkLoop()
          @     0x7f26e3ee5226  boost::_mfi::mf1<>::operator()()
          @     0x7f26e3ee4ce7  boost::_bi::list2<>::operator()<>()
          @     0x7f26e3ee421e  boost::_bi::bind_t<>::operator()()
          @     0x7f26e3ee3408  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @     0x7f26e3f1d515  boost::function0<>::operator()()
          @     0x7f26e2669209  impala::Thread::SuperviseThread()
          @     0x7f26e2672ad7  boost::_bi::list4<>::operator()<>()
          @     0x7f26e26729f8  boost::_bi::bind_t<>::operator()()
          @     0x7f26e26729ac  boost::detail::thread_data<>::run()
          @     0x7f26e1a7b09a  (unknown)
          @     0x7f26e0f1d6aa  start_thread
          @     0x7f26df0e3eed  (unknown)
      I1015 13:55:29.138649 12443 runtime-state.cc:229] Error from query 234d2c541fd7e4df:5db2930c66c57e9b: Error reading from HDFS file: hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/customer/000002_0
      Error(255): Unknown error 255
      F1015 13:55:29.138742
      

      Workaround
      Use the RELEASE build.

          People

            skye Skye Wanderman-Milne
            tarmstrong Tim Armstrong
