Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 2.1, Impala 2.2, Impala 2.3.0
None
Description
If the Parquet scanner encounters an error while materializing rows, it hits a DCHECK, which produces a FATAL log message like this:
F1015 14:48:19.447789 31131 hdfs-parquet-scanner.cc:1552] Check failed: continue_execution
This affects only debug builds: release builds do not include DCHECKs, and the rest of the logic is correct. The DCHECK is also triggered only for certain error cases that occur after most of the error checking has completed. For example, bad file metadata will not trigger the DCHECK, but a failure to read the middle of a Parquet file could. The only workaround is to run a release build.
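To illustrate the general failure mode (this is not the code from hdfs-parquet-scanner.cc), here is a minimal C++ sketch of an assertion that assumes a loop can only finish successfully. All names here (ScanResult, DebugCheck, MaterializeRows, the kDebugBuild template flag) are hypothetical. A real DCHECK is enabled or compiled out at build time and aborts the process on failure; in this sketch a template flag stands in for the build type and a failed check throws, so both behaviors can be observed in one program.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type; Impala's real Status class is richer.
struct ScanResult {
  bool ok;
  int rows_materialized;
  std::string error;
};

// Stand-in for a debug-only DCHECK. kDebugBuild models whether debug checks
// are compiled in. A failed check throws here so the effect is observable;
// a real DCHECK logs a FATAL message and aborts the process instead.
template <bool kDebugBuild>
void DebugCheck(bool condition, const char* expr) {
  if (kDebugBuild && !condition) {
    throw std::logic_error(std::string("Check failed: ") + expr);
  }
}

// Each input value is one row. A negative value simulates a read error in
// the middle of the file (an assumption made purely for this sketch).
template <bool kDebugBuild>
ScanResult MaterializeRows(const std::vector<int>& rows) {
  ScanResult result{true, 0, ""};
  bool continue_execution = true;
  for (int row : rows) {
    if (row < 0) {
      // Error path: stop materializing and record the error.
      continue_execution = false;
      result.ok = false;
      result.error = "Error reading from file";
      break;
    }
    ++result.rows_materialized;
  }
  // Over-strict assertion: it assumes the loop never ends on an error.
  // The surrounding logic already handles the error correctly, so only a
  // build with debug checks enabled notices the problem.
  DebugCheck<kDebugBuild>(continue_execution, "continue_execution");
  return result;
}
```

With the flag set to false (modeling a release build), the mid-file error is returned to the caller as an ordinary failed result. With the flag set to true (modeling a debug build), the same input trips the check instead.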
Example of the stack included in the INFO log:
W1015 13:55:21.861934 12024 DFSInputStream.java:976] DFS chooseDataNode: got # 2 IOException, will wait for 8386.246989009956 msec.
W1015 13:55:23.448084 12027 DFSInputStream.java:802] Found Checksum error for BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK] at 5758976
W1015 13:55:23.450142 12027 DFSInputStream.java:802] Found Checksum error for BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK] at 5758976
W1015 13:55:23.452246 12027 DFSInputStream.java:802] Found Checksum error for BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK] at 5758976
I1015 13:55:23.452486 12027 DFSInputStream.java:960] Could not obtain BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 from any node: java.io.IOException: No live nodes contain block BP-943058948-127.0.1.1-1442461085565:blk_1073747601_6777 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK], DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK], DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK] DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK] DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK] Dead nodes: DatanodeInfoWithStorage[127.0.0.1:31002,DS-08ceffe7-2425-46e4-9a68-8eb63fdf23aa,DISK] DatanodeInfoWithStorage[127.0.0.1:31000,DS-5e1902ad-76d9-4338-bc84-2134bed2bdc2,DISK] DatanodeInfoWithStorage[127.0.0.1:31001,DS-25bd285c-c9b3-4e03-b041-5ad2fb5a3f53,DISK]. Will get new block locations from namenode and retry...
W1015 13:55:23.452569 12027 DFSInputStream.java:976] DFS chooseDataNode: got # 2 IOException, will wait for 5645.947022951619 msec.
I1015 13:55:29.138447 12027 status.cc:112] Error reading from HDFS file: hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/customer/000002_0
Error(255): Unknown error 255
    @ 0x7f26e5e07e2f impala::Status::Status()
    @ 0x7f26e3eeeba5 impala::DiskIoMgr::ScanRange::Read()
    @ 0x7f26e3ed594c impala::DiskIoMgr::ReadRange()
    @ 0x7f26e3ed4dda impala::DiskIoMgr::WorkLoop()
    @ 0x7f26e3ee5226 boost::_mfi::mf1<>::operator()()
    @ 0x7f26e3ee4ce7 boost::_bi::list2<>::operator()<>()
    @ 0x7f26e3ee421e boost::_bi::bind_t<>::operator()()
    @ 0x7f26e3ee3408 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @ 0x7f26e3f1d515 boost::function0<>::operator()()
    @ 0x7f26e2669209 impala::Thread::SuperviseThread()
    @ 0x7f26e2672ad7 boost::_bi::list4<>::operator()<>()
    @ 0x7f26e26729f8 boost::_bi::bind_t<>::operator()()
    @ 0x7f26e26729ac boost::detail::thread_data<>::run()
    @ 0x7f26e1a7b09a (unknown)
    @ 0x7f26e0f1d6aa start_thread
    @ 0x7f26df0e3eed (unknown)
I1015 13:55:29.138649 12443 runtime-state.cc:229] Error from query 234d2c541fd7e4df:5db2930c66c57e9b: Error reading from HDFS file: hdfs://localhost:20500/test-warehouse/tpch_nested_parquet.db/customer/000002_0
Error(255): Unknown error 255
F1015 13:55:29.138742
Workaround
Use the RELEASE build.