Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- Product Backlog
- None
- None
Description
Parquet and base-sequence scanner threads can end up in a busy-spinning loop during these windows:
1) Parquet - after the footer range is taken by a scanner thread, but before the call to scan_node_->MarkFileDescIssued(desc) in HdfsParquetScanner::ProcessFooter(). Note that there is potentially blocking disk I/O in this window. I see no reason this window can't be shrunk by moving the MarkFileDescIssued() call to right after the footer ranges are issued in IssueInitialRanges() [the thread for the footer range will process the rest of the file anyway, so we can consider the file issued at this point].
2) base-sequence - after the header range is taken by a scanner thread, but before the file range has been issued in BaseSequenceScanner::ProcessSplit() by the call to scan_node_->AddDiskIoRanges(desc). There is potentially blocking (remote) disk I/O in this window as well. This one doesn't look as straightforward to deal with.
These windows are somewhat mitigated by the fact that there are usually multiple files being scanned and their windows won't overlap perfectly, so there's a good chance that unstarted ranges exist, in which case scanner threads block rather than busy loop. It still seems worth shrinking or eliminating these windows.
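To make the windows above concrete, here is a minimal sketch of the condition as described in this ticket. It is a toy model, not Impala's code: ToyScanState, NextAction, and DuringFooterRead are hypothetical names, and the three-way decision is an assumption reconstructed from the description (block when unstarted ranges exist, exit when every file has been issued, otherwise retry immediately).

```cpp
// Hypothetical, simplified model of an idle scanner thread's decision.
// None of these names exist in Impala; they only illustrate the ticket's reasoning.
enum class Action { kWaitOnRange, kSpin, kExit };

struct ToyScanState {
  int unstarted_ranges = 0;  // ranges queued but not yet picked up by any thread
  int unissued_files = 0;    // files not yet marked as issued
};

// What an idle scanner thread does next in this model.
Action NextAction(const ToyScanState& s) {
  if (s.unstarted_ranges > 0) return Action::kWaitOnRange;  // blocks instead of spinning
  if (s.unissued_files == 0) return Action::kExit;          // nothing more will be issued
  return Action::kSpin;  // nothing to wait on, but cannot exit: retries immediately
}

// State an idle thread observes while another thread is still reading the
// footer of one file. other_unstarted_ranges models ranges from other files
// that are already queued (the mitigating case).
ToyScanState DuringFooterRead(bool mark_issued_early, int other_unstarted_ranges = 0) {
  ToyScanState s;
  s.unissued_files = 1;                           // the file whose footer is being read
  s.unstarted_ranges = 1 + other_unstarted_ranges;  // its footer range plus any others
  if (mark_issued_early) {
    s.unissued_files--;  // proposed change: mark the file issued right after issuing the footer
  }
  s.unstarted_ranges--;  // a scanner thread takes the footer range and starts its I/O
  return s;
}
```

In this model, NextAction(DuringFooterRead(false)) yields kSpin, which is the Parquet window in 1). Marking the file issued early, DuringFooterRead(true), yields kExit, and having other queued ranges, DuringFooterRead(false, 2), yields kWaitOnRange, which corresponds to the multi-file mitigation.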
For additional context see IMPALA-1722.
Issue Links
- is duplicated by
- IMPALA-2901 Impala query using 100% CPU system state
- Resolved