IMPALA-1730

Reduce the window of spinning for Parquet and base-sequence scanners


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Product Backlog
    • Fix Version/s: Impala 2.3.0
    • Component/s: None
    • Labels: None

    Description

      Parquet and base-sequence scanner threads can end up in a busy spinning loop during these windows:

      1) Parquet - after the footer range is taken by a scanner thread, but before the call to scan_node_->MarkFileDescIssued(desc) in HdfsParquetScanner::ProcessFooter(). Note that there is potentially blocking disk IO in this window. I see no reason this window can't be shrunk by moving the MarkFileDescIssued() call to right after the footer ranges are issued in IssueInitialRanges() [the thread that takes the footer range will process the rest of the file anyway, so we can consider the file issued at this point].

      2) base-sequence - after the header range is taken by a scanner thread, but before the file range has been issued in BaseSequenceScanner::ProcessSplit() by the call to scan_node_->AddDiskIoRanges(desc). There are potentially blocking (remote) disk IOs in this window. This one doesn't look as straightforward to deal with.

      These windows are somewhat mitigated by the fact that multiple files are usually being scanned and their windows won't overlap perfectly, so there's a good chance some ranges are still unstarted, in which case scanner threads block rather than busy loop. It still seems worth shrinking or eliminating these windows.
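
To make the windows concrete, here is a toy model of the decision an idle scanner thread makes. It is only a sketch: ScanState, Action, NextAction() and AfterFooterTaken() are hypothetical names invented for illustration, not Impala's actual classes or control flow. It assumes a thread blocks while unstarted ranges exist, exits once every file is marked issued, and otherwise has nothing to wait on and loops.

```cpp
#include <cassert>

// Toy model only. All names below are hypothetical, not Impala's real API.
enum class Action { kProcessRange, kBlock, kSpin, kExit };

struct ScanState {
  int ready_ranges = 0;         // ranges a thread can pick up immediately
  int unstarted_ranges = 0;     // queued ranges a thread can block on
  int files_total = 0;          // files this scan node has to read
  int files_marked_issued = 0;  // files whose ranges count as fully issued
};

// What an idle scanner thread does next under the assumptions above.
Action NextAction(const ScanState& s) {
  if (s.ready_ranges > 0) return Action::kProcessRange;
  if (s.unstarted_ranges > 0) return Action::kBlock;
  // Nothing to take and nothing to wait on: if some file is not yet marked
  // issued, more ranges may still show up, so the thread can only retry.
  if (s.files_marked_issued < s.files_total) return Action::kSpin;
  return Action::kExit;
}

// State seen by an idle thread right after another thread has taken the
// footer range of the only file in the scan. The flag models marking the
// file as issued when the footer range is issued, instead of later.
ScanState AfterFooterTaken(bool mark_issued_with_footer) {
  ScanState s;
  s.files_total = 1;
  s.files_marked_issued = mark_issued_with_footer ? 1 : 0;
  return s;
}
```

In this model, NextAction(AfterFooterTaken(false)) is Action::kSpin, which corresponds to window 1, while NextAction(AfterFooterTaken(true)) is Action::kExit, which corresponds to the proposed earlier marking. Adding a second file with an unstarted range turns the spin into Action::kBlock, which is the mitigation described above.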

      For additional context see IMPALA-1722.


            People

              dhecht Daniel Hecht
              ippokratis Ippokratis Pandis