Spark / SPARK-23040

BlockStoreShuffleReader's returned Iterator isn't interruptible if aggregator or ordering is specified


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      For example, if an ordering is specified, the returned iterator is a CompletionIterator:

          dep.keyOrdering match {
            case Some(keyOrd: Ordering[K]) =>
              // Create an ExternalSorter to sort the data.
              val sorter =
                new ExternalSorter[K, C, C](context, ordering = Some(keyOrd), serializer = dep.serializer)
              sorter.insertAll(aggregatedIter)
              context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled)
              context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled)
              context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes)
              CompletionIterator[Product2[K, C], Iterator[Product2[K, C]]](sorter.iterator, sorter.stop())
            case None =>
              aggregatedIter
          }
      

      However, the sorter consumes the aggregatedIter (which may be interruptible) in sorter.insertAll, and then produces a new iterator that isn't interruptible.
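
      Here, "interruptible" means wrapped in Spark's InterruptibleIterator, which re-checks the task's kill flag on every hasNext call. A simplified sketch of org.apache.spark.InterruptibleIterator:

          import org.apache.spark.TaskContext

          class InterruptibleIterator[+T](val context: TaskContext, val delegate: Iterator[T])
            extends Iterator[T] {
            def hasNext: Boolean = {
              // Throws TaskKilledException if the task has been marked as killed,
              // so a cancelled task stops at the next element instead of running on.
              context.killTaskIfInterrupted()
              delegate.hasNext
            }
            def next(): T = delegate.next()
          }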

      The problem is that the Spark task then cannot be cancelled when its stage fails (without interruptThread enabled, which is disabled by default), so the task keeps running and wastes executor resources.
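
      One way to make the result interruptible again (a sketch only; resultIter here stands for whatever iterator the keyOrdering match above returns) is to re-wrap it in an InterruptibleIterator before handing it back:

          resultIter match {
            case _: InterruptibleIterator[Product2[K, C]] => resultIter
            case _ =>
              // The aggregator and/or sorter may have fully consumed the previous
              // interruptible iterator, so wrap once more to keep the returned
              // iterator responsive to task cancellation.
              new InterruptibleIterator[Product2[K, C]](context, resultIter)
          }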


          People

            Assignee: advancedxy YE
            Reporter: advancedxy YE
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: