Spark / SPARK-40912

Overhead of Exceptions in DeserializationStream

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.5.0
    • Component/s: Spark Core
    • Labels: None

    Description

      The interface of DeserializationStream forces implementations to raise EOFException to indicate that there is no more data. For KryoDeserializationStream it is even worse: since the Kryo library does not raise EOFException itself, we pay the price of two exceptions for each stream. For large shuffles with lots of small streams this is a fairly large overhead (a couple of percent of CPU time has been observed). It is also less safe to depend on exceptions, because they might be raised for other reasons, such as corrupt data, and that currently causes data loss.
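
      To illustrate the two concerns above, here is a minimal sketch in plain Java. It is not Spark's code and not the change made for this issue. The class EofOverheadSketch and its methods are hypothetical names, and java.io.DataInputStream merely stands in for a deserialization stream. It contrasts detecting end of data by catching EOFException with checking for remaining bytes before each read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration only; none of these names exist in Spark.
final class EofOverheadSketch {

    // Encodes the given ints as consecutive 4-byte records.
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Exception-driven termination: every stream costs at least one
    // EOFException, and a truncated trailing record looks identical
    // to a clean end of data, so it is silently dropped.
    static int sumUntilEof(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int sum = 0;
        try {
            while (true) {
                sum += in.readInt();
            }
        } catch (EOFException e) {
            // Treated as "no more data", whatever the real cause was.
        }
        return sum;
    }

    // Explicit end-of-data check: no exception on the normal path, and an
    // EOFException now only occurs when a record is actually truncated.
    static int sumWithCheck(byte[] data) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(data);
        DataInputStream in = new DataInputStream(raw);
        int sum = 0;
        while (raw.available() > 0) {
            sum += in.readInt();
        }
        return sum;
    }
}
```

      With well-formed input both methods return the same result, but only the second one avoids constructing an exception per stream. With input that is cut off mid-record, the first method returns a partial sum without any error, while the second propagates the EOFException to the caller.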

          People

            Assignee: Emil Ejbyfeldt (eejbyfeldt)
            Reporter: Emil Ejbyfeldt (eejbyfeldt)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: