Spark / SPARK-11350

There is no best practice to handle warnings or messages produced by Executors in a distributed manner


    Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core

      Description

      I looked around on the web and couldn’t find any way to deal, in a distributed way, with malformed or faulty records during computation. All I was able to find was the flatMap/Some/None technique plus logging.
      I’m facing this problem because I have a processing algorithm that extracts more than one value from each record but can fail to extract any one of those values, and I want to keep track of those failures. Logging is not feasible because these “warnings” happen so frequently that the logs would become overwhelming and impossible to read.
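      For reference, the flatMap/Some/None technique plus logging mentioned above looks roughly like the following. This is a minimal sketch in plain Scala, not code from the issue: `parse` is a hypothetical per-record parser, and a local `Seq` stands in for an RDD. Spark’s `RDD.flatMap` also takes a function returning a collection-like value, so the same call shape should carry over.

```scala
import scala.util.Try

object FlatMapOptionSketch {
  // Hypothetical per-record parser: Some(value) for a well-formed record,
  // None for a malformed one.
  def parse(record: String): Option[Int] = {
    val parsed = Try(record.trim.toInt).toOption
    // The "+ logging" half of the technique: one log line per dropped record.
    // On a Spark executor this would end up in that executor's stderr, which is
    // what becomes unreadable when malformed records are frequent.
    if (parsed.isEmpty) System.err.println(s"WARN dropping malformed record: '$record'")
    parsed
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("1", "2", "oops", "4") // stands in for an RDD[String]
    // flatMap unwraps Some(x) to x and drops None, so malformed records vanish.
    val parsed = records.flatMap(r => parse(r))
    println(parsed) // List(1, 2, 4)
  }
}
```

      The only trace a dropped record leaves is the log line, which is exactly the limitation described here.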
      Since my processing has three possible outcomes, I modeled it with this class hierarchy:

      http://i.imgur.com/NIesYUm.png?1

      The hierarchy holds a result and/or warnings. Since Result implements Traversable, it can be used in a flatMap, which discards all warnings and failed results; on the other hand, if we want to keep track of the warnings, we can process them and output them as needed.
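      The hierarchy itself is only linked as an image, so the following is a hypothetical reconstruction rather than the reporter’s code: the class names `Ok`, `Partial` and `Failed`, the `extract` function and its comma-separated record format are all assumptions. `Result` extends `Iterable`, which is what `Traversable` became in Scala 2.13 (where `Traversable` is a deprecated alias), and a local `Seq` stands in for an RDD. The sketch shows both uses described above: a flatMap that keeps only results, and a second flatMap that keeps the warnings as data instead of log lines.

```scala
import scala.util.Try

// Hypothetical three-outcome hierarchy. Extending Iterable (the 2.13 successor
// of Traversable) lets a Result be returned directly from flatMap.
sealed trait Result[+A] extends Iterable[A] {
  def warnings: List[String]
}

// Every value was extracted cleanly.
final case class Ok[+A](value: A) extends Result[A] {
  def warnings: List[String] = Nil
  def iterator: Iterator[A] = Iterator.single(value)
}

// A value was produced, but some of the sub-extractions failed.
final case class Partial[+A](value: A, warnings: List[String]) extends Result[A] {
  def iterator: Iterator[A] = Iterator.single(value)
}

// Nothing usable could be extracted; the reason is kept as a warning.
final case class Failed(reason: String) extends Result[Nothing] {
  def warnings: List[String] = List(reason)
  def iterator: Iterator[Nothing] = Iterator.empty
}

object ResultSketch {
  // Hypothetical extractor: pulls every integer field out of a comma-separated
  // record and records a warning for each field that does not parse.
  def extract(record: String): Result[List[Int]] = {
    val fields = record.split(",").map(_.trim).toList
    val parsed = fields.map(f => f -> Try(f.toInt).toOption)
    val values = parsed.collect { case (_, Some(n)) => n }
    val problems = parsed.collect { case (f, None) => s"cannot parse '$f'" }
    if (values.isEmpty) Failed(s"no usable field in '$record'")
    else if (problems.isEmpty) Ok(values)
    else Partial(values, problems)
  }

  def main(args: Array[String]): Unit = {
    val records = Seq("1,2,3", "4,x,6", "y,z") // stands in for an RDD[String]

    // Results only: warnings and failures are discarded by flatMap.
    val values = records.flatMap(r => extract(r))
    println(values) // List(List(1, 2, 3), List(4, 6))

    // Warnings only: kept as data instead of being written to a log.
    val warnings = records.flatMap(r => extract(r).warnings)
    println(warnings) // List(cannot parse 'x', no usable field in 'y,z')
  }
}
```

      On a real RDD, the two flatMap passes would each rerun the extraction unless the mapped RDD is persisted first, for example with `rdd.map(ResultSketch.extract).cache()`.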

            People

            • Assignee: Unassigned
            • Reporter: tmnd91 Antonio Murgia
            • Votes: 0
            • Watchers: 1
