SPARK-21882

OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.1, 2.2.0
    • Fix Version/s: 2.3.4, 2.4.4, 3.0.0
    • Component/s: Spark Core
    • Labels: None
    • Flags: Patch

      Description

      The first job launched from saveAsHadoopDataset on each executor does not compute the writtenBytes of OutputMetrics correctly: writtenBytes is reported as 0. The cause is that the callback used to obtain the number of bytes written is initialized at the wrong point. The statisticsTable, which records statistics for a FileSystem, must be initialized first (this is triggered when SparkHadoopWriter is opened). The fix is to reorder the initialization so that the callback is created after the statisticsTable has been initialized.
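The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Spark or Hadoop code: every name in it (`Statistics`, `statistics_table`, `open_writer`, `make_bytes_written_callback`, `run_first_job`) is hypothetical, and it assumes that the callback captures whichever statistics entries exist at the moment it is created, which the issue text implies but does not spell out.

```python
# Toy model of the initialization-order problem. All names are hypothetical
# and only mimic the behaviour described in the issue.

class Statistics:
    """Per-filesystem byte counter, standing in for one statisticsTable entry."""

    def __init__(self):
        self.bytes_written = 0


# Stand-in for the global statistics table; empty until a writer is opened.
statistics_table = []


def open_writer():
    """Opening a writer registers the filesystem's statistics entry."""
    stats = Statistics()
    statistics_table.append(stats)
    return stats


def make_bytes_written_callback():
    """Snapshot the entries that exist right now and report their growth."""
    snapshot = list(statistics_table)
    baseline = sum(s.bytes_written for s in snapshot)
    return lambda: sum(s.bytes_written for s in snapshot) - baseline


def run_first_job(callback_first):
    """Write 100 bytes and return what the callback reports."""
    statistics_table.clear()  # fresh executor: nothing registered yet
    if callback_first:
        callback = make_bytes_written_callback()  # snapshot is empty
        stats = open_writer()
    else:
        stats = open_writer()
        callback = make_bytes_written_callback()  # snapshot sees the entry
    stats.bytes_written += 100
    return callback()


buggy = run_first_job(callback_first=True)
fixed = run_first_job(callback_first=False)
print(buggy, fixed)
```

In this model the callback created before the writer is opened never sees the entry that later receives the bytes, so it reports 0. Creating it after the writer is opened gives the expected count.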

        Attachments

        1. SPARK-21882.patch
          0.7 kB
          linxiaojun

            People

            • Assignee:
              linxiaojun
            • Reporter:
              linxiaojun (linxiaojunchina)
            • Votes:
              1
            • Watchers:
              5
