[SPARK-20244] Incorrect input size in UI with PySpark


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: Web UI
    • Labels: None

    Description

      In the Spark UI (Details for Stage page), Input Size is reported as 64.0 KB when running in PySparkShell.
      It is also incorrect in the Tasks table (Input Size / Records):
      64.0 KB / 132120575 in pyspark
      252.0 MB / 132120575 in spark-shell

      I will attach screenshots.
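      For reference, the two UI strings above decode to very different byte counts (the Spark UI formats sizes with 1024-based units). A small sketch with a hypothetical helper name, just to quantify the discrepancy:

```python
# Hypothetical helper: convert the Spark UI's human-readable size strings
# back into bytes so the two runs can be compared numerically.
# Assumes 1024-based units, matching how the Spark UI formats sizes.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def ui_size_to_bytes(text):
    """Parse a string like '64.0 KB' into a byte count."""
    value, unit = text.split()
    return int(float(value) * UNITS[unit])

pyspark_bytes = ui_size_to_bytes("64.0 KB")   # -> 65536
shell_bytes = ui_size_to_bytes("252.0 MB")    # -> 264241152
print(shell_bytes // pyspark_bytes)           # -> 4032
```

      Both runs read the same file, yet the PySpark figure is roughly 4000x smaller than the spark-shell one.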

      Reproduce steps:
      Run this to generate a big file (press Ctrl+C after 5-6 seconds):
      $ yes > /tmp/yes.txt
      $ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
      $ ./bin/pyspark

      Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
      [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      Setting default log level to "WARN".
      To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
            /_/
      
      Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
      SparkSession available as 'spark'.

      >>> a = sc.textFile("/tmp/yes.txt")
      >>> a.count()

      Open the Spark UI and check Stage 0.
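      The `yes` + Ctrl+C step above produces a file of unpredictable size, which makes runs hard to compare. A hedged sketch of a deterministic replacement (the path and sizes are illustrative, not from the report):

```python
# Deterministic alternative to `yes > /tmp/yes.txt` + Ctrl+C: write an exact
# number of "y\n" lines so the input size is identical across repro runs.
import os
import tempfile

def make_yes_file(path, size_bytes):
    """Fill `path` with repeated b"y\n" lines, up to at most size_bytes bytes."""
    chunk = b"y\n" * 65536  # write in 128 KiB chunks for speed
    with open(path, "wb") as f:
        written = 0
        while written + len(chunk) <= size_bytes:
            f.write(chunk)
            written += len(chunk)
        # top up with whole lines; never exceeds size_bytes
        f.write(b"y\n" * ((size_bytes - written) // 2))
    return os.path.getsize(path)

# Illustrative 1 MiB demo; a real repro would use a few hundred MiB so the
# file spans multiple HDFS blocks.
demo_path = os.path.join(tempfile.gettempdir(), "yes_demo.txt")
print(make_yes_file(demo_path, 1024 * 1024))  # -> 1048576
```

      The generated file can then be copied to HDFS with `hadoop fs -copyFromLocal` exactly as in the steps above.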

      Attachments

        1. sparkshell_correct_inputsize.png
          101 kB
          Artur Sukhenko
        2. pyspark_incorrect_inputsize.png
          100 kB
          Artur Sukhenko

          People

            Assignee: Saisai Shao (jerryshao)
            Reporter: Artur Sukhenko (asukhenko)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: