Description
In the Spark UI (Details for Stage), the Input Size is reported as 64.0 KB when running in PySparkShell.
The value is also incorrect in the Tasks table (Input Size / Records); compare the two shells:
64.0 KB / 132120575 in pyspark
252.0 MB / 132120575 in spark-shell
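A quick arithmetic check on the numbers above suggests the spark-shell figure is the correct one (assumption: each record is the two-byte line "y\n" that `yes` produces):

```python
records = 132120575           # record count shown in the Tasks table
expected_bytes = records * 2  # 2 bytes per line: 'y' + '\n'
expected_mb = expected_bytes / float(1 << 20)

print(round(expected_mb, 1))  # 252.0, matching the spark-shell figure

# The PySpark figure (64.0 KB = 65536 bytes) is off by a factor of ~4000:
print(expected_bytes // 65536)
```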
I will attach screenshots.
Steps to reproduce:
Run this to generate a big file (press Ctrl+C after 5-6 seconds):
$ yes > /tmp/yes.txt
$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
$ ./bin/pyspark
Python 2.7.5 (default, Nov 6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)
SparkSession available as 'spark'.
>>> a = sc.textFile("/tmp/yes.txt")
>>> a.count()
Open the Spark UI and check Stage 0.
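Instead of eyeballing the UI, the stage metrics can also be pulled from Spark's monitoring REST API (http://<driver>:4040/api/v1/applications/<app-id>/stages), which exposes inputBytes and inputRecords per stage. A minimal sketch; the JSON below is an abbreviated, illustrative sample mirroring the values in this report, not a real response:

```python
import json

# Illustrative sample of the stages endpoint response (abbreviated);
# in a live session this would come from an HTTP GET on the driver.
sample_response = json.loads("""
[{"stageId": 0, "inputBytes": 65536, "inputRecords": 132120575}]
""")

for stage in sample_response:
    print("stage %d: inputBytes=%d inputRecords=%d"
          % (stage["stageId"], stage["inputBytes"], stage["inputRecords"]))
```

With the bug present, inputBytes for Stage 0 under pyspark comes back as 65536 (64.0 KB) rather than the expected ~252 MB.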