Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Not A Problem
- Affects Version/s: 2.3.3
- Fix Version/s: None
- Component/s: None
- Labels: Important
Description
This issue is a clone of SPARK-29055. Since Spark version 2.3.3, I observe that the JVM memory is increasing slightly over time. This behavior also affects application performance: when I run my real application in a testing environment, after a while the persisted dataframes no longer fit into executor memory and they spill to disk.
JVM memory usage (based on the htop command):

| Time  | RES  | SHR   | MEM% |
|-------|------|-------|------|
| 1min  | 1349 | 32724 | 1.5  |
| 3min  | 1936 | 32724 | 2.2  |
| 5min  | 2506 | 32724 | 2.6  |
| 7min  | 2564 | 32724 | 2.7  |
| 9min  | 2584 | 32724 | 2.7  |
| 11min | 2585 | 32724 | 2.7  |
| 13min | 2592 | 32724 | 2.7  |
| 15min | 2591 | 32724 | 2.7  |
| 17min | 2591 | 32724 | 2.7  |
| 30min | 2600 | 32724 | 2.7  |
| 1h    | 2618 | 32724 | 2.7  |
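For anyone who prefers scripted sampling over watching htop, here is a minimal sketch that records RES the same way. It assumes the psutil package is installed and that you look up the JVM PID yourself; neither is part of the original report.

```python
import time

import psutil  # assumption: psutil is available; the original report used htop instead


def sample_res(pid, interval_s=120, samples=10):
    """Print the resident set size (RES) of the given JVM process over time."""
    proc = psutil.Process(pid)
    total_mb = psutil.virtual_memory().total / (1024 * 1024)
    for i in range(samples):
        # rss corresponds to the RES column in htop.
        res_mb = proc.memory_info().rss / (1024 * 1024)
        print("t={0}s RES={1:.0f}MB MEM%={2:.1f}".format(
            i * interval_s, res_mb, 100.0 * res_mb / total_mb))
        time.sleep(interval_s)
```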
HOW TO REPRODUCE THIS BEHAVIOR:
To reproduce the behavior above, run the snippet below (I prefer to run it without any sleep delay) and track the JVM memory with the top or htop command.
```python
import os
import time

from pyspark.sql import SparkSession

target_dir = "..."

spark = SparkSession.builder.appName("DataframeCount").getOrCreate()

# Repeatedly read every CSV file in the target directory and count its
# records, keeping the JVM busy while its memory usage is tracked.
while True:
    for f in os.listdir(target_dir):
        df = spark.read.load(target_dir + f, format="csv")
        print("Number of records: {0}".format(df.count()))
        time.sleep(15)
```
TESTED CASES WITH THE SAME BEHAVIOR:
- Default settings (spark-defaults.conf)
- Setting spark.cleaner.periodicGC.interval to 1min (or less)
- Setting spark.cleaner.referenceTracking.blocking=false
- Running the application in cluster mode
- Increasing/decreasing the resources of the executors and driver
- Setting extraJavaOptions on the driver and executors: -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12 (see the configuration sketch after this list)
- Spark 2.4.4 (the latest version at the time), which showed the same behavior
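For reference, a minimal sketch of how the cleaner and GC settings from the list above can be applied when building the session; the app name mirrors the snippet and the option values mirror the list, everything else is a placeholder:

```python
from pyspark.sql import SparkSession

# Sketch only: the settings from the tested cases above, applied
# programmatically. Equivalent --conf flags on spark-submit work the same way.
# Note: driver JVM options only take effect if set before the driver starts,
# e.g. via spark-submit --conf or spark-defaults.conf, not from a running driver.
spark = (
    SparkSession.builder
    .appName("DataframeCount")
    # Ask the ContextCleaner to trigger a periodic GC every minute.
    .config("spark.cleaner.periodicGC.interval", "1min")
    # Do not block the cleaner thread on cleanup tasks.
    .config("spark.cleaner.referenceTracking.blocking", "false")
    # G1 options from the tested cases, for the driver and the executors.
    .config("spark.driver.extraJavaOptions",
            "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12")
    .config("spark.executor.extraJavaOptions",
            "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12")
    .getOrCreate()
)
```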
DEPENDENCIES
- Operating system: Ubuntu 16.04.3 LTS
- Java: jdk1.8.0_131 (also tested with jdk1.8.0_221)
- Python: Python 2.7.12
Issue Links
- Is a clone of: SPARK-29055 "Spark UI storage memory increasing overtime" (Resolved)