Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.3.2, 2.4.4, 3.0.0
Fix Version/s: None
Component/s: None
Description
Our Spark code is causing a "Managed memory leak detected" warning to appear, even though we are not calling take() or limit().
According to SPARK-14168 (https://issues.apache.org/jira/browse/SPARK-14168), managed memory leaks should only be caused by not reading an iterator to completion, e.g. by calling take() or limit().
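For context, the pattern SPARK-14168 attributes the warning to looks like the following sketch (the session setup and file name here are placeholders, not part of our code):

import pyspark.sql

spark = pyspark.sql.SparkSession.builder.appName("leak-sketch").getOrCreate()
df = spark.read.format("csv").option("header", "true").load("a.csv")
# take() stops consuming each scanned partition's iterator early, which
# can leave task-managed memory unreleased and log the same warning
rows = df.take(5)
spark.stop()

Our program contains no such call.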
Our exact warning text is: "2020-01-06 14:54:59 WARN Executor:66 - Managed memory leak detected; size = 2097152 bytes, TID = 118"
The reported leak size is always the same: 2097152 bytes (2 MiB).
I have created a minimal test program that reproduces the warning:
import pyspark.sql
import pyspark.sql.functions as fx


def main():
    builder = pyspark.sql.SparkSession.builder
    builder = builder.appName("spark-jira")
    spark = builder.getOrCreate()

    reader = spark.read
    reader = reader.format("csv")
    reader = reader.option("inferSchema", "true")
    reader = reader.option("header", "true")

    table_c = reader.load("c.csv")
    table_a = reader.load("a.csv")
    table_b = reader.load("b.csv")

    primary_filter = fx.col("some_code").isNull()
    new_primary_data = table_a.filter(primary_filter)
    new_ids = new_primary_data.select("some_id")

    new_data = table_b.join(new_ids, "some_id")
    new_data = new_data.select("some_id")

    result = table_c.join(new_data, "some_id", "left")
    result.repartition(1).write.json("results.json", mode="overwrite")
    spark.stop()


if __name__ == "__main__":
    main()
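The script can be run as-is with spark-submit (e.g. spark-submit repro.py, where repro.py is just a placeholder file name); the warnings appear in the executor logs while the job runs.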
Our code isn't anything out of the ordinary, just some filters, selects and joins.
The input data consists of 3 CSV files, roughly 2.6 GB in total uncompressed. When I reduced the number of rows in the input files, the warning no longer appeared, so the size seems to matter. After compressing the files I was able to attach them below.
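For anyone without access to the attachments, input of a similar shape can be generated synthetically. The column names below match the repro script, but the row count, value ranges, and null ratio are guesses on my part and may need tuning upward before the warning reproduces:

import csv
import random

# Hypothetical generator for inputs shaped like the attached files.
# Column names come from the repro script; ROWS is a guess and may
# need to grow until the files approach the ~2.6 GB total.
ROWS = 10_000_000


def write_csv(path, header, make_row):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for _ in range(ROWS):
            writer.writerow(make_row())


# None is written as an empty field, which Spark's CSV reader
# loads as null, so the some_code isNull() filter has matches.
write_csv("a.csv", ["some_id", "some_code"],
          lambda: [random.randrange(1_000_000),
                   random.choice([None, "X", "Y"])])
write_csv("b.csv", ["some_id"], lambda: [random.randrange(1_000_000)])
write_csv("c.csv", ["some_id"], lambda: [random.randrange(1_000_000)])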