Spark / SPARK-15157

String fields in DataFrame behave weirdly when executor-memory >= 32GB


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Consider the snippet below. This issue was observed on a cluster running CDH 5.5.2 (which ships with Spark 1.5.0).

      test.scala
      // Run inside spark-shell, where sqlContext.implicits._ is already
      // in scope and provides the .toDF conversion.
      case class A(id: Int, name: String)
      sc.parallelize(List(A(1, "varadharajan"), A(2, "thiyagu"))).toDF.show
      

      When I execute the above snippet with the following command-line arguments, it works as expected.

      spark-shell --master yarn --num-executors 4 --executor-cores 10 --executor-memory 31g
      
      +--+------------+
      |id|        name|
      +--+------------+
      | 1|varadharajan|
      | 2|     thiyagu|
      +--+------------+

      But things become weird when I increase executor memory to 32GB or beyond.

      spark-shell --master yarn --num-executors 4 --executor-cores 10 --executor-memory 32g
      
      +--+----+
      |id|name|
      +--+----+
      | 1|ajan|
      | 2|    |
      +--+----+
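
      Note that the surviving value "ajan" is "varadharajan" shifted by exactly 8 characters, and "thiyagu" (7 characters) disappears entirely. An 8-byte shift is consistent with the JVM's compressed-oops boundary: at heap sizes of 32GB and above HotSpot disables UseCompressedOops, which moves the base offset of a byte[] from 16 to 24 bytes, so string offsets computed under one setting are read back incorrectly under the other. Below is a minimal sketch (not part of the original report; it calls sun.misc.Unsafe directly, which works on the JDK 7/8 era JVMs Spark 1.5 targets) that prints the offset on a given JVM:

      OopsCheck.scala
      import sun.misc.Unsafe

      object OopsCheck {
        def main(args: Array[String]): Unit = {
          // sun.misc.Unsafe has no public constructor; fetch the
          // singleton reflectively.
          val f = classOf[Unsafe].getDeclaredField("theUnsafe")
          f.setAccessible(true)
          val unsafe = f.get(null).asInstanceOf[Unsafe]
          // Prints 16 when compressed oops are enabled (heap < 32GB)
          // and 24 when they are disabled (heap >= 32GB): an 8-byte shift.
          println(s"byte[] base offset: ${unsafe.arrayBaseOffset(classOf[Array[Byte]])}")
        }
      }

      If that mismatch is indeed the cause, aligning the driver and executor JVMs on the same oops setting should work around it, for example by disabling compressed oops on the driver as well (the flags are standard Spark/HotSpot options, but this workaround is a guess, not something verified in this report):

      spark-shell --master yarn --num-executors 4 --executor-cores 10 \
        --executor-memory 32g \
        --driver-java-options "-XX:-UseCompressedOops"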


            People

              Assignee: Unassigned
              Reporter: Varadharajan (srinathsmn)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: