Spark / SPARK-15157

String fields in DataFrame behave weirdly when executor-memory >= 32GB


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Consider the snippet below. This issue was observed on a cluster running CDH 5.5.2 (which ships with Spark 1.5.0).

      test.scala
      // Run inside spark-shell, where sqlContext.implicits._ is already
      // in scope and provides the .toDF conversion.
      case class A(id: Int, name: String)
      sc.parallelize(List(A(1, "varadharajan"), A(2, "thiyagu"))).toDF.show
      

      When I execute the above snippet with the following command-line arguments, it works as expected.

      spark-shell --master yarn --num-executors 4 --executor-cores 10 --executor-memory 31g
      
      +--+------------+
      |id|        name|
      +--+------------+
      | 1|varadharajan|
      | 2|     thiyagu|
      +--+------------+

      But things become weird when I increase executor memory to 32GB or beyond.

      spark-shell --master yarn --num-executors 4 --executor-cores 10 --executor-memory 32g
      
      +--+----+
      |id|name|
      +--+----+
      | 1|ajan|
      | 2|    |
      +--+----+
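
      Note that the surviving value "ajan" is "varadharajan" shifted by exactly 8 characters, and "thiyagu" (7 characters) disappears entirely. An 8-byte shift is consistent with the JVM's compressed-oops boundary: at heap sizes of 32GB and above HotSpot disables UseCompressedOops, which moves the base offset of a byte[] from 16 to 24 bytes, so string offsets computed under one setting are read back incorrectly under the other. Below is a minimal sketch (not part of the original report; it calls sun.misc.Unsafe directly, which works on the JDK 7/8 era JVMs Spark 1.5 targets) that prints the offset on a given JVM:

      OopsCheck.scala
      import sun.misc.Unsafe

      object OopsCheck {
        def main(args: Array[String]): Unit = {
          // sun.misc.Unsafe has no public constructor; fetch the
          // singleton reflectively.
          val f = classOf[Unsafe].getDeclaredField("theUnsafe")
          f.setAccessible(true)
          val unsafe = f.get(null).asInstanceOf[Unsafe]
          // Prints 16 when compressed oops are enabled (heap < 32GB)
          // and 24 when they are disabled (heap >= 32GB): an 8-byte shift.
          println(s"byte[] base offset: ${unsafe.arrayBaseOffset(classOf[Array[Byte]])}")
        }
      }

      If that mismatch is indeed the cause, aligning the driver and executor JVMs on the same oops setting should work around it, for example by disabling compressed oops on the driver as well (the flags are standard Spark/HotSpot options, but this workaround is a guess, not something verified in this report):

      spark-shell --master yarn --num-executors 4 --executor-cores 10 \
        --executor-memory 32g \
        --driver-java-options "-XX:-UseCompressedOops"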


            People

              Assignee: Unassigned
              Reporter: Varadharajan (srinathsmn)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: