Spark / SPARK-15156

String fields in DataFrame behave weirdly when executor-memory >= 32GB


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

Consider the snippet below. This issue was observed on an installation of CDH 5.5.2 (which ships with Spark 1.5.0).

      test.scala
      case class A(id: Int, name: String)
      sc.parallelize(List(A(1, "varadharajan"), A(2, "thiyagu"))).toDF.show
      

When I execute the above snippet with the command-line arguments below, it works as expected.

      spark-shell --master yarn  --num-executors 4 --executor-cores 10 --executor-memory 31g
      
+---+------------+
| id|        name|
+---+------------+
|  1|varadharajan|
|  2|     thiyagu|
+---+------------+

But things become weird when I increase the executor memory to 32GB or beyond: string fields come back truncated or empty.

      spark-shell --master yarn  --num-executors 4 --executor-cores 10 --executor-memory 32g
      
+---+----+
| id|name|
+---+----+
|  1|ajan|
|  2|    |
+---+----+
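A plausible connection, not stated in the report itself: the 31g/32g boundary coincides with the heap size at which HotSpot silently disables compressed oops, which changes low-level array base offsets that Spark's Unsafe-based string handling depends on. A minimal JVM-level sketch of that offset difference (hypothetical class name `OffsetCheck`; assumes a 64-bit HotSpot JVM where `sun.misc.Unsafe` is reflectively accessible):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffsetCheck {
    // Returns Unsafe.arrayBaseOffset(byte[].class): 16 on 64-bit HotSpot
    // with compressed oops (the default for heaps below ~32GB), and
    // typically 24 once -XX:-UseCompressedOops is in effect.
    static long byteArrayBaseOffset() throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        return unsafe.arrayBaseOffset(byte[].class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("byte[] base offset: " + byteArrayBaseOffset());
    }
}
```

Running this once with `-XX:+UseCompressedOops` and once with `-XX:-UseCompressedOops` should print different offsets; if Spark code caches one offset while the executor JVM uses the other, reading a string's bytes at the wrong starting position would produce exactly the kind of truncated output shown above.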

              People

• Assignee:
  Unassigned
• Reporter:
  srinathsmn Varadharajan
• Votes:
  0
• Watchers:
  1
