Spark / SPARK-10914

UnsafeRow serialization breaks when two machines have different Oops size


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.5.1
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: SQL
    • Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)

    Description

      Updated description (by rxin on Oct 8, 2015)

      To reproduce, launch Spark using

      MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
      

      And then run the following

      scala> sql("select 1 xx").collect()
      

      The problem is that UnsafeRow points to its data in memory via three pieces of information: a base object, a base offset, and a length. When the row is serialized with Java/Kryo serialization, the object layout in memory can differ between the two machines if they have different pointer widths (oops, ordinary object pointers, in the JVM), so a base offset recorded on one machine is wrong on the other.
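      The layout dependence can be sketched in plain Scala (a minimal illustration with typical header sizes, not Spark's actual code): with compressed oops enabled, a byte array's data commonly starts 16 bytes into the object, while with -XX:-UseCompressedOops it starts at 24. If an absolute base offset captured under one layout is interpreted under the other, it points 8 bytes off the real data:

      ```scala
      // Sketch only: why a raw (baseObject, baseOffset) pair is not portable
      // across JVMs. The base offsets below are typical values, hard-coded
      // here rather than queried from sun.misc.Unsafe.arrayBaseOffset.
      object OopsMismatch {
        val CompressedOopsArrayBase   = 16L // typical byte[] base, -XX:+UseCompressedOops
        val UncompressedOopsArrayBase = 24L // typical byte[] base, -XX:-UseCompressedOops

        // Absolute address of 8-byte field `ordinal` in a byte-array-backed row
        def fieldOffset(arrayBase: Long, ordinal: Int): Long =
          arrayBase + ordinal * 8L

        // Sender computes an absolute offset under its own layout; the
        // receiver subtracts its *own* base to recover the logical position.
        def misread(senderBase: Long, receiverBase: Long, ordinal: Int): Long =
          fieldOffset(senderBase, ordinal) - receiverBase

        def main(args: Array[String]): Unit = {
          // Field 0 should land at logical byte 0, but the receiver sees -8,
          // so every field is read 8 bytes early -- hence garbage comparisons
          // and empty join results.
          println(misread(CompressedOopsArrayBase, UncompressedOopsArrayBase, 0))
        }
      }
      ```

      When both JVMs use the same oops setting the bases cancel and the read lands correctly, which is why the bug only surfaces in heterogeneous clusters.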

      Original bug report description:

      Using an inner join, to match together two integer columns, I generally get no results when there should be matches. But the results vary and depend on whether the dataframes are coming from SQL, JSON, or cached, as well as the order in which I cache things and query them.

      This minimal example reproduces it consistently for me in the spark-shell, on new installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from http://spark.apache.org/downloads.html.)

      /* x is {"xx":1}{"xx":2} and y is {"yy":1}{"yy":2} */
      val x = sql("select 1 xx union all select 2") 
      val y = sql("select 1 yy union all select 2")
      
      x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
      /* If I cache both tables it works: */
      x.cache()
      y.cache()
      x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
      
      /* but this still doesn't work: */
      x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */
      


          People

            Assignee: rxin Reynold Xin
            Reporter: benm Ben Moran
            Votes: 0
            Watchers: 6
