Spark / SPARK-11009

RowNumber in HiveContext returns negative values in cluster mode

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: Spark Core
    • Labels:
      None
    • Environment:

      Standalone cluster mode. No Hadoop or Hive installation is present in the environment (no hive-site.xml); only HiveContext is used. Spark was built with Hadoop 2.6.0. Default Spark configuration. The cluster has 4 nodes, but the issue occurs with other node counts as well.

      Description

      This issue occurs when submitting the job to a standalone cluster; YARN and Mesos have not been tried. Repartitioning df into a single partition or setting default parallelism to 1 does not fix it. Running with only one node in the cluster gives the same result. Other shuffle configuration changes do not alter the results either.

      The issue does NOT happen in --master local[*].
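The two workarounds mentioned above can be sketched as follows. This is only an illustration of what was tried, not the reporter's actual code: the object and method names are hypothetical, and neither step resolved the issue according to the report.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.DataFrame

// Hypothetical helper names; only the Spark calls themselves are real API.
object AttemptedWorkarounds {
  // Collapse the DataFrame into a single partition before applying the window.
  def singlePartition(df: DataFrame): DataFrame = df.repartition(1)

  // Set default parallelism to 1 on the configuration used to create the SparkContext.
  def singleParallelism(conf: SparkConf): SparkConf =
    conf.set("spark.default.parallelism", "1")
}
```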

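      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.rowNumber

      // Window over each client, ordered by date
      val ws = Window.partitionBy("client_id").orderBy("date")

      val nm = "repeatMe"
      // select returns a new DataFrame, so keep the result
      val numbered = df.select(df.col("*"), rowNumber().over(ws).as(nm))

      numbered.filter(numbered(nm).isNotNull).orderBy(nm).take(50).foreach(println(_))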

      Output (column types: Long, DateType, Int):

      [219483904822,2006-06-01,-1863462909]
      [219483904822,2006-09-01,-1863462909]
      [219483904822,2007-01-01,-1863462909]
      [219483904822,2007-08-01,-1863462909]
      [219483904822,2007-07-01,-1863462909]
      [192489238423,2007-07-01,-1863462774]
      [192489238423,2007-02-01,-1863462774]
      [192489238423,2006-11-01,-1863462774]
      [192489238423,2006-08-01,-1863462774]
      [192489238423,2007-08-01,-1863462774]
      [192489238423,2006-09-01,-1863462774]
      [192489238423,2007-03-01,-1863462774]
      [192489238423,2006-10-01,-1863462774]
      [192489238423,2007-05-01,-1863462774]
      [192489238423,2006-06-01,-1863462774]
      [192489238423,2006-12-01,-1863462774]
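For contrast with the output above, where every row of a client carries the same large negative value, here is a minimal plain-Scala sketch of the numbering the window is expected to produce: 1..n within each client_id, ordered by date. It does not use Spark; the Record type, the object name, and the sample values are hypothetical and serve only to illustrate the expected semantics.

```scala
// Hypothetical row type: dates are kept as ISO strings, which sort chronologically.
final case class Record(clientId: Long, date: String)

object ExpectedRowNumbers {
  // Number rows 1..n within each client, ordered by date.
  def number(rows: Seq[Record]): Seq[(Record, Int)] =
    rows
      .groupBy(_.clientId)                    // mirrors partitionBy("client_id")
      .toSeq
      .flatMap { case (_, group) =>
        group
          .sortBy(_.date)                     // mirrors orderBy("date")
          .zipWithIndex
          .map { case (r, i) => (r, i + 1) }  // row numbers start at 1
      }

  def main(args: Array[String]): Unit = {
    // Made-up sample data, not taken from the report.
    val sample = Seq(
      Record(1L, "2006-09-01"),
      Record(1L, "2006-06-01"),
      Record(2L, "2007-02-01")
    )
    number(sample).foreach(println)
  }
}
```

Any value below 1, or a value repeated within one client, indicates the behaviour described in this issue.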

              People

              • Assignee: Davies Liu (davies)
              • Reporter: Saif Addin Ellafi (saif.a.ellafi)
              • Votes: 0
              • Watchers: 3
