Spark / SPARK-11009

RowNumber in HiveContext returns negative values in cluster mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.5.2, 1.6.0
    • Component/s: Spark Core
    • Labels: None
    • Environment: Standalone cluster mode. No Hadoop/Hive installation is present in the environment (no hive-site.xml); only HiveContext is used. Spark was built with Hadoop 2.6.0 and runs with the default Spark configuration. The cluster has 4 nodes, but the issue also occurs with other cluster sizes.

    Description

      This issue happens when submitting the job to a standalone cluster. I have not tried YARN or Mesos. Repartitioning df into a single partition, or setting the default parallelism to 1, does not fix the issue. Running the cluster with only one node gives the same result, and other shuffle configuration changes do not alter the results either.

      The issue does NOT happen in --master local[*].

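      import org.apache.spark.sql.expressions.Window
      import org.apache.spark.sql.functions.rowNumber

      // Number the rows within each client_id partition, ordered by date
      val ws = Window.partitionBy("client_id").orderBy("date")

      val nm = "repeatMe"
      // Keep the result: df itself has no "repeatMe" column
      val ranked = df.select(df.col("*"), rowNumber().over(ws).as(nm))

      // Print the first 50 numbered rows, sorted by the row number
      ranked.filter(ranked(nm).isNotNull).orderBy(nm).take(50).foreach(println)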

      Output (column types: Long, DateType, Int):
      [219483904822,2006-06-01,-1863462909]
      [219483904822,2006-09-01,-1863462909]
      [219483904822,2007-01-01,-1863462909]
      [219483904822,2007-08-01,-1863462909]
      [219483904822,2007-07-01,-1863462909]
      [192489238423,2007-07-01,-1863462774]
      [192489238423,2007-02-01,-1863462774]
      [192489238423,2006-11-01,-1863462774]
      [192489238423,2006-08-01,-1863462774]
      [192489238423,2007-08-01,-1863462774]
      [192489238423,2006-09-01,-1863462774]
      [192489238423,2007-03-01,-1863462774]
      [192489238423,2006-10-01,-1863462774]
      [192489238423,2007-05-01,-1863462774]
      [192489238423,2006-06-01,-1863462774]
      [192489238423,2006-12-01,-1863462774]
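For comparison, here is a minimal pure-Scala sketch of what a row number over a window partitioned by client_id and ordered by date is expected to return: within each client_id, the values 1, 2, ..., n in ascending date order. The names RowNumberModel, InputRow, rowNumbers and isValid are hypothetical; the sketch does not use Spark and is not how Spark computes row numbers. The dates and the repeated value -1863462909 are copied from the output above.

```scala
import java.time.LocalDate

// Hypothetical reference model of ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date).
// Pure Scala, no Spark dependency; for illustration only.
object RowNumberModel {
  // One input row: (client_id, date), matching the first two output columns above.
  final case class InputRow(clientId: Long, date: LocalDate)

  // Assigns 1, 2, 3, ... within each client_id, in ascending date order.
  def rowNumbers(rows: Seq[InputRow]): Seq[(InputRow, Int)] =
    rows
      .groupBy(_.clientId)
      .toSeq
      .sortBy(_._1)
      .flatMap { case (_, group) =>
        group.sortBy(_.date.toEpochDay).zipWithIndex.map { case (r, i) => (r, i + 1) }
      }

  // A numbering is valid if, for every client_id, the numbers are exactly 1..n.
  def isValid(numbered: Seq[(Long, Int)]): Boolean =
    numbered.groupBy(_._1).values.forall { group =>
      group.map(_._2).sorted == (1 to group.size)
    }

  def main(args: Array[String]): Unit = {
    val client = 219483904822L
    val dates = Seq("2006-06-01", "2006-09-01", "2007-01-01", "2007-08-01", "2007-07-01")
    val rows = dates.map(d => InputRow(client, LocalDate.parse(d)))

    // Expected numbering for this client_id
    val expected = rowNumbers(rows)
    expected.foreach { case (r, n) => println(s"[${r.clientId},${r.date},$n]") }

    // Reported numbering: every row of this client_id received -1863462909
    val reported = dates.map(_ => (client, -1863462909))

    println(s"expected valid: ${isValid(expected.map { case (r, n) => (r.clientId, n) })}")
    println(s"reported valid: ${isValid(reported)}")
  }
}
```

Running main prints the five rows numbered 1 through 5 in date order (2006-06-01 first, 2007-08-01 last), followed by `expected valid: true` and `reported valid: false`, which makes the mismatch with the cluster-mode output explicit.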


          People

            Assignee: Davies Liu (davies)
            Reporter: Saif Addin Ellafi (saif.a.ellafi)
            Votes: 0
            Watchers: 3
