Spark / SPARK-10893

Lag Analytic function broken

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Spark Core, SQL
    • Labels: None
    • Environment: Spark Standalone Cluster on Linux

    Description

      Aggregating with the LAG analytic function gives the wrong result. In my test case it always returned the fixed value 103079215105 when applied to an integer column.
      Note that this only happens on Spark 1.5.0, and only when running in cluster mode.
      It works fine when running on Spark 1.4.1, or when running in local mode.
      I did not test on a YARN cluster.
      I did not test other analytic aggregates.

      Input JSON:

      /home/app/input.json
      {"VAA":"A", "VBB":1}
      {"VAA":"B", "VBB":-1}
      {"VAA":"C", "VBB":2}
      {"VAA":"d", "VBB":3}
      {"VAA":null, "VBB":null}
      

      Java:

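          import org.apache.spark.SparkContext;
          import org.apache.spark.sql.DataFrame;
          import org.apache.spark.sql.expressions.Window;
          import org.apache.spark.sql.hive.HiveContext;
          import static org.apache.spark.sql.functions.lag;

          SparkContext sc = new SparkContext(conf);
          HiveContext sqlContext = new HiveContext(sc);
          DataFrame df = sqlContext.read().json("file:///home/app/input.json");

          // Add a "previous" column: the VBB value of the preceding row, ordered by VAA.
          df = df.withColumn(
            "previous",
            lag(df.col("VBB"), 1)
              .over(Window.orderBy(df.col("VAA")))
            );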
      

      The conditions under which the job ran are important here. I submitted it to a standalone Spark cluster in client mode as follows:

      spark-submit \
        --master spark://xxxxxx:7077 \
        --deploy-mode client \
        --class package.to.DriverClass \
        --driver-java-options -Dhdp.version=2.2.0.0-2041 \
        --num-executors 2 \
        --driver-memory 2g \
        --executor-memory 2g \
        --executor-cores 2 \
        /path/to/sample-program.jar
      

      Expected Result:

      {"VAA":null, "VBB":null, "previous":null}
      {"VAA":"A", "VBB":1, "previous":null}
      {"VAA":"B", "VBB":-1, "previous":1}
      {"VAA":"C", "VBB":2, "previous":-1}
      {"VAA":"d", "VBB":3, "previous":2}
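
The expected result above can be reproduced without Spark. The following is only a minimal plain-Java sketch of what LAG with offset 1 should compute, assuming rows are ordered by VAA ascending with nulls first (which is what the expected output implies). The class `LagSketch`, the `Row` holder, and the `lag` helper are hypothetical names for illustration, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LagSketch {
    // Hypothetical holder for the two input columns of input.json.
    static final class Row {
        final String vaa;
        final Integer vbb;
        Row(String vaa, Integer vbb) { this.vaa = vaa; this.vbb = vbb; }
    }

    // Returns the "previous" column in VAA order (nulls first):
    // for each row, the VBB of the row `offset` positions earlier, or null
    // when no such row exists.
    static List<Integer> lag(List<Row> rows, int offset) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.vaa,
                Comparator.nullsFirst(Comparator.<String>naturalOrder())));
        List<Integer> previous = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            previous.add(i >= offset ? sorted.get(i - offset).vbb : null);
        }
        return previous;
    }

    public static void main(String[] args) {
        List<Row> input = Arrays.asList(
            new Row("A", 1), new Row("B", -1), new Row("C", 2),
            new Row("d", 3), new Row(null, null));
        System.out.println(lag(input, 1)); // prints [null, null, 1, -1, 2]
    }
}
```

The printed list matches the "previous" values of the expected result, row by row.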
      

      Actual Result:

      {"VAA":null, "VBB":null, "previous":103079215105}
      {"VAA":"A", "VBB":1, "previous":103079215105}
      {"VAA":"B", "VBB":-1, "previous":103079215105}
      {"VAA":"C", "VBB":2, "previous":103079215105}
      {"VAA":"d", "VBB":3, "previous":103079215105}
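
For triage it may help to note that the constant has a regular bit pattern: 103079215105 = 24 * 2^32 + 1, or 0x1800000001. The sketch below (the class `ConstantSketch` and its helpers are hypothetical names) splits the value into its two 32-bit halves. This might indicate that a raw 64-bit word is being returned instead of a computed integer, but that reading is an assumption and is not established by this report.

```java
public class ConstantSketch {
    // Upper 32 bits of a 64-bit value.
    static int highWord(long v) { return (int) (v >>> 32); }

    // Lower 32 bits of a 64-bit value.
    static int lowWord(long v) { return (int) v; }

    public static void main(String[] args) {
        long observed = 103079215105L; // value seen in every "previous" cell
        System.out.println(Long.toHexString(observed));                   // prints 1800000001
        System.out.println(highWord(observed) + " " + lowWord(observed)); // prints 24 1
    }
}
```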
      

            People

              Assignee: Unassigned
              Reporter: Jo Desmet (jdesmet)