Spark / SPARK-7937

Cannot compare Hive named_struct (when using argmax, argmin)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: SQL

    Description

      Consider the following SQL. The intention is to get the last used bank country for each bank account:

      select bank_account_id, 
        max(named_struct(
          'src_row_update_ts', unix_timestamp(src_row_update_ts,'yyyy/M/D HH:mm:ss'), 
          'bank_country', bank_country)).bank_country 
      from bank_account_monthly
      where year_month='201502' 
      group by bank_account_id
      

      Result:

      Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 94 in stage 96.0 failed 4 times, most recent failure: Lost task 94.3 in stage 96.0 (TID 22281, xxxx): java.lang.RuntimeException: Type StructType(StructField(src_row_update_ts,LongType,true), StructField(bank_country,StringType,true)) does not support ordered operations
              at scala.sys.package$.error(package.scala:27)
              at org.apache.spark.sql.catalyst.expressions.LessThan.ordering$lzycompute(predicates.scala:222)
              at org.apache.spark.sql.catalyst.expressions.LessThan.ordering(predicates.scala:215)
              at org.apache.spark.sql.catalyst.expressions.LessThan.eval(predicates.scala:235)
              at org.apache.spark.sql.catalyst.expressions.MaxFunction.update(aggregates.scala:147)
              at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:165)
              at org.apache.spark.sql.execution.Aggregate$$anonfun$doExecute$1$$anonfun$7.apply(Aggregate.scala:149)
              at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
              at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
              at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
              at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
              at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
              at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
              at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
              at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
              at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
              at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
              at org.apache.spark.scheduler.Task.run(Task.scala:70)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:724)
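
The query relies on taking max over a struct as an "argmax": if structs are compared field by field, the row with the largest src_row_update_ts wins and its bank_country can then be read off. The snippet below is a minimal sketch of that intended semantics in plain Python, not Spark code. The sample rows and the function name last_used_country are hypothetical, and it assumes field-by-field (lexicographic) comparison with no null values.

```python
# Hypothetical sample rows: (bank_account_id, src_row_update_ts, bank_country).
rows = [
    ("acct-1", 1422748800, "DE"),
    ("acct-1", 1424995200, "FR"),
    ("acct-2", 1423353600, "SG"),
]


def last_used_country(rows):
    """Map each bank_account_id to the bank_country of its latest row."""
    best = {}
    for account_id, ts, country in rows:
        # The tuple plays the role of named_struct(src_row_update_ts, bank_country).
        candidate = (ts, country)
        # Python compares tuples field by field, so the timestamp decides first
        # and the country only breaks ties between equal timestamps.
        if account_id not in best or candidate > best[account_id]:
            best[account_id] = candidate
    # Equivalent of selecting ".bank_country" from the winning struct.
    return {account_id: country for account_id, (_, country) in best.items()}


result = last_used_country(rows)
print(result)  # {'acct-1': 'FR', 'acct-2': 'SG'}
```

Putting the timestamp first in the struct is what makes this work: any later field only matters when the timestamps are equal.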
      

          People

            viirya (L. C. Hsieh)
            huangjs (Jianshi Huang)
            Reynold Xin