Description
Running the SQL below fails with a script transformation error raised by the Python script. The full error message follows the query.
Query SQL:
CREATE VIEW q02_spark_sql_engine_validation_power_test_0_temp AS
SELECT DISTINCT
  sessionid,
  wcs_item_sk
FROM (
  FROM (
    SELECT
      wcs_user_sk,
      wcs_item_sk,
      (wcs_click_date_sk * 24 * 60 * 60 + wcs_click_time_sk) AS tstamp_inSec
    FROM web_clickstreams
    WHERE wcs_item_sk IS NOT NULL
    AND wcs_user_sk IS NOT NULL
    DISTRIBUTE BY wcs_user_sk
    SORT BY wcs_user_sk, tstamp_inSec -- "sessionize" reducer script requires the cluster by uid and sort by tstamp
  ) clicksAnWebPageType
  REDUCE
    wcs_user_sk,
    tstamp_inSec,
    wcs_item_sk
  USING 'python q2-sessionize.py 3600'
  AS (
    wcs_item_sk BIGINT,
    sessionid STRING)
) q02_tmp_sessionize
CLUSTER BY sessionid
Error Message:
16/07/06 16:59:02 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 157.0 (TID 171, hw-node5): org.apache.spark.SparkException: Subprocess exited with status 1. Error: Traceback (most recent call last):
  File "q2-sessionize.py", line 49, in <module>
    user_sk, tstamp_str, item_sk = line.strip().split("\t")
ValueError: too many values to unpack

	at org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.checkFailureAndPropagate(ScriptTransformation.scala:144)
	at org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.hasNext(ScriptTransformation.scala:192)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Subprocess exited with status 1. Error: Traceback (most recent call last):
  File "q2-sessionize.py", line 49, in <module>
    user_sk, tstamp_str, item_sk = line.strip().split("\t")
ValueError: too many values to unpack

	at org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.checkFailureAndPropagate(ScriptTransformation.scala:144)
	at org.apache.spark.sql.hive.execution.ScriptTransformation$$anon$1.hasNext(ScriptTransformation.scala:181)
	... 14 more
16/07/06 16:59:02 INFO scheduler.TaskSetManager: Lost task 7.0 in stage 157.0 (TID 173) on executor hw-node5: org.apache.spark.SparkException (Subprocess exited with status 1. Error: Traceback (most recent call last):
  File "q2-sessionize.py", line 49, in <module>
    user_sk, tstamp_str, item_sk = line.strip().split("\t")
ValueError: too many values to unpack
) [duplicate 1]
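
Diagnostic sketch:
The traceback shows that line 49 of q2-sessionize.py unpacks exactly three tab-separated fields, so the ValueError means at least one input row reached the script with more than three fields. The rest of the script is not shown here, so the following is only a minimal sketch of how that line could be wrapped to report what the script actually received. The helper name parse_click and the wording of the message are hypothetical, not part of the original script.

```python
def parse_click(line):
    """Split one tab-delimited input row into (user_sk, tstamp_str, item_sk).

    Hypothetical helper: it mirrors the split on line 49 of the traceback,
    but raises a ValueError that includes the field count and the raw row,
    so the executor log shows what the subprocess was given.
    """
    fields = line.strip().split("\t")
    if len(fields) != 3:
        raise ValueError(
            "expected 3 tab-separated fields, got %d: %r" % (len(fields), line)
        )
    user_sk, tstamp_str, item_sk = fields
    return user_sk, tstamp_str, item_sk
```

Calling parse_click on each line read from standard input, in place of the bare unpacking, would surface the unexpected row in the Spark task log and help tell apart a delimiter mismatch from extra columns being passed to the script.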