Description
There is a real-world case that uses a Python streaming script in a Spark SQL query (script transform). We found that all result records coming out of the SELECT pipeline were written to the Python script as ONE line of input, so the script cannot identify individual records. In addition, the field separator Spark SQL uses is '^A' ('\001'), which is inconsistent/incompatible with the '\t' separator used in the Hive implementation.
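For illustration only, a small standalone snippet (hypothetical data, not taken from the benchmark) showing why the current behaviour breaks the script's parsing while Hive's '\t'-separated, one-record-per-line convention works:

# Hypothetical illustration of the failure described above: with '\001' as the
# field separator and every record on a single line, split("\t") returns one
# element, so unpacking three names raises ValueError.
SEP = "\x01"  # '^A'
single_line = SEP.join(["31", "3237764860", "feedback", "31", "3237769106", "dynamic"])
try:
    user_sk, tstamp_str, value = single_line.strip().split("\t")
except ValueError as err:
    print("script cannot parse Spark SQL's output:", err)

# With Hive's contract (one record per line, fields separated by '\t') the same
# parsing code works record by record.
for line in ["31\t3237764860\tfeedback", "31\t3237769106\tdynamic"]:
    user_sk, tstamp_str, value = line.strip().split("\t")
    print(user_sk, tstamp_str, value)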
Key query:
CREATE VIEW temp1 AS
SELECT *
FROM (
  FROM (
    SELECT
      c.wcs_user_sk,
      w.wp_type,
      (wcs_click_date_sk * 24 * 60 * 60 + wcs_click_time_sk) AS tstamp_inSec
    FROM web_clickstreams c, web_page w
    WHERE c.wcs_web_page_sk = w.wp_web_page_sk
      AND c.wcs_web_page_sk IS NOT NULL
      AND c.wcs_user_sk IS NOT NULL
      AND c.wcs_sales_sk IS NULL -- abandoned implies: no sale
    DISTRIBUTE BY wcs_user_sk
    SORT BY wcs_user_sk, tstamp_inSec
  ) clicksAnWebPageType
  REDUCE
    wcs_user_sk,
    tstamp_inSec,
    wp_type
  USING 'python sessionize.py 3600'
  AS (
    wp_type STRING,
    tstamp BIGINT,
    sessionid STRING
  )
) sessionized
Key Python script:
import sys

for line in sys.stdin:
    user_sk, tstamp_str, value = line.strip().split("\t")
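For context, a minimal sketch of what a sessionize.py along these lines might look like (assumptions: the actual benchmark script differs in details, and the 3600 argument is a session timeout in seconds). It relies on one '\t'-separated record per stdin line, which is exactly the contract broken by the current Spark SQL output:

import sys

# Session timeout (seconds) passed as the first script argument, e.g. 3600.
timeout = int(sys.argv[1]) if len(sys.argv) > 1 else 3600

last_user = None
last_ts = None
session_no = 0
for line in sys.stdin:
    # Input fields follow the REDUCE column list: wcs_user_sk, tstamp_inSec, wp_type.
    user_sk, tstamp_str, wp_type = line.strip().split("\t")
    tstamp = int(tstamp_str)
    # Start a new session when the user changes or the click gap exceeds the timeout.
    if user_sk != last_user or tstamp - last_ts > timeout:
        session_no += 1
    # Output matches the AS (...) clause: wp_type, tstamp, sessionid, '\t'-separated.
    print("%s\t%d\t%s_%d" % (wp_type, tstamp, user_sk, session_no))
    last_user = user_sk
    last_ts = tstamp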
Sample SELECT result:
^V31^A3237764860^Afeedback^U31^A3237769106^Adynamic^T31^A3237779027^Areview
Expected result:
31	3237764860	feedback
31	3237769106	dynamic
31	3237779027	review