[SPARK-1964] Timestamp missing from HiveMetastore types parser - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.0.1, 1.1.0
Component/s: SQL
Labels:
None

Description


---------- Forwarded message ----------
From: dataginjaninja <rickett.stephanie@gmail.com>
Date: Thu, May 29, 2014 at 8:54 AM
Subject: Timestamp support in v1.0
To: dev@spark.incubator.apache.org


Can anyone verify which rc [SPARK-1360] Add Timestamp Support for SQL #275
<https://github.com/apache/spark/pull/275> is included in? I am running
rc3, but receiving errors with TIMESTAMP as a datatype in my Hive tables
when trying to use them in pyspark.

*The error I get:*
14/05/29 15:44:47 INFO ParseDriver: Parsing command: SELECT COUNT(*) FROM aol
14/05/29 15:44:48 INFO ParseDriver: Parse Completed
14/05/29 15:44:48 INFO metastore: Trying to connect to metastore with URI thrift:
14/05/29 15:44:48 INFO metastore: Waiting 1 seconds before next connection attempt.
14/05/29 15:44:49 INFO metastore: Connected to metastore.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark-1.0.0-rc3/python/pyspark/sql.py", line 189, in hql
    return self.hiveql(hqlQuery)
  File "/opt/spark-1.0.0-rc3/python/pyspark/sql.py", line 183, in hiveql
    return SchemaRDD(self._ssql_ctx.hiveql(hqlQuery), self)
  File "/opt/spark-1.0.0-rc3/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
  File "/opt/spark-1.0.0-rc3/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o14.hiveql.
: java.lang.RuntimeException: Unsupported dataType: timestamp

*The table I loaded:*
DROP TABLE IF EXISTS aol;
CREATE EXTERNAL TABLE aol (
        userid STRING,
        query STRING,
        query_time TIMESTAMP,
        item_rank INT,
        click_url STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/tmp/data/aol';

*The pyspark commands:*
from pyspark.sql import HiveContext
# sc is the SparkContext that the pyspark shell creates on startup
hctx = HiveContext(sc)
results = hctx.hql("SELECT COUNT(*) FROM aol").collect()
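
The error message and the issue title suggest that the parser mapping Hive metastore type strings to Spark SQL types had no entry for timestamp. The sketch below only illustrates that failure mode in plain Python. It is not Spark's actual parser: the function name, the dictionaries, and the type labels are all hypothetical.

```python
# Illustrative sketch only. Names and type labels are hypothetical,
# not taken from Spark's HiveMetastore types parser.

# A primitive-type table with no entry for "timestamp".
PRIMITIVES_WITHOUT_TIMESTAMP = {
    "string": "StringType",
    "int": "IntegerType",
    "double": "DoubleType",
    "boolean": "BooleanType",
}

# The same table with the missing entry added.
PRIMITIVES_WITH_TIMESTAMP = dict(
    PRIMITIVES_WITHOUT_TIMESTAMP, timestamp="TimestampType"
)


def parse_hive_type(type_name, primitives):
    """Map a metastore type string to a type label, or fail loudly."""
    key = type_name.strip().lower()
    if key not in primitives:
        # Same wording as the error in the report above.
        raise RuntimeError("Unsupported dataType: " + key)
    return primitives[key]
```

With the first table, `parse_hive_type("TIMESTAMP", PRIMITIVES_WITHOUT_TIMESTAMP)` raises `RuntimeError`. With the second it returns the timestamp label. This would also explain why a query that never reads `query_time` can still fail: the schema of every column is parsed when the table is resolved.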

People

Assignee: Michael Armbrust

Reporter: Michael Armbrust

Votes: 0

Watchers: 1

Dates

Created: 29/May/14 16:46

Updated: 19/Jun/14 05:37

Resolved: 19/Jun/14 05:37