Spark / SPARK-1964

Timestamp missing from HiveMetastore types parser

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      
      ---------- Forwarded message ----------
      From: dataginjaninja <rickett.stephanie@gmail.com>
      Date: Thu, May 29, 2014 at 8:54 AM
      Subject: Timestamp support in v1.0
      To: dev@spark.incubator.apache.org
      
      
      Can anyone verify which rc [SPARK-1360] Add Timestamp Support for SQL #275
      <https://github.com/apache/spark/pull/275> is included in? I am running
      rc3, but receiving errors with TIMESTAMP as a datatype in my Hive tables
      when trying to use them in pyspark.
      
      *The error I get:*
      14/05/29 15:44:47 INFO ParseDriver: Parsing command: SELECT COUNT(*) FROM aol
      14/05/29 15:44:48 INFO ParseDriver: Parse Completed
      14/05/29 15:44:48 INFO metastore: Trying to connect to metastore with URI thrift:
      14/05/29 15:44:48 INFO metastore: Waiting 1 seconds before next connection attempt.
      14/05/29 15:44:49 INFO metastore: Connected to metastore.
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/opt/spark-1.0.0-rc3/python/pyspark/sql.py", line 189, in hql
          return self.hiveql(hqlQuery)
        File "/opt/spark-1.0.0-rc3/python/pyspark/sql.py", line 183, in hiveql
          return SchemaRDD(self._ssql_ctx.hiveql(hqlQuery), self)
        File "/opt/spark-1.0.0-rc3/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
        File "/opt/spark-1.0.0-rc3/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o14.hiveql.
      : java.lang.RuntimeException: Unsupported dataType: timestamp
      
      *The table I loaded:*
      DROP TABLE IF EXISTS aol;
      CREATE EXTERNAL TABLE aol (
              userid STRING,
              query STRING,
              query_time TIMESTAMP,
              item_rank INT,
              click_url STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LOCATION '/tmp/data/aol';
      
      *The pyspark commands:*
      from pyspark.sql import HiveContext
      hctx = HiveContext(sc)
      results = hctx.hql("SELECT COUNT(*) FROM aol").collect()
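
      The exception message ("Unsupported dataType: timestamp") together with
      the issue title suggests that the parser which maps metastore type
      strings to SQL types has no case for `timestamp`. Below is a minimal
      sketch of that failure mode, assuming a simple lookup-table parser. It
      is not Spark's actual implementation: `PRIMITIVE_TYPES`,
      `parse_hive_type`, `parse_columns` and the type-name strings are
      hypothetical names chosen for illustration. Only the column list is
      taken from the table definition above.

```python
# Hypothetical illustration only: this is NOT Spark's HiveMetastore types parser.
# It mimics a parser whose lookup table has no entry for "timestamp".
PRIMITIVE_TYPES = {
    "string": "StringType",
    "int": "IntegerType",
    "bigint": "LongType",
    "double": "DoubleType",
    "boolean": "BooleanType",
}


def parse_hive_type(type_string, known_types=PRIMITIVE_TYPES):
    """Map a Hive metastore type string to a type name, failing on unknown types."""
    key = type_string.strip().lower()
    if key not in known_types:
        # Same message shape as the error reported in the traceback.
        raise RuntimeError("Unsupported dataType: " + key)
    return known_types[key]


def parse_columns(columns, known_types=PRIMITIVE_TYPES):
    """Parse every (name, type_string) pair of a table schema."""
    return [(name, parse_hive_type(t, known_types)) for name, t in columns]


# Columns of the `aol` table from the CREATE EXTERNAL TABLE statement.
AOL_COLUMNS = [
    ("userid", "string"),
    ("query", "string"),
    ("query_time", "timestamp"),
    ("item_rank", "int"),
    ("click_url", "string"),
]

# Sketch of the fix: add the missing mapping (the target name is an assumption).
FIXED_TYPES = dict(PRIMITIVE_TYPES, timestamp="TimestampType")

if __name__ == "__main__":
    try:
        parse_columns(AOL_COLUMNS)
    except RuntimeError as err:
        print("before fix:", err)
    print("after fix:", parse_columns(AOL_COLUMNS, FIXED_TYPES))
```

      In this sketch the whole schema is parsed before any query runs, so a
      single unsupported column type makes even `SELECT COUNT(*)` fail, which
      is consistent with the behaviour reported above.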
      

            People

            • Assignee:
              marmbrus Michael Armbrust
            • Reporter:
              marmbrus Michael Armbrust