Spark / SPARK-1964

Timestamp missing from HiveMetastore types parser


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels: None

    Description

      
      ---------- Forwarded message ----------
      From: dataginjaninja <rickett.stephanie@gmail.com>
      Date: Thu, May 29, 2014 at 8:54 AM
      Subject: Timestamp support in v1.0
      To: dev@spark.incubator.apache.org
      
      
      Can anyone verify which rc  [SPARK-1360] Add Timestamp Support for SQL #275
      <https://github.com/apache/spark/pull/275>   is included in? I am running
      rc3, but receiving errors with TIMESTAMP as a datatype in my Hive tables
      when trying to use them in pyspark.
      
      *The error I get:*
      14/05/29 15:44:47 INFO ParseDriver: Parsing command: SELECT COUNT(*) FROM
      aol
      14/05/29 15:44:48 INFO ParseDriver: Parse Completed
      14/05/29 15:44:48 INFO metastore: Trying to connect to metastore with URI
      thrift:
      14/05/29 15:44:48 INFO metastore: Waiting 1 seconds before next connection
      attempt.
      14/05/29 15:44:49 INFO metastore: Connected to metastore.
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/opt/spark-1.0.0-rc3/python/pyspark/sql.py", line 189, in hql
          return self.hiveql(hqlQuery)
        File "/opt/spark-1.0.0-rc3/python/pyspark/sql.py", line 183, in hiveql
          return SchemaRDD(self._ssql_ctx.hiveql(hqlQuery), self)
        File
      "/opt/spark-1.0.0-rc3/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
      line 537, in __call__
        File
      "/opt/spark-1.0.0-rc3/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line
      300, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o14.hiveql.
      : java.lang.RuntimeException: Unsupported dataType: timestamp
      
      *The table I loaded:*
      DROP TABLE IF EXISTS aol;
      CREATE EXTERNAL TABLE aol (
              userid STRING,
              query STRING,
              query_time TIMESTAMP,
              item_rank INT,
              click_url STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LOCATION '/tmp/data/aol';
      
      *The pyspark commands:*
      from pyspark.sql import HiveContext
      hctx= HiveContext(sc)
      results = hctx.hql("SELECT COUNT(*) FROM aol").collect()
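The failure above comes from the metastore types parser hitting a Hive type string it does not recognize. A minimal sketch of that failure mode is below; this is an illustrative stand-in, not Spark's actual HiveMetastoreTypes code, and the lookup table and type names are assumptions for illustration:

```python
# Illustrative sketch only -- not Spark's actual parser. A lookup table of
# Hive metastore type strings that omits "timestamp" reproduces the
# "Unsupported dataType" error reported above.
SUPPORTED_TYPES = {
    "string": "StringType",
    "int": "IntegerType",
    "bigint": "LongType",
    "double": "DoubleType",
}

def to_data_type(metastore_type):
    """Map a Hive metastore type string to a Spark SQL type name."""
    if metastore_type not in SUPPORTED_TYPES:
        raise RuntimeError("Unsupported dataType: " + metastore_type)
    return SUPPORTED_TYPES[metastore_type]

# The shape of the fix is simply to register the missing type:
SUPPORTED_TYPES["timestamp"] = "TimestampType"
```

With the entry registered, a table column declared as TIMESTAMP parses instead of raising at query time.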
      

          People

            Assignee: marmbrus (Michael Armbrust)
            Reporter: marmbrus (Michael Armbrust)