SPARK-4768: Add Support For Impala Encoded Timestamp (INT96)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Impala encodes timestamps as INT96 in Parquet files. Spark SQL should be able to read this data even though INT96 is not part of the Parquet spec.

      Perhaps adding a flag to act like Impala when reading Parquet (like the one we already have for strings) would be useful.
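
      A minimal sketch of what such an option could look like from user code, modeled on the existing spark.sql.parquet.binaryAsString flag. The int96AsTimestamp name below is only an assumption for illustration; this ticket merely proposes such an option:

        import org.apache.spark.sql.SQLContext

        val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

        // Existing option: read Parquet BINARY columns as strings (Impala/Hive compatibility).
        sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

        // Proposed analogue (hypothetical flag name): read INT96 columns as timestamps.
        sqlContext.setConf("spark.sql.parquet.int96AsTimestamp", "true")

        // Reading an Impala-written file would then succeed instead of failing on INT96.
        val data = sqlContext.parquetFile("/path/to/impala/output")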

      Here's an example of the error you might see:

      Caused by: java.lang.RuntimeException: Potential loss of precision: cannot convert INT96
              at scala.sys.package$.error(package.scala:27)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.toPrimitiveDataType(ParquetTypes.scala:61)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.toDataType(ParquetTypes.scala:113)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:314)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:311)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
              at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
              at scala.collection.Iterator$class.foreach(Iterator.scala:727)
              at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
              at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
              at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
              at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
              at scala.collection.AbstractTraversable.map(Traversable.scala:105)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToAttributes(ParquetTypes.scala:310)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:441)
              at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:66)
              at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:141)
      
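      For reference, a sketch of how the 12-byte Impala/Hive INT96 value can be decoded once the type is accepted: the first 8 bytes are nanoseconds within the day (little-endian) and the last 4 bytes are the Julian day number. The helper name is illustrative, not Spark API:

        import java.nio.{ByteBuffer, ByteOrder}
        import java.sql.Timestamp
        import java.util.concurrent.TimeUnit

        def int96ToTimestamp(bytes: Array[Byte]): Timestamp = {
          require(bytes.length == 12, s"INT96 timestamp must be 12 bytes, got ${bytes.length}")
          val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
          val nanosOfDay = buf.getLong            // bytes 0-7: nanoseconds since midnight
          val julianDay  = buf.getInt             // bytes 8-11: Julian day number
          val epochDay   = julianDay - 2440588L   // 2440588 = Julian day of 1970-01-01
          val millis     = TimeUnit.DAYS.toMillis(epochDay) + nanosOfDay / 1000000L
          val ts = new Timestamp(millis)
          ts.setNanos((nanosOfDay % 1000000000L).toInt)  // keep sub-millisecond precision
          ts
        }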

      Attachments

        1. string_timestamp.gz
          0.7 kB
          Taiji Okada
        2. 5e4481a02f951e29-651ee94ed14560bf_922627129_data.0.parq
          0.4 kB
          Taiji Okada


            People

              Assignee: yhuai (Yin Huai)
              Reporter: cheffpj (Pat McDonough)
              Votes: 8
              Watchers: 9
