Description
Impala uses INT96 to store timestamps in Parquet files. Spark SQL should be able to read this data even though INT96 timestamps are not part of the Parquet spec.
Perhaps adding a flag that makes the reader behave like Impala when reading Parquet (like we already do for strings) would be useful.
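As a rough sketch of what that could look like: the snippet below assumes an existing SparkContext, uses the existing spark.sql.parquet.binaryAsString option as the string analogy, and spark.sql.parquet.int96AsTimestamp is only a hypothetical name for the proposed flag (the path is illustrative too).

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local", "int96-example")
val sqlContext = new SQLContext(sc)

// Existing reader flag: interpret Parquet BINARY columns as strings
// (matches what Impala/Hive write).
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

// Hypothetical flag proposed here: interpret INT96 columns as timestamps.
sqlContext.setConf("spark.sql.parquet.int96AsTimestamp", "true")

// Read a table written by Impala.
val events = sqlContext.parquetFile("/path/to/impala_written_table")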
Here's an example of the error you might see:
Caused by: java.lang.RuntimeException: Potential loss of precision: cannot convert INT96
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.parquet.ParquetTypesConverter$.toPrimitiveDataType(ParquetTypes.scala:61)
	at org.apache.spark.sql.parquet.ParquetTypesConverter$.toDataType(ParquetTypes.scala:113)
	at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:314)
	at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$convertToAttributes$1.apply(ParquetTypes.scala:311)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToAttributes(ParquetTypes.scala:310)
	at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:441)
	at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:66)
	at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:141)
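For background on what a reader would have to do with these values: Impala's INT96 timestamps are commonly described as 12 little-endian bytes, the first 8 holding nanoseconds within the day and the last 4 holding the Julian day number. A minimal decoding sketch under that assumption (Int96Decoder and its constants are illustrative, not part of Spark):

import java.nio.{ByteBuffer, ByteOrder}
import java.sql.Timestamp

object Int96Decoder {
  // Julian day number of the Unix epoch (1970-01-01).
  private val JulianDayOfEpoch = 2440588L
  private val NanosPerSecond = 1000000000L
  private val SecondsPerDay = 86400L

  // Convert a 12-byte INT96 value (8 bytes nanos-of-day + 4 bytes Julian day,
  // both little-endian) into a java.sql.Timestamp.
  def toTimestamp(int96Bytes: Array[Byte]): Timestamp = {
    require(int96Bytes.length == 12, "INT96 value must be exactly 12 bytes")
    val buf = ByteBuffer.wrap(int96Bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong      // first 8 bytes
    val julianDay = buf.getInt.toLong // last 4 bytes

    val secondsSinceEpoch =
      (julianDay - JulianDayOfEpoch) * SecondsPerDay + nanosOfDay / NanosPerSecond
    val ts = new Timestamp(secondsSinceEpoch * 1000L)
    ts.setNanos((nanosOfDay % NanosPerSecond).toInt)
    ts
  }
}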
Attachments
Issue Links
- duplicates
  - SPARK-4709 Spark SQL support error reading Parquet with timestamp type field (Resolved)
  - SPARK-4987 Parquet support for timestamp type (Resolved)
- is related to
  - SPARK-4987 Parquet support for timestamp type (Resolved)