Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
I created a Kudu table as a clone of a regular Impala table. An INSERT...SELECT copying all the data from the Impala table to the Kudu table failed with an error implying that the TIMESTAMP type wasn't supported, although the Kudu docs say that TIMESTAMP is OK:
create TABLE log_ingest_docs_kudu (
  id bigint,
  ip string,
  f2 string,
  f3 string,
  the_date string,
  method string,
  path string,
  status smallint,
  size bigint,
  referer string,
  agent string,
  is_search_term boolean,
  search_term string,
  is_doc_page boolean,
  doc_page string
)
TBLPROPERTIES (
  'kudu.master_addresses'='yadayada:7051',
  'kudu.key_columns'='id',
  'kudu.table_name'='log_ingest_docs_kudu',
  'storage_handler'='com.cloudera.kudu.hive.KuduStorageHandler'
);

[localhost:21000] > insert into log_ingest_docs_kudu select row_number() over (order by the_date, path, ip) as id, * from log_ingest_docs_parquet;
Query: insert into log_ingest_docs_kudu select row_number() over (order by the_date, path, ip) as id, * from log_ingest_docs_parquet
ERROR: AnalysisException: Possible loss of precision for target table 'weblogs.log_ingest_docs_kudu'.
Expression 'weblogs.log_ingest_docs_parquet.the_date' (type: TIMESTAMP) would need to be cast to STRING for column 'the_date'
The Impala table is unpartitioned and uses Parquet file format, if that's significant.
The Kudu table is also unpartitioned. That is slightly significant because I got the original INSERT...SELECT to work by changing the type of the TIMESTAMP column, but then I ran into a similar INSERT...SELECT bug involving a BOOLEAN column that occurred only if the Kudu table used partitioning. (I'll open a separate JIRA for that.)
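For reference, the AnalysisException above complains about an implicit TIMESTAMP-to-STRING conversion for the_date. A minimal sketch of an alternative workaround is to list the columns explicitly and apply the cast that the message asks for. This assumes log_ingest_docs_parquet has the same column names as the Kudu table (minus id) and that no other column needs a cast; the report does not show the Parquet schema, and this statement was not run against the setup described here.

```sql
-- Sketch only: the column names of log_ingest_docs_parquet are assumed to
-- match the Kudu table definition above; adjust to the real schema.
INSERT INTO log_ingest_docs_kudu
SELECT
  row_number() OVER (ORDER BY the_date, path, ip) AS id,
  ip,
  f2,
  f3,
  CAST(the_date AS STRING) AS the_date,  -- explicit cast named in the error
  method,
  path,
  status,
  size,
  referer,
  agent,
  is_search_term,
  search_term,
  is_doc_page,
  doc_page
FROM log_ingest_docs_parquet;
```

Naming the columns instead of using * also makes the column order explicit, so any further type mismatch is easier to trace to a specific column.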