Description
For the case:
SELECT * FROM [table]
JDBC reads the table's backing data directly, rather than launching a MapReduce (MR) job and creating a result set. This report is another direct-read JDBC issue with TIMESTAMPs; see also HIVE-8297.
As the title says, a succeeding row with no value corrupts the value read for the current row. To reproduce using beeline:
1) Create this file locally as follows, then copy it to HDFS.
$ cat > /tmp/ts2.txt
2014-09-28 00:00:00,2014-09-28 00:00:00,
,,
<ctrl-D>
$ hadoop fs -copyFromLocal /tmp/ts2.txt /tmp/ts2.txt
2) In beeline, load the HDFS data above into a TEXTFILE table:
$ beeline
> !connect jdbc:hive2://<host>:<port>/<db> hive pass org.apache.hive.jdbc.HiveDriver
> drop table `TIMESTAMP_TEXT2`;
> CREATE TABLE `TIMESTAMP_TEXT2` (`ts1` TIMESTAMP, `ts2` TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' LINES TERMINATED BY '\012' STORED AS TEXTFILE;
> LOAD DATA INPATH '/tmp/ts2.txt' OVERWRITE INTO TABLE `TIMESTAMP_TEXT2`;
3) To demonstrate the corrupt data read, run this in beeline:
> select * from `TIMESTAMP_TEXT2`;
Note 1: The incorrect behavior demonstrated above also reproduces with a standalone Java/JDBC program.
Note 2: I don't know whether this affects any other data types, or which releases are affected; it does occur in Hive 13. The Hive CLI works fine. It also works fine if you force an MR job:
select * from `TIMESTAMP_TEXT2` where 1=1;
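For Note 1, a standalone Java/JDBC check could look roughly like the sketch below. It is not the program originally used. The class name TimestampReadCheck, its helper methods, and the command-line argument handling are made up for illustration; the driver class, the table name, and the two queries come from the steps above. It assumes the Hive JDBC driver jar is on the classpath and that reading both columns with getTimestamp is an adequate way to surface the problem.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;

// Hypothetical reproduction helper: runs the direct-read query and the
// forced-MR query against the same table so the rows can be compared by eye.
public class TimestampReadCheck {

    // Query from step 3, which takes the direct-read path.
    static String directReadQuery(String table) {
        return "select * from `" + table + "`";
    }

    // Query from Note 2; the trivial predicate forces an MR job.
    static String forcedMapReduceQuery(String table) {
        return directReadQuery(table) + " where 1=1";
    }

    // Renders one row; a SQL NULL timestamp prints as "null".
    static String formatRow(Timestamp ts1, Timestamp ts2) {
        return "ts1=" + ts1 + ", ts2=" + ts2;
    }

    // Executes the query and prints every row of the two-column table.
    static void dump(Connection conn, String sql) throws SQLException {
        System.out.println(sql);
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println("  " + formatRow(rs.getTimestamp(1), rs.getTimestamp(2)));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            // Assumed argument convention, not part of the original report.
            System.err.println("usage: TimestampReadCheck <jdbc-url> [user] [password]");
            return;
        }
        String url = args[0];                          // e.g. jdbc:hive2://<host>:<port>/<db>
        String user = args.length > 1 ? args[1] : "";
        String pass = args.length > 2 ? args[2] : "";

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(url, user, pass)) {
            dump(conn, directReadQuery("TIMESTAMP_TEXT2"));
            dump(conn, forcedMapReduceQuery("TIMESTAMP_TEXT2"));
        }
    }
}
```

If the two queries print different values for the first row, that matches the behavior described in this report; the forced-MR output is the one expected to be correct.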