Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Impala 2.5.0
nightly cluster
Description
The attached CSV file has 7300 rows. When I register it as an external table and run a SELECT COUNT(*) query on the nightly cluster, the query returns 2347 instead of 7300.
To reproduce, log in to root@nightly-2.vpc.cloudera.com. Note that the CSV file is already in the temp directory.
Place the attached CSV file somewhere on HDFS.
wc -l /tmp/0.csv
hadoop fs -mkdir -p /tmp/my_csv
hadoop fs -put 0.csv /tmp/my_csv
Open up an impala-shell and execute
CREATE EXTERNAL TABLE my_csv (
  `id` int,
  `bool_col` boolean,
  `tinyint_col` tinyint,
  `smallint_col` smallint,
  `int_col` int,
  `bigint_col` bigint,
  `float_col` float,
  `double_col` double,
  `date_string_col` string,
  `string_col` string,
  `timestamp_col` timestamp,
  `year` int,
  `month` int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
LOCATION '/tmp/my_csv'
TBLPROPERTIES('serialization.null.format'='#NULL');

SELECT COUNT(*) FROM my_csv;
The query returns 2347 instead of the expected 7300.
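To make the mismatch easy to re-check after a fix, the comparison can be scripted. The following is a minimal sketch, not part of the original report: the helper names local_rows and compare_counts are hypothetical, and the commented usage assumes impala-shell is on the PATH and that the my_csv table from the steps above exists. Note that wc -l counts newline characters, so a file whose last line has no trailing newline reports one row fewer.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a local line count with a reported row count.

# Print the number of lines in a file (what `wc -l` reports, without the filename).
local_rows() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Compare an expected count ($1) with an actual count ($2).
# Prints a verdict and returns non-zero when they differ.
compare_counts() {
    if [ "$1" -eq "$2" ]; then
        echo "OK: $2 rows"
    else
        echo "MISMATCH: expected $1 rows, got $2"
        return 1
    fi
}

# Usage on the cluster (assumes impala-shell is installed and my_csv exists):
#   expected=$(local_rows /tmp/0.csv)
#   actual=$(impala-shell -B --quiet -q 'SELECT COUNT(*) FROM my_csv')
#   compare_counts "$expected" "$actual"
```

With the counts from this report, compare_counts 7300 2347 would report a mismatch and return a non-zero status.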