Details
Bug
Status: Closed
Major
Resolution: Not A Problem
1.9.0
None
None
4 node cluster
Description
Incorrect results: query on a directory containing CSV data
The directory holds 6 CSV files with 4534327 rows in total (~4.5M records).
Drill 1.9.0 commit ID: f3c26e34
I can share the data to reproduce the issue.
Note that data in columns[3] has the value "B02512\r" in query results.
0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` limit 5;
+----------------------------------------------------------+
|                         columns                          |
+----------------------------------------------------------+
| ["2014-08-01 00:03:00","40.7366","-73.9906","B02512\r"]  |
| ["2014-08-01 00:09:00","40.726","-73.9918","B02512\r"]   |
| ["2014-08-01 00:12:00","40.7209","-74.0507","B02512\r"]  |
| ["2014-08-01 00:12:00","40.7387","-73.9856","B02512\r"]  |
| ["2014-08-01 00:12:00","40.7323","-74.0077","B02512\r"]  |
+----------------------------------------------------------+
5 rows selected (0.184 seconds)
But when we do a select on columns[3] we see a different value in the query result.
0: jdbc:drill:schema=dfs.tmp> select columns[3] from `uber_trip_data` limit 5;
+----------+
| EXPR$0   |
+----------+
|02512
|02512
|02512
|02512
|02512
+----------+
5 rows selected (0.159 seconds)
Filtering on columns[3] = 'B02512' returns no rows, whereas it should have returned data.
0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` where columns[3]='B02512';
+----------+
| columns  |
+----------+
+----------+
No rows selected (1.707 seconds)
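A plausible reading of the "B02512\r" values noted above is that the CSV files use Windows (CRLF) line endings while records are split on "\n" only, so the carriage return stays attached to the last field. The sketch below is plain Python, not Drill code. The sample lines are copied from the output above, and the helper name split_records is made up for illustration. It shows why an equality filter on 'B02512' would match nothing under that assumption, and how stripping the "\r" or splitting on "\r\n" restores the match.

```python
# Illustrative sketch only: plain Python, not Drill internals.
# Assumption: each record in the CSV files ends with "\r\n" (CRLF).
raw = (
    "2014-08-01 00:03:00,40.7366,-73.9906,B02512\r\n"
    "2014-08-01 00:09:00,40.726,-73.9918,B02512\r\n"
)


def split_records(text, line_delimiter="\n"):
    """Split text into records on line_delimiter, then into fields on commas."""
    lines = [ln for ln in text.split(line_delimiter) if ln]
    return [ln.split(",") for ln in lines]


# Splitting on "\n" alone leaves the "\r" attached to the last field.
rows_lf = split_records(raw, "\n")
base_values = [row[3] for row in rows_lf]

# An equality filter on 'B02512' therefore matches nothing.
matches_raw = [row for row in rows_lf if row[3] == "B02512"]

# Stripping the trailing "\r" restores the match ...
matches_stripped = [row for row in rows_lf if row[3].rstrip("\r") == "B02512"]

# ... and so does treating "\r\n" as the record delimiter.
rows_crlf = split_records(raw, "\r\n")
matches_crlf = [row for row in rows_crlf if row[3] == "B02512"]

print(repr(base_values[0]))
print(len(matches_raw), len(matches_stripped), len(matches_crlf))
```

The same carriage return would also explain the odd display for select columns[3]: a terminal that honors "\r" moves the cursor back to the start of the line, so the characters printed after the value overwrite the beginning of the row. On the Drill side, the equivalent remedies would be to strip the character in the query or to declare "\r\n" as the line delimiter for the text format; the exact option name and syntax should be checked against the Drill documentation for the version in use.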