Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
1.4.0
-
None
-
None
-
4 node cluster CentOS
Description
When there is a star in project and there are many nulls followed by a non null value in the column c1 in JSON file, query does not return correct results.
json file with 4096 records that have a field
{c1: null}followed by the 4097th record that has
{c1: "Hello World"}Drill version
http://yum.qa.lab/drill-opensource.ebf/mapr-drill-1.4.0.201603151147-1.noarch.rpm
0: jdbc:drill:schema=dfs.tmp> select * from sys.version; +---------+-----------+----------------+-------------+-------------+------------+ | version | commit_id | commit_message | commit_time | build_email | build_time | +---------+-----------+----------------+-------------+-------------+------------+ | 1.4.0 | 99ec15919cd73f5e7b976f6193af45c0e46ffd7a | DRILL-4479: For empty fields under all_text_mode enabled (a) use varchar for the default columns and (b) ensure we create fields corresponding to all columns. | 14.03.2016 @ 20:52:13 UTC | Unknown | 15.03.2016 @ 18:47:50 UTC | +---------+-----------+----------------+-------------+-------------+------------+ 1 row selected (0.375 seconds) 0: jdbc:drill:schema=dfs.tmp> alter system set `store.json.all_text_mode`=true; +-------+------------------------------------+ | ok | summary | +-------+------------------------------------+ | true | store.json.all_text_mode updated. | +-------+------------------------------------+ 1 row selected (0.136 seconds) 0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 = 'Hello World'; Error: SYSTEM ERROR: NumberFormatException: Hello World Fragment 0:0 [Error Id: f9022f22-12cd-46c7-b471-470a50506ba2 on centos-04.qa.lab:31010] (state=,code=0) 0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 IS NOT NULL; +-------+ | * | +-------+ | null | +-------+ 1 row selected (0.239 seconds) 0: jdbc:drill:schema=dfs.tmp> select t.c1 from `tooManyNulls.json` t where t.c1 = 'Hello World'; +--------------+ | c1 | +--------------+ | Hello World | +--------------+ 1 row selected (0.282 seconds) 0: jdbc:drill:schema=dfs.tmp> select t.c1 from `tooManyNulls.json` t where t.c1 IS NOT NULL; +--------------+ | c1 | +--------------+ | Hello World | +--------------+ 1 row selected (0.259 seconds)
Another interesting observation to note, the below query returns null. However, when the same query is used as a sub-query, we see the query returns correct results.
0: jdbc:drill:schema=dfs.tmp> select * from `tooManyNulls.json` t where t.c1 IS NOT NULL; +-------+ | * | +-------+ | null | +-------+ 1 row selected (0.203 seconds) 0: jdbc:drill:schema=dfs.tmp> select tmp.c1 from (select * from `tooManyNulls.json` t where t.c1 IS NOT NULL) tmp; +--------------+ | c1 | +--------------+ | Hello World | +--------------+ 1 row selected (0.25 seconds)