Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.15.0
Description
When the Drill is querying documents from MapRDBJSON that contain fields with null value, it returns the wrong result.
The issue is locally reproduced.
Please find the repro steps:
[1] Create a MaprDBJSON table. Say '/tmp/dmdb2/'.
[2] Insert the following sample records to table:
insert --table /tmp/dmdb2/ --value '{"_id": "1", "label": "person", "confidence": 0.24}' insert --table /tmp/dmdb2/ --value '{"_id": "2", "label": "person2"}' insert --table /tmp/dmdb2/ --value '{"_id": "3", "label": "person3", "confidence": 0.54}' insert --table /tmp/dmdb2/ --value '{"_id": "4", "label": "person4", "confidence": null}'
We can see that for field 'confidence' document 1 has value 0.24, document 3 has value 0.54, document 2 does not have the field and document 4 has the field with value null.
[3] Query the table from DRILL.
Query 1:
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2; +----------+-------------+ | label | confidence | +----------+-------------+ | person | 0.24 | | person2 | null | | person3 | 0.54 | | person4 | null | +----------+-------------+ 4 rows selected (0.2 seconds)
Query 2:
0: jdbc:drill:> select * from dfs.tmp.dmdb2; +------+-------------+----------+ | _id | confidence | label | +------+-------------+----------+ | 1 | 0.24 | person | | 2 | null | person2 | | 3 | 0.54 | person3 | | 4 | null | person4 | +------+-------------+----------+ 4 rows selected (0.174 seconds)
Query 3:
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is not null; +----------+-------------+ | label | confidence | +----------+-------------+ | person | 0.24 | | person3 | 0.54 | | person4 | null | +----------+-------------+ 3 rows selected (0.192 seconds)
Query 4:
0: jdbc:drill:> select label,confidence from dfs.tmp.dmdb2 where confidence is null; +----------+-------------+ | label | confidence | +----------+-------------+ | person2 | null | +----------+-------------+ 1 row selected (0.262 seconds)
As you can see, Query 3 which queries for all documents with confidence value 'is not null', returns a document with null value.
Other observation:
Querying the same data using DRILL without MapRDB provides the correct result.
For example, create 4 different JSON files with following data:
{"label": "person2"} {"label": "person3", "confidence": 0.54} {"label": "person4", "confidence": null}
Query it directly using DRILL:
Query 5:
0: jdbc:drill:> select label,confidence from dfs.tmp.t2; +----------+-------------+ | label | confidence | +----------+-------------+ | person4 | null | | person3 | 0.54 | | person2 | null | | person | 0.24 | +----------+-------------+ 4 rows selected (0.203 seconds)
Query 6:
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is null; +----------+-------------+ | label | confidence | +----------+-------------+ | person4 | null | | person2 | null | +----------+-------------+ 2 rows selected (0.352 seconds)
Query 7:
0: jdbc:drill:> select label,confidence from dfs.tmp.t2 where confidence is not null;
+----------+-------------+
| label | confidence |
+----------+-------------+
| person3 | 0.54 |
| person | 0.24 |
+----------+-------------+
2 rows selected (0.265 seconds)
As seen in query 6 & 7, it returns the correct result.
I believe the issue is at the MapRDB layer where it is fetching the results.
Attachments
Issue Links
- links to