Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.6.0
- None
Description
Consider the following JSON file:
// file t2.json
{ "X" : { "key1" : "value1", "key2" : "value2" } }
{ "X" : { "key3" : "value3", "key4" : "value4" } }
{ "X" : { "key5" : "value5", "key6" : "value6" } }
Now create a table in JSON format using CTAS:
0: jdbc:drill:zk=local> alter session set `store.format` = 'json';
0: jdbc:drill:zk=local> create table dfs.tmp.jt12 as select t.`X` from `t2.json` t;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3                          |
+-----------+----------------------------+
Every row in the output file is written with the union schema of all fields seen across all records. This creates extraneous null fields in the output:
$ cat jt12/0_0_0.json
{ "X" : { "key1" : "value1", "key2" : "value2", "key3" : null, "key4" : null, "key5" : null, "key6" : null } }
{ "X" : { "key1" : null, "key2" : null, "key3" : "value3", "key4" : "value4", "key5" : null, "key6" : null } }
{ "X" : { "key1" : null, "key2" : null, "key3" : null, "key4" : null, "key5" : "value5", "key6" : "value6" } }
Note that if I change the output format to CSV or Parquet, no null fields are created in the output file. The expectation for a CTAS in JSON format is that the output matches the input JSON data.
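The mismatch can be checked outside Drill with a short script. The following is only a sketch in Python, not Drill code: the helper names `parse_records` and `drop_nulls` are hypothetical, and the input and output records are copied inline from the listings in this report rather than read from disk. It shows that the CTAS output differs from the input as written, and that the two agree once null-valued map entries are removed.

```python
import json

# Records copied from t2.json and jt12/0_0_0.json in this report.
INPUT = """
{ "X" : { "key1" : "value1", "key2" : "value2" } }
{ "X" : { "key3" : "value3", "key4" : "value4" } }
{ "X" : { "key5" : "value5", "key6" : "value6" } }
"""

OUTPUT = """
{ "X" : { "key1" : "value1", "key2" : "value2", "key3" : null, "key4" : null, "key5" : null, "key6" : null } }
{ "X" : { "key1" : null, "key2" : null, "key3" : "value3", "key4" : "value4", "key5" : null, "key6" : null } }
{ "X" : { "key1" : null, "key2" : null, "key3" : null, "key4" : null, "key5" : "value5", "key6" : "value6" } }
"""


def parse_records(text):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def drop_nulls(value):
    """Recursively remove map entries whose value is null (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


input_records = parse_records(INPUT)
output_records = parse_records(OUTPUT)

# The raw CTAS output does not match the input because of the padded nulls.
raw_match = input_records == output_records

# With null-valued entries removed, the output matches the input record for record.
stripped_match = input_records == [drop_nulls(r) for r in output_records]

print(raw_match, stripped_match)  # prints: False True
```

Note that `drop_nulls` would also remove nulls that were explicitly present in the source data, so it is useful as a check for this particular input, not as a general post-processing fix.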