[DRILL-4694] CTAS in JSON format produces extraneous NULL fields - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: Storage - JSON
Labels:
- documentation

Description

Consider the following JSON file:

// file t2.json
{
"X" : {
  "key1" : "value1",
  "key2" : "value2"
  } 
}
{
"X" : {
  "key3" : "value3",
  "key4" : "value4"
  }
}
{
"X" : {
  "key5" : "value5",
  "key6" : "value6"
  }
}

Now create a table in JSON format using CTAS:

0: jdbc:drill:zk=local> alter session set `store.format` = 'json';

0: jdbc:drill:zk=local> create table dfs.tmp.jt12 as select t.`X` from `t2.json` t;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 3                          |
+-----------+----------------------------+

Every row in the output file is written with the union of the fields found across all the records. This creates extraneous null fields in the output:

$ cat jt12/0_0_0.json 
{
  "X" : {
    "key1" : "value1",
    "key2" : "value2",
    "key3" : null,
    "key4" : null,
    "key5" : null,
    "key6" : null
  }
} {
  "X" : {
    "key1" : null,
    "key2" : null,
    "key3" : "value3",
    "key4" : "value4",
    "key5" : null,
    "key6" : null
  }
} {
  "X" : {
    "key1" : null,
    "key2" : null,
    "key3" : null,
    "key4" : null,
    "key5" : "value5",
    "key6" : "value6"
  }
}
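The padding seen above can be modeled outside Drill. The following is a minimal Python sketch of the effect only, not Drill's actual writer code; the function name `materialize_union` is hypothetical. It assumes each record holds a map under one column and fills in `None` (JSON null) for every key that appears in any other record.

```python
def materialize_union(records, column):
    """Pad each record's map under `column` with None for keys seen in any record."""
    # Collect the union of keys, preserving first-seen order.
    all_keys = []
    for rec in records:
        for key in rec[column]:
            if key not in all_keys:
                all_keys.append(key)
    # Rebuild every record against the full key list; missing keys become None.
    return [{column: {k: rec[column].get(k) for k in all_keys}} for rec in records]


# The three records from t2.json.
records = [
    {"X": {"key1": "value1", "key2": "value2"}},
    {"X": {"key3": "value3", "key4": "value4"}},
    {"X": {"key5": "value5", "key6": "value6"}},
]
padded = materialize_union(records, "X")
```

Each element of `padded` carries all six keys, with only its own two set to non-null values, which mirrors the shape of the rows in `0_0_0.json`.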

Note that if I change the output format to CSV or Parquet, no null fields are created in the output file. The expectation for a CTAS in JSON format is that the output matches the input JSON data.
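To make the expected output concrete, here is a minimal Python sketch that post-processes a CTAS output file by dropping null-valued fields. It is an illustration, not part of Drill: the helper names `strip_nulls` and `read_concatenated_json` are hypothetical, and the approach cannot tell nulls added by the writer apart from nulls that were explicitly present in the input. It assumes the file holds back-to-back JSON objects separated by whitespace, as in the output shown in this report.

```python
import json


def strip_nulls(value):
    """Recursively drop object fields whose value is None (JSON null)."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Null list elements are kept; only object fields are removed.
        return [strip_nulls(v) for v in value]
    return value


def read_concatenated_json(text):
    """Yield each top-level JSON value from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# A shortened sample in the same layout as jt12/0_0_0.json.
sample = """{
  "X" : { "key1" : "value1", "key2" : "value2", "key3" : null }
} {
  "X" : { "key1" : null, "key2" : null, "key3" : "value3" }
}"""
cleaned = [strip_nulls(rec) for rec in read_concatenated_json(sample)]
```

After this step, each record in `cleaned` contains only the keys that had values in the corresponding input record.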

Issue Links

links to

GitHub Pull Request #514

People

Assignee: Parth Chandra

Reporter: Aman Sinha

Votes: 0

Watchers: 3

Dates

Created: 25/May/16 18:56

Updated: 22/Jul/16 21:28

Resolved: 07/Jun/16 17:10