Apache Drill / DRILL-4694

CTAS in JSON format produces extraneous NULL fields

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Storage - JSON

    Description

      Consider the following JSON file:

      // file t2.json
      {
      "X" : {
        "key1" : "value1",
        "key2" : "value2"
        } 
      }
      {
      "X" : {
        "key3" : "value3",
        "key4" : "value4"
        }
      }
      {
      "X" : {
        "key5" : "value5",
        "key6" : "value6"
        }
      }
      

      Now create a table in JSON format using CTAS:

      0: jdbc:drill:zk=local> alter session set `store.format` = 'json';
      
      0: jdbc:drill:zk=local> create table dfs.tmp.jt12 as select t.`X` from `t2.json` t;
      +-----------+----------------------------+
      | Fragment  | Number of records written  |
      +-----------+----------------------------+
      | 0_0       | 3                          |
      +-----------+----------------------------+
      

      Each row in the output file is written with the union of all the fields across all the records. This creates extraneous null fields in the output:

      $ cat jt12/0_0_0.json 
      {
        "X" : {
          "key1" : "value1",
          "key2" : "value2",
          "key3" : null,
          "key4" : null,
          "key5" : null,
          "key6" : null
        }
      } {
        "X" : {
          "key1" : null,
          "key2" : null,
          "key3" : "value3",
          "key4" : "value4",
          "key5" : null,
          "key6" : null
        }
      } {
        "X" : {
          "key1" : null,
          "key2" : null,
          "key3" : null,
          "key4" : null,
          "key5" : "value5",
          "key6" : "value6"
        }
      }
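
      The effect can be modeled outside Drill. The following is a minimal Python sketch, assuming the writer emits every record against the union of all keys seen under "X"; it mimics the observed output only and is not Drill's actual implementation. The helper names union_keys and pad_to_union are hypothetical.

```python
import json

# The three input records from t2.json.
records = [
    {"X": {"key1": "value1", "key2": "value2"}},
    {"X": {"key3": "value3", "key4": "value4"}},
    {"X": {"key5": "value5", "key6": "value6"}},
]


def union_keys(rows, column):
    """Collect every key seen under `column`, in first-seen order."""
    seen = []
    for row in rows:
        for key in row[column]:
            if key not in seen:
                seen.append(key)
    return seen


def pad_to_union(rows, column):
    """Rewrite each row against the union schema; missing keys become None."""
    keys = union_keys(rows, column)
    return [{column: {k: row[column].get(k) for k in keys}} for row in rows]


padded = pad_to_union(records, "X")
# Each padded row now carries all six keys, four of them null.
print(json.dumps(padded[1], indent=2))
```

      Under this assumption every row carries all six keys, which matches the shape of the file shown above.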
      

      Note that if I change the output format to CSV or Parquet, no null fields are created in the output file. The expectation for a CTAS in JSON format is that the output matches the input JSON data, with each record containing only the fields present in the corresponding input record.
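
      Until the writer behaves that way, the output can be post-processed. The following is a rough Python sketch, not part of Drill, that reads concatenated JSON objects (as in 0_0_0.json) and drops null-valued fields. The helper names iter_json_objects and drop_nulls are hypothetical, and the sample text is a shortened stand-in for the real file.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from whitespace-separated text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def drop_nulls(value):
    """Recursively remove object fields whose value is null."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


# Shortened stand-in for the CTAS output shown earlier.
sample = """
{ "X" : { "key1" : "value1", "key2" : "value2", "key3" : null } } {
  "X" : { "key1" : null, "key2" : null, "key3" : "value3" } }
"""

cleaned = [drop_nulls(obj) for obj in iter_json_objects(sample)]
for obj in cleaned:
    print(json.dumps(obj))
```

      One caveat: this cannot tell a null added by the writer apart from a null that was explicit in the input, so it is only safe when the source data contains no meaningful nulls.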

            People

              parthc Parth Chandra
              amansinha100 Aman Sinha