Apache Drill / DRILL-4694

CTAS in JSON format produces extraneous NULL fields

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: Storage - JSON
    • Labels:

      Description

      Consider the following JSON file:

      // file t2.json
      {
      "X" : {
        "key1" : "value1",
        "key2" : "value2"
        } 
      }
      {
      "X" : {
        "key3" : "value3",
        "key4" : "value4"
        }
      }
      {
      "X" : {
        "key5" : "value5",
        "key6" : "value6"
        }
      }
      

      Now create a table in JSON format using CTAS:

      0: jdbc:drill:zk=local> alter session set `store.format` = 'json';
      
      0: jdbc:drill:zk=local> create table dfs.tmp.jt12 as select t.`X` from `t2.json` t;
      +-----------+----------------------------+
      | Fragment  | Number of records written  |
      +-----------+----------------------------+
      | 0_0       | 3                          |
      +-----------+----------------------------+
      

      Every row in the output file is written with the union of all fields found across all records, so each field that a record lacks is emitted as an extraneous null:

      $ cat jt12/0_0_0.json 
      {
        "X" : {
          "key1" : "value1",
          "key2" : "value2",
          "key3" : null,
          "key4" : null,
          "key5" : null,
          "key6" : null
        }
      } {
        "X" : {
          "key1" : null,
          "key2" : null,
          "key3" : "value3",
          "key4" : "value4",
          "key5" : null,
          "key6" : null
        }
      } {
        "X" : {
          "key1" : null,
          "key2" : null,
          "key3" : null,
          "key4" : null,
          "key5" : "value5",
          "key6" : "value6"
        }
      }
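
      To make the effect concrete, here is a minimal Python sketch that reproduces the padding seen above. It only models the observed behavior (each map padded to the union of keys seen in the batch); it is not Drill's writer code, and the helper name union_schema_rows is hypothetical.

```python
import json

# The three input records from t2.json.
records = [
    {"X": {"key1": "value1", "key2": "value2"}},
    {"X": {"key3": "value3", "key4": "value4"}},
    {"X": {"key5": "value5", "key6": "value6"}},
]


def union_schema_rows(rows, column):
    """Pad the map in `column` of every row to the union of keys across all rows.

    Hypothetical helper: it imitates the shape of the CTAS output, nothing more.
    """
    all_keys = []
    for row in rows:
        for key in row[column]:
            if key not in all_keys:
                all_keys.append(key)  # keep first-seen order
    # Keys missing from a row come back as None, which json.dumps writes as null.
    return [{column: {k: row[column].get(k) for k in all_keys}} for row in rows]


padded = union_schema_rows(records, "X")
for row in padded:
    print(json.dumps(row, indent=2))
```

      Each printed row carries all six keys, with null for the four that the original record did not contain.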
      

      Note that if I change the output format to CSV or Parquet, no null fields are created in the output file. The expectation for a CTAS in JSON format is that the output matches the input JSON data.
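
      Until the writer behaves as expected, one possible stopgap is to post-process the CTAS output and drop the null-valued fields. The Python sketch below is an assumption-laden illustration, not part of Drill: the helper names iter_json_objects and drop_nulls are hypothetical, and the approach cannot tell a padded null from a null that was explicitly present in the source data, so it only fits inputs like t2.json that contain no explicit nulls.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects.

    The CTAS output above is a sequence of objects separated by whitespace
    ("} {"), not a JSON array, so json.loads on the whole file would fail.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # Skip whitespace between objects; raw_decode does not do this itself.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def drop_nulls(value):
    """Recursively remove map entries whose value is null (None)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # List positions are kept as-is; only map entries are pruned.
        return [drop_nulls(v) for v in value]
    return value


# A shortened stand-in for jt12/0_0_0.json.
sample = """
{ "X" : { "key1" : "value1", "key3" : null } } {
  "X" : { "key1" : null, "key3" : "value3" }
}
"""

cleaned = [drop_nulls(obj) for obj in iter_json_objects(sample)]
for obj in cleaned:
    print(json.dumps(obj))
```

      Applied to the real output file, the same two helpers would give back records holding only the keys each input record originally had.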

              People

              • Assignee: Parth Chandra (parthc)
              • Reporter: Aman Sinha (amansinha100)