Apache Drill / DRILL-6359

All-text mode in JSON still reads missing column as Nullable Int


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.13.0, 1.14.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Suppose we have the following file:

      {a: 0}
      {a: 1}
      ...
      {a: 70001, b: 10.5}
      

      Here "..." stands for roughly 70K further records, enough to ensure that b first appears in the second or a later batch.
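      To reproduce the report, the input file can be generated with a short script. This is a minimal sketch, not part of the original report: the helper name `write_test_file` is hypothetical, and it writes standard JSON with quoted keys (`{"a": 0}`) instead of the unquoted keys shown above, assuming the content is otherwise the same: records with `a` from 0 through 70000 and no `b`, followed by a single `{"a": 70001, "b": 10.5}`.

```python
import json


def write_test_file(path, last_a=70001):
    """Write the reproduction file for DRILL-6359 (hypothetical helper).

    Records {"a": 0} .. {"a": last_a - 1} have no "b"; the final record
    {"a": last_a, "b": 10.5} is the only one that carries column b.
    Returns the total number of records written.
    """
    with open(path, "w", encoding="utf-8") as out:
        for i in range(last_a):
            out.write(json.dumps({"a": i}) + "\n")
        # Column b appears only here, tens of thousands of records in,
        # so it should land in the second or a later reader batch.
        out.write(json.dumps({"a": last_a, "b": 10.5}) + "\n")
    return last_a + 1
```

      For example, `write_test_file("70Kmissing.json")` writes and returns 70002 records; the file then needs to sit in the directory that the query's workspace points to.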

      Suppose we execute the following query:

      ALTER SESSION SET `store.json.all_text_mode` = true;
      SELECT a, b FROM `70Kmissing.json` WHERE b IS NOT NULL ORDER BY a;
      

      The query should work: it explicitly projects column b, and all-text mode tells the JSON reader to read every value as text. The reader therefore has enough information to create column b as Nullable VarChar, even in batches where b never appears.

      Instead, sqlline reports:

      Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.
      
      Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
      Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
      

      Missing columns should also be subject to the all-text-mode setting. The JSON reader handles projection push-down, so it is responsible for filling in projected columns that are absent from the data; in all-text mode it should create them as Nullable VarChar, not Nullable Int.
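      The failure can be illustrated with a small model of per-batch typing. This is a hypothetical Python sketch, not Drill's reader code: `batch_schema` and `schemas_match` are illustrative names, a schema is reduced to a list of (column, type) pairs using the type strings from the error message, and it assumes, as the report states, that the reader itself fills in missing projected columns. With the observed behavior, the first batch reproduces the "Previous schema" and the second the "Incoming schema", so the batches disagree; honoring all-text mode for missing columns makes them agree.

```python
# Hypothetical model of the reader's per-batch typing; this is NOT Drill code.
# Type strings are copied from the error message in the report.
TEXT = "VARCHAR:OPTIONAL"
MISSING_DEFAULT = "INT:OPTIONAL"  # type Drill 1.13 gives the missing column b


def batch_schema(batch, projected, text_mode_for_missing):
    """Return [(column, type)] for one batch read with all-text mode on."""
    schema = []
    for col in projected:
        if any(col in record for record in batch):
            # All-text mode: a column with values in this batch is text.
            schema.append((col, TEXT))
        elif text_mode_for_missing:
            # Expected behavior: a missing projected column is text as well.
            schema.append((col, TEXT))
        else:
            # Observed behavior: a missing projected column gets Nullable Int.
            schema.append((col, MISSING_DEFAULT))
    return schema


def schemas_match(batches, projected, text_mode_for_missing):
    """True if all batches share one schema; per the report's error,
    External Sort rejects schema changes unless Union type is enabled."""
    schemas = [batch_schema(b, projected, text_mode_for_missing) for b in batches]
    return all(s == schemas[0] for s in schemas)


# Two tiny batches stand in for the 70K-record batches of the report.
batches = [[{"a": 0}, {"a": 1}], [{"a": 70001, "b": 10.5}]]
print(schemas_match(batches, ["a", "b"], text_mode_for_missing=False))  # False
print(schemas_match(batches, ["a", "b"], text_mode_for_missing=True))   # True
```

      The model deliberately ignores the WHERE filter and the values themselves; only the per-batch column types matter for the schema-change check.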

      This occurs with the JSON reader shipped in Drill 1.13. I believe the "batch size handling" rewrite of the JSON reader fixes it, but I have not tested that.

People

    • Assignee: Unassigned
    • Reporter: Paul Rogers (paul-rogers)