Apache Drill / DRILL-4961

Schema change error due to a missing column in a JSON file

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: Execution - Flow
    • Labels: None

    Description

      When a column is missing from a batch, it defaults to a hard-coded nullable INT type (e.g., see line 128 in ExpressionTreeMaterializer.java). This can cause a schema conflict when the same column has a different type (e.g., VARCHAR) in another batch.
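To make the failure mode concrete, here is a minimal Python sketch of the behavior described above. It is a toy model, not Drill's code: the names `materialize_schema`, `conflicting_columns`, and `DEFAULT_MISSING_TYPE` are hypothetical, and the only thing taken from this report is that a missing column falls back to nullable INT.

```python
from typing import Dict, List, Tuple

# A column type is modeled as (minor type, data mode), e.g. ("VARCHAR", "OPTIONAL").
ColumnType = Tuple[str, str]
Schema = Dict[str, ColumnType]

# Assumption: stands in for the hard-coded fallback described in this report.
DEFAULT_MISSING_TYPE: ColumnType = ("INT", "OPTIONAL")


def materialize_schema(batch_schema: Schema, projected: List[str]) -> Schema:
    """Resolve each projected column; a missing column falls back to nullable INT."""
    return {col: batch_schema.get(col, DEFAULT_MISSING_TYPE) for col in projected}


def conflicting_columns(a: Schema, b: Schema) -> List[str]:
    """Return the columns present in both schemas whose types disagree."""
    return sorted(col for col in a.keys() & b.keys() if a[col] != b[col])


# One file has both columns; the other lost "last_name" entirely.
full_file: Schema = {
    "first_name": ("VARCHAR", "OPTIONAL"),
    "last_name": ("VARCHAR", "OPTIONAL"),
}
trimmed_file: Schema = {"first_name": ("VARCHAR", "OPTIONAL")}
projected = ["first_name", "last_name"]

schema_a = materialize_schema(full_file, projected)
schema_b = materialize_schema(trimmed_file, projected)
conflicts = conflicting_columns(schema_a, schema_b)
print(conflicts)  # ['last_name']
```

In this model the two batches agree on `first_name` but disagree on `last_name` (VARCHAR versus the defaulted INT), which is the kind of mismatch a downstream operator comparing batch schemas would reject.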

      To reproduce (the following test also produced DRILL-4960, which may be related): run a parallel aggregation over two small JSON files (e.g., two copies of contrib/storage-mongo/src/test/resources/emp.json) where an entire column (e.g., "last_name") has been removed from one of the files.
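One possible way to prepare the two input files is sketched below. It assumes the source file holds one JSON object per line, which may not match the actual layout of emp.json. The helper `copy_without_column`, the file names `emp1.json` and `emp2.json`, and the sample records are all hypothetical stand-ins.

```python
import json
import tempfile
from pathlib import Path


def copy_without_column(src: Path, dst: Path, column: str) -> int:
    """Copy newline-delimited JSON records from src to dst, dropping `column`.

    Returns the number of records written.
    """
    written = 0
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            record.pop(column, None)  # no error if the column is already absent
            fout.write(json.dumps(record) + "\n")
            written += 1
    return written


# Hypothetical stand-in records; the real test used copies of emp.json.
sample = [
    {"first_name": "Alice", "last_name": "Smith"},
    {"first_name": "Bob", "last_name": "Jones"},
]

workdir = Path(tempfile.mkdtemp())
full = workdir / "emp1.json"
trimmed = workdir / "emp2.json"
full.write_text("".join(json.dumps(r) + "\n" for r in sample))

count = copy_without_column(full, trimmed, "last_name")
trimmed_records = [json.loads(line) for line in trimmed.read_text().splitlines()]
```

Placing both files in one directory and querying that directory, as the session below does with `drill/data/emp`, lets the scan read one file with the column and one without it.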

      0: jdbc:drill:zk=local> alter session set planner.slice_target = 1;
      +-------+--------------------------------+
      | ok    | summary                        |
      +-------+--------------------------------+
      | true  | planner.slice_target updated.  |
      +-------+--------------------------------+
      1 row selected (0.091 seconds)
      0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` group by first_name, last_name;
      Error: SYSTEM ERROR: SchemaChangeException: Incoming batches for merging receiver have different schemas!

      Fragment 1:0

      [Error Id: 1315ddc5-5c31-404f-917b-c7a082d016cf on 10.250.57.63:31010] (state=,code=0)

      The query above used streaming aggregation; after switching to hash aggregation, the same problem manifests as a different error:

      0: jdbc:drill:zk=local> alter session set `planner.enable_streamagg` = false;
      +-------+------------------------------------+
      | ok    | summary                            |
      +-------+------------------------------------+
      | true  | planner.enable_streamagg updated.  |
      +-------+------------------------------------+
      1 row selected (0.083 seconds)
      0: jdbc:drill:zk=local> select first_name, last_name from `drill/data/emp` group by first_name, last_name;
      Error: SYSTEM ERROR: IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.drill.exec.vector.NullableVarCharVector, field= last_name(VARCHAR:OPTIONAL)[$bits$(UINT1:REQUIRED), last_name(VARCHAR:OPTIONAL)[$offsets$(UINT4:REQUIRED)]]

      Fragment 2:0

      [Error Id: 58daaaa0-3bfe-4197-b4bd-44f9d7604d77 on 10.250.57.63:31010] (state=,code=0)
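The second error reads like a typed vector lookup that was set up for one vector class and then handed another. The following Python sketch models that pattern under stated assumptions: the class names only echo those in the error message, while `VectorWrapper`, `get_vector`, and `IllegalStateError` are hypothetical and do not reproduce Drill's implementation.

```python
class NullableIntVector:
    """Toy stand-in for a nullable INT value vector."""


class NullableVarCharVector:
    """Toy stand-in for a nullable VARCHAR value vector."""


class IllegalStateError(RuntimeError):
    """Hypothetical analogue of Java's IllegalStateException."""


class VectorWrapper:
    def __init__(self, field: str, vector: object) -> None:
        self.field = field
        self.vector = vector

    def get_vector(self, expected_cls: type) -> object:
        """Return the held vector, insisting that it has the expected class."""
        if not isinstance(self.vector, expected_cls):
            raise IllegalStateError(
                "Failure while reading vector. Expected vector class of "
                f"{expected_cls.__name__} but was holding vector class "
                f"{type(self.vector).__name__}, field={self.field}"
            )
        return self.vector


# The operator is set up against the batch where last_name was missing,
# so it expects the defaulted nullable INT vector...
expected_cls = NullableIntVector

# ...but a later batch carries the real VARCHAR column.
incoming = VectorWrapper("last_name(VARCHAR:OPTIONAL)", NullableVarCharVector())

try:
    incoming.get_vector(expected_cls)
    message = ""
except IllegalStateError as err:
    message = str(err)

print(message)
```

Under this reading, both errors share one root cause: the batch without `last_name` fixes the column type to nullable INT before the VARCHAR batch arrives.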

          People

            Assignee: Unassigned
            Reporter: Boaz Ben-Zvi