  1. Apache Drill
  2. DRILL-4505

Can't group by or sort across files with different schema

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Storage - Parquet
    • Labels:
      None
    • Environment:

      Java 1.8

      Description

      We are currently trying out the support for querying across Parquet files with different schemas.
      Simple selects work well, but when we want to sort or group by, Drill returns:
      "UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas Fragment 0:0 [Error Id: ff490670-64c1-4fb8-990e-a02aa44ac010 on zookeeper-1:31010]"

      This happens even though the query does not include the new columns.
      The expected result would be to treat columns that do not exist in certain files as either null or a default value, and to allow them to be grouped and sorted.

      Example

      This query works with a changing schema:
      SELECT APPLICATION_ID, dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02'

      But this query does not work:
      SELECT max(APPLICATION_ID), dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02' GROUP BY dir0

      For us, this hampers any possibility of having an evolving schema with moderately complex queries.
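
      To make the expected behaviour concrete, here is a minimal sketch in plain Python (not Drill code). It assumes two hypothetical batches of rows read from files with different schemas; the column NEW_COL and all values are made up for illustration. Every row is padded to the union of the column names with None, after which grouping and sorting work as they would on a single schema.

```python
# Hypothetical rows from two files; only the newer file has NEW_COL.
old_file = [
    {"APPLICATION_ID": 3, "dir0": "2015"},
    {"APPLICATION_ID": 7, "dir0": "2016"},
]
new_file = [
    {"APPLICATION_ID": 9, "dir0": "2016", "NEW_COL": "x"},
]


def unify(batches):
    """Pad every row to the union of all column names, using None for gaps."""
    columns = sorted({name for batch in batches for row in batch for name in row})
    return [
        {name: row.get(name) for name in columns}
        for batch in batches
        for row in batch
    ]


def max_by(rows, key, value):
    """Equivalent of SELECT max(value) ... GROUP BY key, skipping None values."""
    result = {}
    for row in rows:
        v = row[value]
        if v is None:
            continue
        k = row[key]
        if k not in result or v > result[k]:
            result[k] = v
    return result


rows = unify([old_file, new_file])

# Grouping works across both batches once the schema is unified.
max_app_id_by_year = max_by(rows, "dir0", "APPLICATION_ID")

# Sorting on the column that is missing from the old file: None sorts first.
sorted_rows = sorted(
    rows, key=lambda r: (r["NEW_COL"] is not None, r["NEW_COL"] or "")
)
```

      This only illustrates the null-filling semantics we would expect; how Drill would have to implement it inside its Sort operator is a separate question.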

              People

              • Assignee:
                Unassigned
                Reporter:
                tobad357 Tobias
              • Votes:
                0
                Watchers:
                3
