  1. Apache Drill
  2. DRILL-7090

Improve management of Optional(Nullable) / Required(Not nullable) type at least for parquet storage

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: Storage - Parquet
    • Labels: None

    Description

      It would be useful to be able to specify, define, or cast the "mode" of columns for Parquet storage.

      Example of the problem without this capability: several files are created by different methods/processes. All the files have the same columns. Querying all the files together and grouping on a column fails:

      SELECT source, count(*) FROM ....`ALL` GROUP BY source;
      =>
      java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change 
      Prior schema : BatchSchema [fields=[[`source` (VARCHAR:REQUIRED)]], selectionVector=NONE] 
      New schema : BatchSchema [fields=[[`source` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
      

      This happens because source is generated in different ways (for example, from a constant in one file and from dir0* in another).

      It would be nice to be able to set the nullable attribute (required/optional) oneself, or at least to cast the mode/type of the field on read. This would make the files more homogeneous and avoid failures on simple operations such as aggregation.
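
      The mismatch, and the effect of the requested cast on read, can be modelled outside Drill. The Python sketch below is not Drill code and uses no Drill API: Field, check_same_schema and promote_to_optional are hypothetical names introduced only for illustration. It assumes that an operator rejects any batch whose schema differs from the previous one, and that promoting every column to OPTIONAL is an acceptable way to unify the files.

```python
from dataclasses import dataclass

REQUIRED = "REQUIRED"
OPTIONAL = "OPTIONAL"


@dataclass(frozen=True)
class Field:
    """One column of a batch schema: name, SQL type, and mode (hypothetical model)."""
    name: str
    sql_type: str
    mode: str


class SchemaChangeError(Exception):
    """Raised when two batches disagree on their schema."""


def check_same_schema(batches):
    """Strict check: every batch schema must equal the first one, mode included."""
    prior = batches[0]
    for new in batches[1:]:
        if new != prior:
            raise SchemaChangeError(f"Prior schema: {prior} / New schema: {new}")


def promote_to_optional(schema):
    """Model of a cast on read: mark every column OPTIONAL, keep name and type."""
    return tuple(Field(f.name, f.sql_type, OPTIONAL) for f in schema)


# Two files with the same column, written with different modes.
file_a = (Field("source", "VARCHAR", REQUIRED),)  # e.g. written from a constant
file_b = (Field("source", "VARCHAR", OPTIONAL),)  # e.g. written from dir0

# Without any cast, the strict check fails on the mode difference alone.
try:
    check_same_schema([file_a, file_b])
    strict_ok = True
except SchemaChangeError:
    strict_ok = False

# After promoting both schemas to OPTIONAL, the same check passes.
unified = [promote_to_optional(s) for s in (file_a, file_b)]
check_same_schema(unified)
```

      Promoting to OPTIONAL is chosen here because it never loses data: a required column is always valid as an optional one, while the reverse would need a guarantee that no nulls are present.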

       

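      Surprisingly, the inferred modes are as follows (note that coalesce with a non-null fallback is still reported as nullable):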

      • dir0 => varchar<NULLABLE>
      • '' => varchar<NOT NULL>
      • coalesce(dir0, '') => varchar<NULLABLE>  ???

      The user should be able to override the system's choice of whether the column mode is required or optional.

          People

            Assignee: Unassigned
            Reporter: benj (benj641)