Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
1.15.0
None
None
Description
It would be useful to be able to specify, define, or cast the "mode" of columns for Parquet storage.
Example of the problem without this capability: several files are created by different methods/processes, and all of them have the same columns. Querying all the files and grouping on a column fails:
SELECT source, count(*) FROM ....`ALL` GROUP BY source;
=> java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
Prior schema : BatchSchema [fields=[[`source` (VARCHAR:REQUIRED)]], selectionVector=NONE]
New schema : BatchSchema [fields=[[`source` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
This happens because source is generated in different ways across the files (for example, from a constant in one case and from dir0* in another).
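To make the failure mode concrete, here is a minimal Python sketch of the comparison that the error message implies. It is only a model, not Drill's implementation: the Field class, the schema_changed helper, and the two example batches are hypothetical names introduced for illustration. The assumption is that two batch schemas count as different as soon as any field differs in name, type, or mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one column of a batch schema."""
    name: str
    type: str
    mode: str  # "REQUIRED" (not nullable) or "OPTIONAL" (nullable)


def schema_changed(prior, new):
    """Return True if the two schemas differ in any field name, type, or mode."""
    return tuple(prior) != tuple(new)


# Assumed scenario: one file wrote `source` from a constant, another from dir0.
batch_from_const = (Field("source", "VARCHAR", "REQUIRED"),)
batch_from_dir0 = (Field("source", "VARCHAR", "OPTIONAL"),)

if schema_changed(batch_from_const, batch_from_dir0):
    print("schema change: same name and type, but the mode differs")
```

In this model the column name and type are identical in both batches, so the mode alone is what triggers the mismatch.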
It would be nice to be able to set the nullable attribute (required/optional) explicitly, or at least to cast the mode/type of the field on read. This would make the files more homogeneous and avoid failures on simple operations such as aggregation.
Surprisingly, the inferred modes are:
- dir0 => varchar<NULLABLE>
- '' => varchar<NOT NULL>
- coalesce(dir0, '') => varchar<NULLABLE> ???
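The third line is the surprising one. A rough Python sketch of the rule one might expect instead: the result of coalesce can only be null if every argument can be null. The coalesce_mode function is a hypothetical helper describing that expectation, not Drill's actual type inference; OPTIONAL corresponds to NULLABLE above and REQUIRED to NOT NULL.

```python
def coalesce_mode(*arg_modes):
    """Expected mode of coalesce(...): OPTIONAL only if every argument is OPTIONAL."""
    if not arg_modes:
        raise ValueError("coalesce needs at least one argument")
    if all(mode == "OPTIONAL" for mode in arg_modes):
        return "OPTIONAL"
    # At least one argument can never be null, so the result cannot be null either.
    return "REQUIRED"


# coalesce(dir0, ''): dir0 is nullable, the literal '' is not.
print(coalesce_mode("OPTIONAL", "REQUIRED"))  # prints "REQUIRED"
```

Under this rule coalesce(dir0, '') would come out as NOT NULL, which would match the files where source was written from a constant.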
Users should be able to override the system's choice of whether a column's mode is required or optional.