Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
Description
Currently, a validation fails the query if a location contains two carbondata files with different schemas. I think there is no need to fail; Parquet's behavior in the same situation suggests as much.
Failing here is not good. Instead, we can take the schema from the latest carbondata file in the given location, read all the files against that schema, and return the query output. For data files that lack some of those columns, the missing columns will have null values.
Note that we still do not merge schemas. That stays as it is now; the only change is that we take the latest schema.
For example:
1. One data file has columns a, b and c; a second file has columns a, b, c, d and e. We can then read both files and create the table with 5 columns or 3 columns, whichever schema is the latest. This applies when the user does not specify a schema. If the user specifies a schema, the table is created with the specified schema.
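The proposed behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not CarbonData code: `DataFile`, `resolve_schema`, and `read_all` are hypothetical names, and "latest" is assumed here to mean the file with the highest write timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataFile:
    """Hypothetical stand-in for one carbondata file in a location."""
    written_at: int      # assumed ordering key: larger means more recent
    columns: List[str]   # schema stored in this file
    rows: List[Dict]     # row data keyed by column name


def resolve_schema(files: List[DataFile],
                   user_schema: Optional[List[str]] = None) -> List[str]:
    """Use the user's schema if given, else the schema of the latest file.

    No schema merge happens: exactly one file's schema is chosen.
    """
    if user_schema is not None:
        return list(user_schema)
    latest = max(files, key=lambda f: f.written_at)
    return list(latest.columns)


def read_all(files: List[DataFile],
             user_schema: Optional[List[str]] = None) -> List[Dict]:
    """Read every file against the resolved schema.

    Columns absent from a file come back as None (null) instead of
    failing the query.
    """
    schema = resolve_schema(files, user_schema)
    return [{col: row.get(col) for col in schema}
            for f in files
            for row in f.rows]


# Usage: the example from the description, with the 5-column file as latest.
old = DataFile(1, ["a", "b", "c"], [{"a": 1, "b": 2, "c": 3}])
new = DataFile(2, ["a", "b", "c", "d", "e"],
               [{"a": 4, "b": 5, "c": 6, "d": 7, "e": 8}])
rows = read_all([old, new])
```

In this sketch, if the 3-column file were the latest instead, columns d and e would simply be dropped from the output, which matches the "5 columns or 3 columns, whichever is latest" wording.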