Description
>>CREATE TABLE table1 (
    deviceInformationId int,
    channelsId string,
    props map<key:int,value:string>)
  STORED BY 'org.apache.carbondata.format'

>>insert into table1 select 10, 'channel1', map(1,'user1', 101,'root')
Format of the data to be read from CSV, with '$' as the level-1 delimiter and map keys terminated by '#':
>>load data local inpath '/tmp/data.csv' into table1 options (
    'COMPLEX_DELIMITER_LEVEL_1'='$',
    'COMPLEX_DELIMITER_LEVEL_2'=':',
    'COMPLEX_DELIMITER_FOR_KEY'='#')

CSV contents:
20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon

>>select channelId, props[100] from table1 where deviceInformationId > 10;
20, usercommon
30, usercommon
40, usercommon

>>select channelId, props from table1 where props[2] = 'user2';
20, {2,'user2', 100,'usercommon'}
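To make the delimiter semantics concrete, here is a minimal Python sketch (not CarbonData code) of how one map field from the CSV above is split into key/value pairs: the level-1 delimiter separates pairs, and the key delimiter separates each key from its value. The function name `parse_map_field` is illustrative, not part of any CarbonData API.

```python
def parse_map_field(field, pair_delim="$", key_delim="#"):
    """Split a complex CSV field into a dict using the load-option
    delimiters: COMPLEX_DELIMITER_LEVEL_1 ('$') separates key-value
    pairs, COMPLEX_DELIMITER_FOR_KEY ('#') separates key from value."""
    result = {}
    for pair in field.split(pair_delim):
        key, value = pair.split(key_delim, 1)
        result[int(key)] = value  # map<int,string> per the table schema
    return result

# Third column of the row "20,channel2,2#user2$100#usercommon":
print(parse_map_field("2#user2$100#usercommon"))
# -> {2: 'user2', 100: 'usercommon'}
```

This mirrors why `props[100]` returns `usercommon` for every loaded row in the example query.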
The following cases need to be handled:
| Sub feature | Pending activity | Remarks |
|---|---|---|
| Basic Maptype support | Develop | Create table DDL, load map data from CSV, `select * from maptable` |
| Maptype lookup in projection and filter | Develop | Projections and filters need execution at the Spark layer |
| NULL values, UDFs, Describe support | Develop | |
| Compaction support | Test + fix | As compaction works at the byte level, no code changes are required; test cases need to be added |
| Insert into table | Develop | Source table data containing map data needs conversion from the Spark data type to string, as Carbon takes strings as input rows |
| Support DDL for Map fields in Dictionary Include and Dictionary Exclude | Develop | CarbonDictionaryDecoder also needs to handle this |
| Support multilevel Map | Develop | Currently the DDL is validated to allow only 2 levels; remove this restriction |
| Support Map value as a measure | Develop | Currently array and struct support only dimensions, which needs to change |
| Support Alter table to add and remove a Map column | Develop | Implement the DDL; requires default-value handling |
| Push down Map lookup projections to Carbon | Develop | An optimization for when the map contains many values |
| Push down Map lookup filters to Carbon | Develop | An optimization for when the map contains many values |
| Update Map values | Develop | Update a map value |
Design suggestion:
Map can be internally stored as Array<Struct<key,value>>, so only a conversion to the Map data type is required when handing data to Spark. The schema will have a new column of map type, similar to Array.
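A minimal Python sketch of the suggested representation (illustrative only, not CarbonData internals): the map is held as a list of (key, value) structs, and the only work at read time is converting that array into a map before handing rows to Spark.

```python
def array_of_structs_to_map(structs):
    """Convert the internal Array<Struct<key,value>> representation
    into a map, as suggested for the Spark-facing conversion."""
    return {key: value for key, value in structs}

# Internal storage for the props column of row 20 in the example:
internal = [(2, "user2"), (100, "usercommon")]
print(array_of_structs_to_map(internal))
# -> {2: 'user2', 100: 'usercommon'}
```

Storing the map as an array of structs lets the existing complex-type write path (used for Array and Struct) be reused, with the Map-specific conversion confined to the query side.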
Attachments
Issue Links
- duplicates CARBONDATA-737 Add Map datatype support as Hive (Closed)
Sub-tasks:

| # | Sub-task | Status | Assignee |
|---|---|---|---|
| 1 | Basic Maptype support | Open | Kunal Kapoor |
| 2 | Load DDL support for Map DataType | Resolved | Unassigned |
| 3 | Create Table DDL support for Map DataType | Resolved | Unassigned |
| 4 | SDK support for Map DataType | Resolved | Manish Gupta |
| 5 | Add support for complex map type through spark carbon file format API | Resolved | Manish Gupta |
| 6 | Create DDL Support for Map Type | Resolved | MANISH NALLA |