CarbonData / CARBONDATA-45

Support MAP type

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.2
    • Component/s: core, sql
    • Labels: None

    Description

      >>CREATE TABLE table1 (
                       deviceInformationId int,
                       channelsId string,
                       props map<int,string>)
                    STORED BY 'org.apache.carbondata.format'
      
      >>insert into table1 select 10,'channel1', map(1,'user1',101, 'root')
      

      Format of the data to be read from CSV: '$' is the level-1 delimiter between map entries, and '#' separates each map key from its value.

      >>load data local inpath '/tmp/data.csv' into table table1 options ('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 'COMPLEX_DELIMITER_FOR_KEY'='#')
      
      20,channel2,2#user2$100#usercommon
      30,channel3,3#user3$100#usercommon
      40,channel4,4#user3$100#usercommon
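      To make the CSV layout concrete, the Python sketch below splits the map column of the sample rows into key/value pairs. It assumes '$' separates map entries and '#' separates a key from its value, as described above, and that keys are integers and values strings, matching the props column. The helpers parse_map_field and parse_row are hypothetical and only illustrate the format; they are not CarbonData's loader and ignore quoting, escaping and NULL handling.

```python
from typing import Dict, List, Tuple

# Assumed delimiters, taken from the load example above (not CarbonData internals):
ENTRY_DELIM = "$"  # COMPLEX_DELIMITER_LEVEL_1: separates map entries
KEY_DELIM = "#"    # COMPLEX_DELIMITER_FOR_KEY: separates a key from its value


def parse_map_field(field: str) -> Dict[int, str]:
    """Hypothetical helper: '2#user2$100#usercommon' -> {2: 'user2', 100: 'usercommon'}."""
    result: Dict[int, str] = {}
    if not field:
        return result  # empty field -> empty map (NULL handling not covered here)
    for entry in field.split(ENTRY_DELIM):
        key, value = entry.split(KEY_DELIM, 1)
        result[int(key)] = value  # props has int keys and string values
    return result


def parse_row(line: str) -> Tuple[int, str, Dict[int, str]]:
    """Hypothetical helper: one CSV line -> (deviceInformationId, channelsId, props)."""
    device_id, channel, props = line.split(",", 2)
    return int(device_id), channel, parse_map_field(props)


csv_lines: List[str] = [
    "20,channel2,2#user2$100#usercommon",
    "30,channel3,3#user3$100#usercommon",
    "40,channel4,4#user3$100#usercommon",
]
rows = [parse_row(line) for line in csv_lines]
print(rows[0])  # (20, 'channel2', {2: 'user2', 100: 'usercommon'})
```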
      
      >>select channelsId, props[100] from table1 where deviceInformationId > 10;
      
      channel2, usercommon
      channel3, usercommon
      channel4, usercommon
      
      >>select channelsId, props from table1 where props[2] = 'user2';
      
      channel2, {2:'user2', 100:'usercommon'}
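      The two queries above can be mimicked on plain Python dictionaries to show the intended lookup semantics: props[k] projects the value stored under key k, and a row whose map lacks that key yields no value, so it drops out of an equality filter. This is a minimal sketch that assumes a missing key behaves like NULL, as Spark SQL map lookups do by default; table1 below is a hypothetical in-memory copy of the inserted and loaded rows, not a CarbonData API.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str, Dict[int, str]]

# Hypothetical in-memory copy of table1: (deviceInformationId, channelsId, props)
table1: List[Row] = [
    (10, "channel1", {1: "user1", 101: "root"}),
    (20, "channel2", {2: "user2", 100: "usercommon"}),
    (30, "channel3", {3: "user3", 100: "usercommon"}),
    (40, "channel4", {4: "user3", 100: "usercommon"}),
]


def lookup(props: Dict[int, str], key: int) -> Optional[str]:
    """props[key]: a missing key gives None, standing in for SQL NULL."""
    return props.get(key)


# select channelsId, props[100] from table1 where deviceInformationId > 10
projection = [(ch, lookup(p, 100)) for dev, ch, p in table1 if dev > 10]

# select channelsId, props from table1 where props[2] = 'user2'
# (NULL = 'user2' is not true in SQL; None == "user2" is False here: same outcome)
filtered = [(ch, p) for dev, ch, p in table1 if lookup(p, 2) == "user2"]

print(projection)  # [('channel2', 'usercommon'), ('channel3', 'usercommon'), ('channel4', 'usercommon')]
print(filtered)    # [('channel2', {2: 'user2', 100: 'usercommon'})]
```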
      

      The following cases need to be handled:

      Sub-feature | Pending activity | Remarks
      Basic Map type support | Develop | Create table DDL, load map data from CSV, select * from map table
      Map type lookup in projection and filter | Develop | Projections and filters need to be executed in Spark
      NULL values, UDFs, DESCRIBE support | Develop |
      Compaction support | Test + fix | Compaction works at the byte level, so no changes are required; test cases need to be added
      Insert into table | Develop | Map data in the source table must be converted from the Spark data type to a string, as Carbon takes a string as the input row
      Support DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE DDL for Map fields | Develop | CarbonDictionaryDecoder also needs to handle Map columns
      Support multilevel Map | Develop | Currently DDL validation allows only 2 levels; remove this restriction
      Support Map value as a measure | Develop | Currently array and struct support only dimensions; this needs to change
      Support ALTER TABLE to add and remove a Map column | Develop | Implement the DDL; requires default value handling
      Push down Map lookup in projections to Carbon | Develop | Optimization for maps that contain many values
      Push down Map lookup in filters to Carbon | Develop | Optimization for maps that contain many values
      Update Map values | Develop | Update map values
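      For the "Insert into table" item, a Spark map value has to be flattened into the same kind of delimited string a CSV load would supply. The Python sketch below shows one way to do that. The names map_to_input_string and row_to_input_strings are hypothetical, and reusing '$' and '#' from the load example is an assumption; the delimiters Carbon actually uses for insert may differ. Instead of escaping, the sketch rejects keys or values that contain a delimiter, and it does not handle NULL maps.

```python
from typing import Any, Dict, List

ENTRY_DELIM = "$"  # assumed entry delimiter (level 1)
KEY_DELIM = "#"    # assumed key/value delimiter


def map_to_input_string(m: Dict[Any, Any]) -> str:
    """Hypothetical helper: {1: 'user1', 101: 'root'} -> '1#user1$101#root'."""
    parts: List[str] = []
    for key, value in m.items():
        k, v = str(key), str(value)
        # No escaping in this sketch: refuse content that would corrupt the format.
        for text in (k, v):
            if ENTRY_DELIM in text or KEY_DELIM in text:
                raise ValueError(f"delimiter found in map content: {text!r}")
        parts.append(f"{k}{KEY_DELIM}{v}")
    return ENTRY_DELIM.join(parts)


def row_to_input_strings(row: tuple) -> List[str]:
    """Convert every column to a string; dict columns use the delimited form."""
    return [map_to_input_string(c) if isinstance(c, dict) else str(c) for c in row]


# Row produced by: select 10, 'channel1', map(1, 'user1', 101, 'root')
print(row_to_input_strings((10, "channel1", {1: "user1", 101: "root"})))
# ['10', 'channel1', '1#user1$101#root']
```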

      Design suggestion:

      Internally, a map can be stored as Array<Struct<key, value>>, so that the data only needs to be converted to the Map data type when it is handed to Spark. The schema will have a new column of map type, similar to Array.
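      A minimal Python sketch of this design, under the assumption that a map<int,string> column is written as a list of key/value records (standing in for Array<Struct<key, value>>) and rebuilt as a map only on the read path. KeyValue, to_storage, to_spark_map and lookup_in_storage are hypothetical names for illustration. Duplicate keys are not validated; this sketch lets the last entry win in both the rebuilt map and the single-key lookup.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class KeyValue:
    """Stands in for one Struct<key, value> element of the stored array."""
    key: int
    value: str


def to_storage(m: Dict[int, str]) -> List[KeyValue]:
    """Write path: Map<int, string> -> Array<Struct<key, value>>."""
    return [KeyValue(k, v) for k, v in m.items()]


def to_spark_map(entries: List[KeyValue]) -> Dict[int, str]:
    """Read path: rebuild the map only when handing the row to Spark.
    If a key repeats, the later entry overwrites the earlier one."""
    return {e.key: e.value for e in entries}


def lookup_in_storage(entries: List[KeyValue], key: int) -> Optional[str]:
    """Single-key lookup on the stored form without rebuilding the map.
    Scans from the end so duplicates resolve the same way as to_spark_map."""
    for e in reversed(entries):
        if e.key == key:
            return e.value
    return None


stored = to_storage({2: "user2", 100: "usercommon"})
print(stored)  # [KeyValue(key=2, value='user2'), KeyValue(key=100, value='usercommon')]
print(to_spark_map(stored))            # {2: 'user2', 100: 'usercommon'}
print(lookup_in_storage(stored, 100))  # usercommon
```

      Looking up one key in the stored form means scanning the entries of each row; doing that inside Carbon, rather than materializing the full map for Spark, is presumably what the push-down items in the table aim to exploit for maps with many values.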

      Attachments

        MAP DATA-TYPE SUPPORT.pdf (108 kB, Manish Gupta)

      People

        Manish Gupta (manishgupta_88)
        cen yuhai (cenyuhai)
        Votes: 0
        Watchers: 4

      Time Tracking

        Estimated: Not Specified
        Remaining: 0h
        Logged: 23h 20m